Re: RGW versioned objects lost after Octopus 15.2.3 -> 15.2.4 upgrade

Hi Chris,

It looks like new lifecycle processing logic was backported to Octopus
in 15.2.3.  I'm looking at the non-current calculation to see whether
it could incorrectly rely on a stale value (from an earlier entry).
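
To make the concern concrete, here is a minimal Python sketch. It is not
the RGW code, and all function names are hypothetical. It shows how I
understand the non-current age should be computed under S3 semantics: a
version becomes non-current when its successor is written, so the clock
starts at the successor's mtime. The second function illustrates one way
a stale value could produce the symptom described, assuming a single
timestamp gets reused for every entry. That is an assumption for
illustration, not a diagnosis.

```python
from datetime import datetime, timedelta, timezone


def noncurrent_since(versions):
    """Map each non-current version id to the time it became non-current.

    `versions` is a list of (version_id, mtime) tuples, newest first.
    The newest entry is the current version and is never included.
    """
    result = {}
    for newer, older in zip(versions, versions[1:]):
        # The older version stopped being current when the newer one landed.
        result[older[0]] = newer[1]
    return result


def expired_versions(versions, noncurrent_days, now):
    """Expected behavior: each version is judged on its own timestamp."""
    cutoff = timedelta(days=noncurrent_days)
    since = noncurrent_since(versions)
    return [vid for vid, ts in since.items() if now - ts >= cutoff]


def expired_versions_stale(versions, noncurrent_days, now):
    """Hypothetical bug: one stale timestamp is reused for every entry."""
    cutoff = timedelta(days=noncurrent_days)
    since = noncurrent_since(versions)
    if not since:
        return []
    stale = min(since.values())  # assumed: value left over from the oldest entry
    return [vid for vid in since if now - stale >= cutoff]


if __name__ == "__main__":
    # Nightly versions at 01:13 from 17 July to 4 August 2020, newest first.
    start = datetime(2020, 7, 17, 1, 13, tzinfo=timezone.utc)
    versions = [(f"v{i}", start + timedelta(days=i)) for i in range(19)][::-1]
    now = datetime(2020, 8, 5, 2, 0, tzinfo=timezone.utc)
    print("per-entry:", expired_versions(versions, 18, now))
    print("stale:    ", len(expired_versions_stale(versions, 18, now)), "versions")
```

With the per-entry calculation only the oldest version crosses the 18-day
line, which gives the rolling window seen on 15.2.3. With the stale value
every non-current version crosses it at once, which would match the
all-at-once deletion reported on 15.2.4.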

thanks,

Matt

On Wed, Aug 5, 2020 at 8:52 AM Chris Palmer <chris@xxxxxxxxxxxxxxxxxxxxx> wrote:
>
> This is starting to look like a regression in Octopus 15.2.4.
>
> After cleaning things up by deleting all old versions, and deleting and
> recreating the bucket lifecycle policy (see below), I then let it run.
> Each day a new version got created, with versions dating back to 17 July
> (correct). That held until this morning, when we passed the 18-day
> non-current expiration: all but the latest version of each object
> disappeared, which is what happened at the midnight following our
> 15.2.3->15.2.4 upgrade.
>
> So instead of a gently rolling 18 days of versions (on 15.2.3), we now
> build up to 18 days, after which all non-current versions get deleted (on
> 15.2.4).
>
> Anyone come across versioning problems on 15.2.4?
>
> Thanks, Chris
>
> On 17/07/2020 09:11, Chris Palmer wrote:
> > This got worse this morning. An RGW daemon crashed at midnight with a
> > segfault, and the backtrace hints that it was processing the
> > expiration rule:
> >
> >     "backtrace": [
> >         "(()+0x12730) [0x7f97b8c4e730]",
> >         "(()+0x15878a) [0x7f97b862378a]",
> >         "(std::__cxx11::basic_string<char, std::char_traits<char>,
> > std::allocator<char> >::compare(std::__cxx11::basic_string<char,
> > std::char_traits<char>, std::allocator<char> > const&) const+0x23)
> > [0x7f97c25d3e43]",
> >         "(LCOpAction_DMExpiration::check(lc_op_ctx&,
> > std::chrono::time_point<ceph::time_detail::real_clock,
> > std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >
> > >*)+0x87) [0x7f97c283d127]",
> >         "(LCOpRule::process(rgw_bucket_dir_entry&, DoutPrefixProvider
> > const*)+0x1b8) [0x7f97c281cbc8]",
> >         "(()+0x5b836d) [0x7f97c281d36d]",
> >         "(WorkQ::entry()+0x247) [0x7f97c28302d7]",
> >         "(()+0x7fa3) [0x7f97b8c43fa3]",
> >         "(clone()+0x3f) [0x7f97b85c44cf]"
> >
> > One object version got removed when it should not have.
> >
> > In an attempt to clean things up I have manually deleted all
> > non-current versions, and removed and recreated the (same) lifecycle
> > policy. I will also create a new test bucket with a similar policy and
> > test that in parallel. We will see what happens tomorrow....
> >
> > Thanks, Chris
> >
> >
> > On 16/07/2020 08:22, Chris Palmer wrote:
> >> I have an RGW bucket (backups) that is versioned. A nightly job
> >> creates a new version of a few objects. There is a lifecycle policy
> >> (see below) that keeps 18 days of versions. This has been working
> >> perfectly and has not been changed. Until I upgraded Octopus...
> >>
> >> The nightly job creates separate log files, including a listing of
> >> the object versions. From these I can see that:
> >>
> >> 13/7  02:14   versions from 13/7 01:13 back to 24/6 01:17 (correct)
> >>
> >> 14/7  02:14   versions from 14/7 01:13 back to 25/6 01:14 (correct)
> >>
> >> 14/7  10:00   upgrade Octopus 15.2.3 -> 15.2.4
> >>
> >> 15/7  02:14   versions from 15/7 01:13 back to 25/6 01:14 (would have
> >> expected 25/6 to have expired)
> >>
> >> 16/7  02:14   versions from 16/7 01:13 back to 15/7 01:13 (now all
> >> pre-upgrade versions have wrongly disappeared)
> >>
> >> It's not a big deal for me as they are only backups, provided it
> >> continues to work correctly from now on. However, it may affect some
> >> other people much more.
> >>
> >> Any ideas on the root cause? And if it is likely to be stable again now?
> >>
> >> Thanks, Chris
> >>
> >> {
> >>     "Rules": [
> >>         {
> >>             "Expiration": {
> >>                 "ExpiredObjectDeleteMarker": true
> >>             },
> >>             "ID": "Expiration & incomplete uploads",
> >>             "Prefix": "",
> >>             "Status": "Enabled",
> >>             "NoncurrentVersionExpiration": {
> >>                 "NoncurrentDays": 18
> >>             },
> >>             "AbortIncompleteMultipartUpload": {
> >>                 "DaysAfterInitiation": 1
> >>             }
> >>         }
> >>     ]
> >> }
> >>
> >> _______________________________________________
> >> ceph-users mailing list -- ceph-users@xxxxxxx
> >> To unsubscribe send an email to ceph-users-leave@xxxxxxx



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx