This got worse this morning. An RGW daemon crashed at midnight with a
segfault, and the backtrace hints that it was processing the expiration
rule:
"backtrace": [
"(()+0x12730) [0x7f97b8c4e730]",
"(()+0x15878a) [0x7f97b862378a]",
"(std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> >::compare(std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> > const&) const+0x23)
[0x7f97c25d3e43]",
"(LCOpAction_DMExpiration::check(lc_op_ctx&,
std::chrono::time_point<ceph::time_detail::real_clock,
std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >
>*)+0x87) [0x7f97c283d127]",
"(LCOpRule::process(rgw_bucket_dir_entry&, DoutPrefixProvider
const*)+0x1b8) [0x7f97c281cbc8]",
"(()+0x5b836d) [0x7f97c281d36d]",
"(WorkQ::entry()+0x247) [0x7f97c28302d7]",
"(()+0x7fa3) [0x7f97b8c43fa3]",
"(clone()+0x3f) [0x7f97b85c44cf]"
One object version got removed when it should not have.
In an attempt to clean things up, I have manually deleted all non-current
versions, and removed and recreated the (same) lifecycle policy. I will
also create a new test bucket with a similar policy and run that in
parallel. We will see what happens tomorrow...
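
In case it is useful to anyone doing the same, a minimal boto3 sketch of
that cleanup step is below. It is not the exact script I ran, just the
general shape, and the endpoint URL and credentials are placeholders:
list every version in the bucket and delete the ones that are not current.

# Delete all non-current object versions from the versioned "backups" bucket.
# Endpoint and credentials below are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://rgw.example.com",    # placeholder RGW endpoint
    aws_access_key_id="...",                   # placeholder credentials
    aws_secret_access_key="...",
)

paginator = s3.get_paginator("list_object_versions")
for page in paginator.paginate(Bucket="backups"):
    stale = [
        {"Key": v["Key"], "VersionId": v["VersionId"]}
        for v in page.get("Versions", [])
        if not v["IsLatest"]
    ]
    if stale:
        # Each page holds at most 1000 entries, so this stays within the
        # 1000-key limit of a single DeleteObjects call.
        s3.delete_objects(Bucket="backups", Delete={"Objects": stale})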
Thanks, Chris
On 16/07/2020 08:22, Chris Palmer wrote:
I have an RGW bucket (backups) that is versioned. A nightly job
creates a new version of a few objects. There is a lifecycle policy
(see below) that keeps 18 days of non-current versions. This had been
working perfectly and had not been changed, until I upgraded Octopus...
The nightly job creates separate log files, including a listing of the
object versions. From these I can see that:
13/7 02:14  versions from 13/7 01:13 back to 24/6 01:17 (correct)
14/7 02:14  versions from 14/7 01:13 back to 25/6 01:14 (correct)
14/7 10:00  upgrade Octopus 15.2.3 -> 15.2.4
15/7 02:14  versions from 15/7 01:13 back to 25/6 01:14 (would have expected 25/6 to have expired)
16/7 02:14  versions from 16/7 01:13 back to 15/7 01:13 (now all pre-upgrade versions have wrongly disappeared)
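
(For reference, that kind of per-version listing can be reproduced with
something like the boto3 sketch below; it is not the nightly job's actual
script, the endpoint is a placeholder, and credentials are assumed to come
from the usual environment/config.)

# Print one line per object version, similar to the nightly log listing.
import boto3

s3 = boto3.client("s3", endpoint_url="https://rgw.example.com")  # placeholder endpoint

paginator = s3.get_paginator("list_object_versions")
for page in paginator.paginate(Bucket="backups"):
    for v in page.get("Versions", []):
        state = "current" if v["IsLatest"] else "noncurrent"
        # LastModified is a datetime, so it can be formatted directly.
        print(f'{v["LastModified"]:%d/%m %H:%M}  {state:10}  {v["Key"]}  {v["VersionId"]}')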
It's not a big deal for me as these are only backups, provided it
continues to work correctly from now on. However, it may affect some
other people much more.
Any ideas on the root cause? And whether it is likely to be stable again now?
Thanks, Chris
{
    "Rules": [
        {
            "Expiration": {
                "ExpiredObjectDeleteMarker": true
            },
            "ID": "Expiration & incomplete uploads",
            "Prefix": "",
            "Status": "Enabled",
            "NoncurrentVersionExpiration": {
                "NoncurrentDays": 18
            },
            "AbortIncompleteMultipartUpload": {
                "DaysAfterInitiation": 1
            }
        }
    ]
}
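
(For completeness, applying these same rules to another bucket with boto3
looks roughly like this. The endpoint is a placeholder, credentials are
assumed to come from the environment, and "backups-test" is just a
hypothetical name; reading the configuration back shows what RGW actually
stored.)

# Apply the lifecycle policy above to a test bucket and read it back.
import boto3

s3 = boto3.client("s3", endpoint_url="https://rgw.example.com")  # placeholder endpoint

lifecycle = {
    "Rules": [
        {
            "Expiration": {"ExpiredObjectDeleteMarker": True},
            "ID": "Expiration & incomplete uploads",
            "Prefix": "",
            "Status": "Enabled",
            "NoncurrentVersionExpiration": {"NoncurrentDays": 18},
            "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 1},
        }
    ]
}

s3.put_bucket_lifecycle_configuration(
    Bucket="backups-test",                 # hypothetical test bucket
    LifecycleConfiguration=lifecycle,
)

# Confirm the stored rules round-trip unchanged.
print(s3.get_bucket_lifecycle_configuration(Bucket="backups-test"))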
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx