On Fri, 9 Aug 2019, Florian Haas wrote:
> Hi everyone,
>
> it seems there have been several reports in the past related to
> BlueStore OSDs crashing from unhandled errors in _txc_add_transaction:
>
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-April/034444.html
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-January/032172.html
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-December/031960.html
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-December/031964.html
>
> Bug #38724 tracks this, has been fixed in master with
> https://github.com/ceph/ceph/pull/27929, and is pending backports (and,
> I dare say, is *probably* misclassified as being only minor, as this
> does cause potential data loss as soon as it affects enough OSDs
> simultaneously):
>
> https://tracker.ceph.com/issues/38724
>
> We just ran into a similar issue with a couple of BlueStore OSDs that we
> recently added to a Luminous (12.2.12) cluster that was upgraded from
> Jewel, and hence, still largely runs on FileStore. I say similar because
> evidently other people reporting this problem have been running into
> ENOENT (No such file or directory) or ENOTEMPTY (Directory not empty);
> for us it's interestingly E2BIG (Argument list too long):
>
> https://tracker.ceph.com/issues/38724#note-26

    {
        "op_num": 2,
        "op_name": "truncate",
        "collection": "2.293_head",
        "oid": "#-4:c96337db:::temp_recovering_2.293_11123'6472830_288833_head:head#",
        "offset": 4457615932
    },

That offset (the size) is > 4 GB.  BlueStore has a hard limit of 2^32-1
for object sizes (because it uses a uint32_t).  This cluster appears to
have some ginormous rados objects.  Until those are removed, you
can't/shouldn't use bluestore.

This makes me think scrub should issue errors if it encounters rados
objects that are bigger than the configured limit.  And bluestore should
refuse to start if the configured limit is > 4 GB.  Or something along
those lines...

sage

> So I'm wondering if someone could shed light on these questions:
>
> * Is this the same issue as that which
> https://github.com/ceph/ceph/pull/27929 fixes?
>
> * Thus, since https://github.com/ceph/ceph/pull/29115 (the Nautilus
> backport for that fix) has been merged, but is not yet included in a
> release, do *Nautilus* users get a fix in the upcoming 14.2.3 release,
> and once they update, would this bug go away with no further
> intervention required?
>
> * For users on *Luminous*, since https://tracker.ceph.com/issues/39694
> (the Luminous version of 38724) says "non-trivial backport", is it fair
> to say that a fix might still take a while for that release?
>
> * Finally, are Luminous users safe from this bug if they keep using, or
> revert to, FileStore?
>
> Thanks in advance for your thoughts! Please keep Erik CC'd on your reply.
>
> Cheers,
> Florian
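
To make Sage's point actionable, below is a rough, untested sketch of how
one might walk a pool with the python-rados bindings and flag RADOS objects
above BlueStore's 2^32-1 byte per-object limit. The ceph.conf path and pool
name are placeholders, and it only scans the default RADOS namespace;
adjust both for your cluster.

    #!/usr/bin/env python
    # Rough sketch (untested): walk one pool and report RADOS objects whose
    # size exceeds BlueStore's 2^32-1 byte per-object limit.
    # CONF and POOL are placeholders; only the default namespace is scanned.
    import rados

    CONF = "/etc/ceph/ceph.conf"   # assumed path to the cluster config
    POOL = "mypool"                # placeholder pool name
    LIMIT = 2**32 - 1              # uint32_t max, the limit mentioned above

    cluster = rados.Rados(conffile=CONF)
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx(POOL)
        try:
            for obj in ioctx.list_objects():
                # stat() returns (size, mtime) for the named object
                size, _mtime = ioctx.stat(obj.key)
                if size > LIMIT:
                    print("%s/%s is %d bytes" % (POOL, obj.key, size))
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()

For a single suspect object, the same size can be checked from the CLI
with "rados -p <pool> stat <object>".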