Hi Simon,
your analysis is correct; you've stepped into an unexpected state of the
BlueFS log.
This is the second occurrence of the issue, the first one is mentioned at
https://tracker.ceph.com/issues/45519
Looking into whether we can get out of this state and how to fix it...
Thanks,
Igor
On 5/29/2020 1:05 PM, Simon Leinen wrote:
Colleague of Harry's here...
Harald Staub writes:
This is again about our bad cluster, with too many objects, where the
HDD OSDs have a DB device that is (much) too small (e.g. 20 GB, of which
only about 3 GB are usable). Now several OSDs do not come up any more.
Typical error message:
/build/ceph-14.2.8/src/os/bluestore/BlueFS.cc: 2261: FAILED
ceph_assert(h->file->fnode.ino != 1)
The context of that line is "we should never run out of log space here":
  // previously allocated extents.
  bool must_dirty = false;
  if (allocated < offset + length) {
    // we should never run out of log space here; see the min runway check
    // in _flush_and_sync_log.
    ceph_assert(h->file->fnode.ino != 1);
So I guess we are violating that "should", and the BlueStore code doesn't
handle that case. It also looks like the "min runway" check may not be
reliable. Should we file a bug?
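
To make sure I'm reading the intent correctly, here is a small standalone
sketch of the invariant that comment seems to describe. This is not the
real BlueFS code; the names, constants and the failure mode mentioned in
the comments are my own guesses:

// Simplified model of what I understand the comment to mean:
// _flush_and_sync_log is supposed to top up the log file's allocation
// whenever the remaining "runway" drops below a minimum, so that the
// flush path never has to allocate for the log file (ino 1) and the
// assert above never fires.
#include <cassert>
#include <cstdint>
#include <iostream>

struct LogFile {
  uint64_t ino = 1;          // BlueFS reserves ino 1 for its own log
  uint64_t allocated = 0;    // bytes pre-allocated for the log
  uint64_t written = 0;      // bytes already written to the log
};

constexpr uint64_t MIN_RUNWAY = 1 << 20;   // stand-in for bluefs_min_log_runway
constexpr uint64_t ALLOC_CHUNK = 4 << 20;  // stand-in for the allocation unit

// Stand-in for the pre-allocation on the _flush_and_sync_log path.
void maybe_extend_log(LogFile& log) {
  uint64_t runway = log.allocated - log.written;
  if (runway < MIN_RUNWAY) {
    // The real code asks the allocator for more extents here; my guess is
    // that if that allocation fails or is skipped (e.g. the DB device is
    // full), the invariant checked below can be violated.
    log.allocated += ALLOC_CHUNK;
  }
}

// Stand-in for the check that aborted our OSDs.
void flush_log_range(LogFile& log, uint64_t length) {
  if (log.allocated < log.written + length) {
    // "we should never run out of log space here"
    assert(log.ino != 1 && "ran out of pre-allocated log space");
  }
  log.written += length;
}

int main() {
  LogFile log;
  for (int i = 0; i < 100; ++i) {
    maybe_extend_log(log);               // keeps the runway above MIN_RUNWAY...
    flush_log_range(log, 512 * 1024);    // ...so this flush never trips the assert
  }
  std::cout << "written " << log.written << " of " << log.allocated
            << " allocated bytes\n";
  return 0;
}

As long as the runway top-up happens before every flush and each flush is
smaller than the minimum runway, the assert can never fire; our crash
suggests one of those assumptions broke down, presumably because the (much
too small) DB device filled up.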
Again, help on how to proceed would be greatly appreciated...
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx