bluefs transaction serialization atomicity.

Hi folks,

the preface:

This email has been inspired by the following ticket: https://tracker.ceph.com/issues/52079

which reports two apparently similar BlueFS corruptions caused by an unexpected power-off.

In both cases BlueFS finally refuses to replay the resulting log, with pretty much the same pattern:

- read the next 4K block (op header),

- then read more blocks (an additional 8K, apparently because the bluefs op/transaction is longer than 4K),

- which finally terminates the replay with an error due to a transaction (not op header!) checksum failure (both times the 'expected' checksum, i.e. the one read from disk in the last(!) block, equals 0).

Here is the sample:

"debug 2021-08-14T05:35:38.827+0000 7f71b2b39080 10 bluefs _read h 0x559f918f7f80 0xb5000~1000 from file(ino 1 size 0xb5000 mtime 0.000000 allocated 410000 extents [1:0xb558660000~10000,1:0xb652780000~400000])

debug 2021-08-14T05:35:38.827+0000 7f71b2b39080 20 bluefs _read left 0x5b000 len 0x1000
debug 2021-08-14T05:35:38.827+0000 7f71b2b39080 20 bluefs _read got 4096
debug 2021-08-14T05:35:38.827+0000 7f71b2b39080 20 bluefs _replay need 0x2000 more bytes
debug 2021-08-14T05:35:38.827+0000 7f71b2b39080 10 bluefs _read h 0x559f918f7f80 0xb6000~2000 from file(ino 1 size 0xb5000 mtime 0.000000 allocated 410000 extents [1:0xb558660000~10000,1:0xb652780000~400000])
debug 2021-08-14T05:35:38.827+0000 7f71b2b39080 20 bluefs _read left 0x5a000 len 0x2000
debug 2021-08-14T05:35:38.827+0000 7f71b2b39080 20 bluefs _read got 8192
debug 2021-08-14T05:35:38.827+0000 7f71b2b39080 -1 bluefs _replay 0xb5000: stop: failed to decode: bad crc 1492738775 expected 0: Malformed input

debug 2021-08-14T05:35:38.827+0000 7f71b2b39080 -1 bluefs mount failed to replay log: (5) Input/output error

"

The above makes me think that an incomplete multi-block transaction write happened during the power-off. IMO this state should be recoverable, but it looks like log replay has no proper handling for this case. Shouldn't we treat a transaction decoding failure (for multi-block transactions only?) as a valid stop condition rather than the unrecoverable error we have now:

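    // current handling in _replay(): any decode failure aborts the whole replay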
    bluefs_transaction_t t;
    try {
      auto p = bl.cbegin();
      decode(t, p);
    }
    catch (ceph::buffer::error& e) {
      derr << __func__ << " 0x" << std::hex << pos << std::dec
           << ": stop: failed to decode: " << e.what()
           << dendl;
      delete log_reader;
      return -EIO;
    }
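
Just to make the idea concrete, a rough sketch of the catch clause is below. This is only a sketch: 'spans_multiple_blocks' is a placeholder for whatever condition identifies the "needed more bytes beyond the first 4K" path, and the clean-stop action ('break' here) would have to match whatever the other normal stop conditions in _replay already do.

    catch (ceph::buffer::error& e) {
      if (spans_multiple_blocks) {  // placeholder: transaction needed blocks
                                    // beyond the first 4K op header block
        // Likely a torn write at the log tail during power-off: treat it as
        // the normal end of the log rather than as corruption.
        dout(10) << __func__ << " 0x" << std::hex << pos << std::dec
                 << ": stop: failed to decode multi-block transaction,"
                 << " assuming partial write at the end of the log" << dendl;
        break;  // or whatever the existing clean-stop paths do
      }
      derr << __func__ << " 0x" << std::hex << pos << std::dec
           << ": stop: failed to decode: " << e.what()
           << dendl;
      delete log_reader;
      return -EIO;
    }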

What do you think? Is the above analysis valid?

Any other thoughts?


Thanks,

Igor
