Re: rocksdb: Corruption: missing start of fragmented record

On Thu, Nov 2, 2017 at 8:55 AM Michael <mehe.schmid@xxxxxx> wrote:
Christian Balzer wrote:
> Your exact system configuration (HW, drives, controller, settings, etc)
> would be interesting as I can think of plenty scenarios on how to corrupt
> things that normally shouldn't be affected by such actions
Oh, the hardware in question is consumer grade and not new. Some old i7
machine. But my current guess is that the specific hardware is
semi-unrelated.

I think there is probably just a WAL file, or an entry in it, that wasn't
finished writing to disk or got corrupted. But a WAL isn't something that
should be assumed to be complete and correct after a failure, right? It is
a WAL, after all.

If this can't be fixed automatically with some command, I would simply
like to have a look at and tinker with these DB files, if possible
(I still haven't figured out how to do that).

Your hardware and configuration are very relevant. As you note, the WAL should be able to handle being incompletely written, and both Ceph and RocksDB are designed to handle failures mid-write. That RocksDB *isn't* doing that here implies either
1) there's a fatal bug in rocksdb, or
2) your hardware configuration is not honoring the storage consistency APIs correctly.

Given past experience, (2) is far more likely than (1). The RocksDB community may be able to give you more immediate guidance on what exactly went wrong, but I'd look at whether you have a writeback cache somewhere that isn't respecting ordering requirements, and whether your disk passes a crash consistency tester. (No, I don't know one off-hand. But many disks lie horribly even about stuff like flushes.)
-Greg
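
To make the error message in the subject line more concrete, here is a rough Python sketch of how a fragmented write-ahead log can be read back. It is a simplified model, not RocksDB's actual code: as I understand it, the real format uses fixed 32 KiB blocks and a masked CRC32C, whereas this sketch uses `zlib.crc32` and no block padding. The names `encode` and `read_records` are made up for illustration. The point it shows is that a torn write at the *tail* of the log is easy to tolerate, while a MIDDLE or LAST fragment with no preceding FIRST means earlier data went missing, which is what a device that reorders or drops flushed writes would produce.

```python
import struct
import zlib

# Fragment types, modelled on the FULL/FIRST/MIDDLE/LAST scheme.
FULL, FIRST, MIDDLE, LAST = 1, 2, 3, 4

# Simplified header: checksum (4 bytes), payload length (2 bytes), type (1 byte).
HEADER = struct.Struct("<IHB")


def encode(payload: bytes, chunk: int) -> bytes:
    """Split one logical record into physical fragments of at most `chunk` bytes."""
    pieces = [payload[i:i + chunk] for i in range(0, len(payload), chunk)] or [b""]
    out = bytearray()
    for i, piece in enumerate(pieces):
        if len(pieces) == 1:
            rtype = FULL
        elif i == 0:
            rtype = FIRST
        elif i == len(pieces) - 1:
            rtype = LAST
        else:
            rtype = MIDDLE
        # Assumption: plain CRC32 over type + data stands in for the real checksum.
        crc = zlib.crc32(bytes([rtype]) + piece)
        out += HEADER.pack(crc, len(piece), rtype) + piece
    return bytes(out)


def read_records(log: bytes):
    """Reassemble logical records; return (records, problems)."""
    records, problems = [], []
    pending = None  # fragments of the record currently being reassembled
    pos = 0
    while pos < len(log):
        if len(log) - pos < HEADER.size:
            problems.append("truncated header at end of log")
            break
        crc, length, rtype = HEADER.unpack_from(log, pos)
        start = pos + HEADER.size
        piece = log[start:start + length]
        if len(piece) < length:
            problems.append("truncated fragment at end of log")
            break
        pos = start + length
        if zlib.crc32(bytes([rtype]) + piece) != crc:
            problems.append("checksum mismatch")
            pending = None
            continue
        if rtype in (FULL, FIRST):
            if pending is not None:
                problems.append("partial record without end")
            if rtype == FULL:
                records.append(piece)
                pending = None
            else:
                pending = bytearray(piece)
        elif rtype in (MIDDLE, LAST):
            if pending is None:
                # A continuation fragment with nothing before it.
                problems.append("missing start of fragmented record")
                continue
            pending += piece
            if rtype == LAST:
                records.append(bytes(pending))
                pending = None
        else:
            problems.append("unknown record type")
            pending = None
    return records, problems
```

For example, `read_records(encode(b"a" * 10, 4)[11:])` drops the 11-byte FIRST fragment and reports "missing start of fragmented record" for the two fragments that remain, while cutting a few bytes off the end of an otherwise intact log only produces a truncation note for the tail.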
 


Christian Balzer wrote:
> Now that bit is quite disconcerting, though you're one release behind the
> curve and from what I read .2 has plenty more bug fixes coming.
Fair point. I just tried with 12.2.1 (on pre-release Ubuntu bionic now).

Doesn't change anything - fsck doesn't fix rocksdb, the bluestore won't
mount, the OSD won't activate and the error is the same.

Is there any fix in .2 that might address this, or do you just mean that
in general there will be bug fixes?


Thanks for your response!

- Michael
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
