Re: rocksdb: Corruption: missing start of fragmented record

Christian Balzer <chibi@xxxxxxx> · Wed, 1 Nov 2017 17:53:30 +0900

Hello,

On Wed, 1 Nov 2017 09:30:06 +0100 Michael wrote:

> Hello everyone,
> 
> I've conducted some crash tests (unplugging drives, the machine, 

Your exact system configuration (HW, drives, controller, settings, etc)
would be interesting, as I can think of plenty of scenarios that corrupt
things which normally shouldn't be affected by such actions.

> terminating and restarting ceph systemd services) with Ceph 12.2.0 on

Now that bit is quite disconcerting, though you're one release behind the
curve and from what I read 12.2.2 has plenty more bug fixes coming.

Christian

> Ubuntu and quite easily managed to corrupt what appears to be rocksdb's 
> log replay on a bluestore OSD:
> 
> # ceph-bluestore-tool fsck  --path /var/lib/ceph/osd/ceph-2/
> [...]
> 4 rocksdb: 
> [/build/ceph-pKGC1D/ceph-12.2.0/src/rocksdb/db/version_set.cc:2859] 
> Recovered from manifest file:db/MANIFEST-000975 
> succeeded,manifest_file_number is 975, next_file_number is 1008, 
> last_sequence is 51965907, log_number is 0,prev_log_number is 
> 0,max_column_family is 0
> 4 rocksdb: 
> [/build/ceph-pKGC1D/ceph-12.2.0/src/rocksdb/db/version_set.cc:2867] 
> Column family [default] (ID 0), log number is 1005
> 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1509298585082794, "job": 1, 
> "event": "recovery_started", "log_files": [1003, 1005]}
> 4 rocksdb: 
> [/build/ceph-pKGC1D/ceph-12.2.0/src/rocksdb/db/db_impl_open.cc:482] 
> Recovering log #1003 mode 0
> 4 rocksdb: 
> [/build/ceph-pKGC1D/ceph-12.2.0/src/rocksdb/db/db_impl_open.cc:482] 
> Recovering log #1005 mode 0
> 3 rocksdb: 
> [/build/ceph-pKGC1D/ceph-12.2.0/src/rocksdb/db/db_impl_open.cc:424] 
> db/001005.log: dropping 3225 bytes; Corruption: missing start of 
> fragmented record(2)
> 4 rocksdb: 
> [/build/ceph-pKGC1D/ceph-12.2.0/src/rocksdb/db/db_impl.cc:217] Shutdown: 
> canceling all background work
> 4 rocksdb: 
> [/build/ceph-pKGC1D/ceph-12.2.0/src/rocksdb/db/db_impl.cc:343] Shutdown 
> complete
> -1 rocksdb: Corruption: missing start of fragmented record(2)
> -1 bluestore(/var/lib/ceph/osd/ceph-2/) _open_db erroring opening db:
> 1 bluefs umount
> 1 bdev(0x557f5b6a4240 /var/lib/ceph/osd/ceph-2//block) close
> 
> If I understand this right, rocksdb is just trying to replay WAL type 
> logs, of which presumably "001005.log" is corrupted. It then throws an 
> error that stops everything.
> 
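
For what it's worth, that reading matches how the log format works as far
as I understand it. Below is a minimal sketch, assuming the legacy
LevelDB-style WAL layout (32 KiB blocks, a 7-byte header of CRC, length and
type per fragment); it is not RocksDB's actual reader. It ignores checksums
and the recyclable record types, and the names scan_fragments and
make_record are made up for illustration. It shows how a LAST fragment that
arrives without a preceding FIRST produces the "(2)" variant of the message
in your log.

```python
import struct

BLOCK_SIZE = 32768   # assumed: log files are split into 32 KiB blocks
HEADER_SIZE = 7      # assumed: crc (4) + length (2, little-endian) + type (1)
FULL, FIRST, MIDDLE, LAST = 1, 2, 3, 4


def make_record(rtype, payload):
    """Build one fragment with a dummy CRC (the sketch never verifies it)."""
    return struct.pack("<IHB", 0, len(payload), rtype) + payload


def scan_fragments(data):
    """Return (offset, message) pairs for fragments that lack a FIRST record."""
    problems = []
    in_fragmented = False
    pos = 0
    while pos + HEADER_SIZE <= len(data):
        left_in_block = BLOCK_SIZE - (pos % BLOCK_SIZE)
        if left_in_block < HEADER_SIZE:
            pos += left_in_block          # too small for a header: block padding
            continue
        _crc, length, rtype = struct.unpack_from("<IHB", data, pos)
        if rtype == 0 and length == 0:
            pos += left_in_block          # zero-filled space: skip rest of block
            continue
        if rtype not in (FULL, FIRST, MIDDLE, LAST):
            break                         # unknown type: outside this sketch
        if rtype in (MIDDLE, LAST) and not in_fragmented:
            variant = 1 if rtype == MIDDLE else 2
            problems.append(
                (pos, "missing start of fragmented record(%d)" % variant))
        if rtype == FIRST:
            in_fragmented = True
        elif rtype in (FULL, LAST):
            in_fragmented = False
        pos += HEADER_SIZE + length
    return problems


if __name__ == "__main__":
    # A complete record followed by an orphaned LAST fragment.
    log = make_record(FULL, b"abc") + make_record(LAST, b"xyz")
    for offset, message in scan_fragments(log):
        print(offset, message)
```

In other words, the reader found the tail of a multi-fragment record whose
beginning is not where it expects it, which is what you would see if the
earlier part of the log never made it to disk.
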
> I did try to mount the bluestore, as I assumed that would probably be 
> where I'd find rocksdb's files, but that also doesn't seem 
> possible:
> 
> # ceph-objectstore-tool --op fsck --data-path /var/lib/ceph/osd/ceph-2/ 
> --mountpoint /mnt/bluestore-repair/
> fsck failed: (5) Input/output error
> # ceph-objectstore-tool --op fuse --data-path /var/lib/ceph/osd/ceph-2 
> --mountpoint /mnt/bluestore-repair/
> Mount failed with '(5) Input/output error'
> # ceph-objectstore-tool --op fuse --force --skip-journal-replay 
> --data-path /var/lib/ceph/osd/ceph-2 --mountpoint /mnt/bluestore-repair/
> Mount failed with '(5) Input/output error'
> 
> Adding --debug shows the ultimate culprit is just the above rocksdb 
> error again.
> 
> Q: Is there some way in which I can tell rocksdb to truncate or delete / 
> skip the respective log entries? Or can I get access to rocksdb('s 
> files) in some other way to just manipulate it or delete corrupted WAL 
> files manually?
> 
> -Michael
> 

-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Rakuten Communications