On Wed, 12 Jun 2019, Harald Staub wrote:
> Also opened an issue about the rocksdb problem:
> https://tracker.ceph.com/issues/40300

Thanks!  The 'rocksdb: Corruption: file is too short' error is the root
of the problem here.  Can you try starting the OSD with
'debug_bluestore=20' and 'debug_bluefs=20'?  (And attach the logs to the
ticket, or use ceph-post-file and put the uuid in the ticket.)

Thanks!
sage

> On 12.06.19 16:06, Harald Staub wrote:
> > We ended up in a bad situation with our RadosGW (cluster is Nautilus
> > 14.2.1, 350 OSDs with BlueStore):
> >
> > 1. There is a bucket with about 60 million objects, without shards.
> >
> > 2. radosgw-admin bucket reshard --bucket $BIG_BUCKET --num-shards 1024
> >
> > 3. Resharding looked fine at first, it counted up to the number of
> > objects, but then it hung.
> >
> > 4. 3 OSDs crashed with a segfault: "rocksdb: Corruption: file is too
> > short"
> >
> > 5. Trying to start the OSDs manually led to the same segfaults.
> >
> > 6. ceph-bluestore-tool repair ...
> >
> > 7. The repairs all aborted, with the same rocksdb error as above.
> >
> > 8. Now 1 PG is stale. It belongs to the radosgw bucket index pool, and
> > it contained the index of this big bucket.
> >
> > Is there any hope of getting these rocksdbs up again?
> >
> > Otherwise: how would we fix the bucket index pool? Our ideas:
> >
> > 1. ceph pg $BAD_PG mark_unfound_lost delete
> > 2. rados -p .rgw.buckets ls, search for $BAD_BUCKET_ID and remove
> > these objects. The hope is that this step would make the following
> > step faster and avoid another similar problem.
> > 3. radosgw-admin bucket check --check-objects
> >
> > Will this really rebuild the bucket index? Is it ok to leave the
> > existing bucket indexes in place? Is it ok to run it for all buckets
> > at once, or does it have to be run bucket by bucket? Is there a risk
> > that the indexes not affected by the $BAD_PG will be broken
> > afterwards?
> >
> > Some more details that may be of interest.
> >
> > ceph-bluestore-tool repair says:
> >
> > 2019-06-12 11:15:38.345 7f56269670c0 -1 rocksdb: Corruption: file is
> > too short (6139497190 bytes) to be an sstable db/079728.sst
> > 2019-06-12 11:15:38.345 7f56269670c0 -1
> > bluestore(/var/lib/ceph/osd/ceph-49) _open_db erroring opening db:
> > error from fsck: (5) Input/output error
> >
> > The repairs also showed several warnings like:
> >
> > tcmalloc: large alloc 17162051584 bytes == 0x56167918a000 @
> > 0x7f5626521887 0x56126a287229 0x56126a2873a3 0x56126a5dc1ec
> > 0x56126a584ce2 0x56126a586a05 0x56126a587dd0 0x56126a589344
> > 0x56126a38c3cf 0x56126a2eae94 0x56126a30654e 0x56126a337ae1
> > 0x56126a1a73a1 0x7f561b228b97 0x56126a28077a
> >
> > The processes showed up with around 45 GB of RAM used. Fortunately,
> > there was no out-of-memory kill.
> >
> > Harry
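For reference, capturing the debug output requested above could look
roughly like this (a sketch only: the OSD id 49 is taken from the
/var/lib/ceph/osd/ceph-49 path quoted above, and paths and option
spellings may need adjusting for your environment):

    # stop the crashed OSD if systemd keeps trying to restart it
    systemctl stop ceph-osd@49

    # run the OSD in the foreground with verbose bluestore/bluefs
    # logging; it should hit the same corruption and log the details
    ceph-osd -f -i 49 --debug_bluestore=20 --debug_bluefs=20

    # upload the log; ceph-post-file prints a uuid to paste into
    # the tracker ticket
    ceph-post-file /var/log/ceph/ceph-osd.49.log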
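The recovery plan from the original message would translate into
commands along these lines (again just a sketch of the proposal as
written, not a recommendation: $BIG_BUCKET, $BAD_PG and $BAD_BUCKET_ID
are the placeholders used above, the index pool name .rgw.buckets.index
is an assumption, and --fix in the last step is an addition, since
bucket check without it only reports problems):

    # look up the bucket id (the "id" field in the output)
    radosgw-admin bucket stats --bucket=$BIG_BUCKET

    # confirm which PG holds the bucket's index object
    ceph osd map .rgw.buckets.index .dir.$BAD_BUCKET_ID

    # 1. give up on the lost objects in the bad PG
    ceph pg $BAD_PG mark_unfound_lost delete

    # 2. find (and then remove) leftover objects for this bucket
    rados -p .rgw.buckets ls | grep $BAD_BUCKET_ID

    # 3. check the bucket; --fix is what actually rewrites the index
    radosgw-admin bucket check --bucket=$BIG_BUCKET --check-objects --fix

Note that bucket index objects are named .dir.<bucket_id> (one per
shard) and live in the bucket index pool rather than the data pool, so
step 2 may need to target that pool instead of .rgw.buckets.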
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com