Hi!
Originally your issue looked like the ones from https://tracker.ceph.com/issues/42223,
i.e. it looks like some key information for the
FreeListManager is missing from RocksDB.
Once you have it at hand we can check the content of the RocksDB
to verify this hypothesis; please let me know if you want the
guideline for that.
The last log is different; the key record is probably this one:
-2> 2019-10-09 23:03:47.011 7fb4295a7700 -1 rocksdb: submit_common error: Corruption: block checksum mismatch: expected 2181709173, got 2130853119 in db/204514.sst offset 0 size 61648 code = 2 Rocksdb transaction:
which most probably denotes data corruption in the DB. Unfortunately,
for now I can't say whether this is related to the original issue or
not.
This time it reminds me of the issue shared on this mailing list a
while ago by Stefan Priebe. The thread subject was "Bluestore OSDs
keep crashing in BlueStore.cc: 8808: FAILED assert(r == 0)".
So first of all I'd suggest treating these as two distinct issues for now
and troubleshooting them separately.
As for the first case, I'm wondering if you have any OSDs still
failing this way, i.e. asserting in the allocator and showing 0
extents loaded: "_open_alloc loaded 0 B in 0 extents".
If so, let's check the DB content first (a quick way to spot such OSDs is sketched below).
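For example, something like the following should spot them, assuming the
default log location (adjust the path if your logs live elsewhere):

grep -l "_open_alloc loaded 0 B in 0 extents" /var/log/ceph/ceph-osd.*.log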
For the second case, what I'm wondering about most is whether the issue is
permanent for a specific OSD or whether it disappears after an OSD/node
restart, as it did in Stefan's case?
Thanks,
Igor
On 10/10/2019 1:59 PM, cephuser2345 user wrote:
Hi Igor,
Since the last OSD crash we have had some 4 more. We tried to
check RocksDB with ceph-kvstore-tool:
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-71 compact
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-71 repair
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-71 destructive-repair
Nothing helped; we had to redeploy the OSD by removing
it from the cluster and reinstalling.
We updated to Ceph 14.2.4 two or more weeks ago;
OSDs are still failing in the same way.
I have managed to capture the first fault by using
'ceph crash ls' and have added the log+meta to this email.
Can these logs shed some light?
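For reference, the crash list and its metadata can be pulled roughly like this
(the crash ID below is a placeholder for one of the IDs that 'ceph crash ls' prints):

ceph crash ls
ceph crash info <crash-id>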
Hi,
this line:
-2> 2019-09-12 16:38:15.101 7fcd02fd1f80 1 bluestore(/var/lib/ceph/osd/ceph-71) _open_alloc loaded 0 B in 0 extents
tells me that the OSD is unable to load the free list
manager properly, i.e. the list of free/allocated blocks
is unavailable.
You might want to set 'debug bluestore = 10' (a sketch of how
to do that follows the excerpt below) and check the additional
log output between these two lines:
-3> 2019-09-12 16:38:15.093 7fcd02fd1f80 1 bluestore(/var/lib/ceph/osd/ceph-71) _open_alloc opening allocation metadata
-2> 2019-09-12 16:38:15.101 7fcd02fd1f80 1 bluestore(/var/lib/ceph/osd/ceph-71) _open_alloc loaded 0 B in 0 extents
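One way to bump that level (a sketch, assuming osd.71 as in your paths; the
change only takes effect on the next start attempt):

ceph config set osd.71 debug_bluestore 10
# or, in ceph.conf on the OSD node:
# [osd.71]
#     debug bluestore = 10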
And/or check the RocksDB records under the "b"
prefix using ceph-kvstore-tool, e.g. as sketched below.
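A sketch of that check (the OSD has to be stopped while ceph-kvstore-tool
opens its store); an empty listing here would match the missing free-list
information mentioned above:

ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-71 list b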
Igor
P.S.
Sorry, I might be unresponsive for the next two weeks
as I'm going on vacation.
On 9/12/2019 7:04 PM, cephuser2345 user wrote:
Hi
We have updated the Ceph version from 14.2.2
to 14.2.3.
The OSD status now shows:
-21        76.68713     host osd048
 66   hdd   12.78119         osd.66      up  1.00000 1.00000
 67   hdd   12.78119         osd.67      up  1.00000 1.00000
 68   hdd   12.78119         osd.68      up  1.00000 1.00000
 69   hdd   12.78119         osd.69      up  1.00000 1.00000
 70   hdd   12.78119         osd.70      up  1.00000 1.00000
 71   hdd   12.78119         osd.71    down        0 1.00000
We cannot get the OSD up; it keeps hitting this error,
and it is happening on a lot of OSDs.
Can you please assist? :) Added a txt log:
bluestore(/var/lib/ceph/osd/ceph-71) _open_alloc opening allocation metadata
-2> 2019-09-12 16:38:15.101 7fcd02fd1f80 1 bluestore(/var/lib/ceph/osd/ceph-71) _open_alloc loaded 0 B in 0 extents
-1> 2019-09-12 16:38:15.101 7fcd02fd1f80 -1 /build/ceph-14.2.3/src/os/bluestore/fastbmap_allocator_impl.h: In function 'void AllocatorLevel02<T>::_mark_allocated(uint64_t, uint64_t) [with L1 = AllocatorLevel01Loose; uint64_t = long unsigned int]' thread 7fcd02fd1f80 time 2019-09-12 16:38:15.102539