Re: ceph version 14.2.3-OSD fails


On 11.10.2019 at 14:07, Igor Fedotov <ifedotov@xxxxxxx> wrote:



Hi!

Originally your issue looked like the ones from https://tracker.ceph.com/issues/42223,

i.e. it looks like some key information for the FreeListManager is missing from RocksDB.

If you still have an OSD in that state, we can check the content of RocksDB to verify this hypothesis; please let me know if you want a guideline for that.


The last log is different; the key record is probably:

-2> 2019-10-09 23:03:47.011 7fb4295a7700 -1 rocksdb: submit_common error: Corruption: block checksum mismatch: expected 2181709173, got 2130853119  in db/204514.sst offset 0 size 61648 code = 2 Rocksdb transaction:

which most probably indicates data corruption in the DB. Unfortunately, for now I can't say whether this is related to the original issue or not.
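If you want to double-check for on-disk corruption, a consistency check with ceph-bluestore-tool might help. This is just a rough sketch, not something from this thread; it assumes the OSD is stopped and uses the osd.71 path from your logs as an example:

    # Stop the affected OSD first; the tool needs exclusive access.
    systemctl stop ceph-osd@71

    # Run a read-only consistency check on the BlueStore instance
    # (adding --deep also reads object data and verifies checksums).
    ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-71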

This time it resembles an issue Stefan Priebe shared on this mailing list a while ago. The post's subject was "Bluestore OSDs keep crashing in BlueStore.cc: 8808: FAILED assert(r == 0)".

So first of all I'd suggest treating these as separate issues for now and troubleshooting them independently.


As for the first case, I'm wondering whether you have any OSDs still failing this way, i.e. asserting in the allocator and reporting zero extents loaded: "_open_alloc loaded 0 B in 0 extents".

If so, let's check the DB content first.
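One quick way to spot OSDs that are still failing this way could be to grep the OSD logs for that allocator line. A rough sketch, assuming default log locations on each node:

    # Print which OSD logs contain the empty-freelist line.
    grep -l "_open_alloc loaded 0 B in 0 extents" /var/log/ceph/ceph-osd.*.log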


For the second case, what I'm wondering most is whether the issue is permanent for a specific OSD or disappears after an OSD/node restart, as it did in Stefan's case.


Just a note: it came back after some days. I'm still waiting for a Ceph release that fixes the issue, v12.2.13...


Stefan


Thanks,

Igor


On 10/10/2019 1:59 PM, cephuser2345 user wrote:
Hi Igor,
Since the last OSD crash we have had some four more. We tried to check RocksDB with ceph-kvstore-tool:
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-71 compact
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-71 repair
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-71 destructive-repair

Nothing helped; we had to redeploy the OSD by removing it from the cluster and reinstalling it.

We updated to Ceph 14.2.4 two or more weeks ago, and OSDs are still failing in the same way.
I managed to capture the first fault using "ceph crash ls" and have added the log + meta to this email.
Can these logs shed some light on this?
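For reference, roughly what we ran; the crash ID below is just a placeholder for one of the IDs that "ceph crash ls" printed:

    # List the crash reports collected by the mgr crash module.
    ceph crash ls

    # Show the full report (backtrace + metadata) for one crash.
    ceph crash info <crash-id>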

On Thu, Sep 12, 2019 at 7:20 PM Igor Fedotov <ifedotov@xxxxxxx> wrote:

Hi,

this line:

    -2> 2019-09-12 16:38:15.101 7fcd02fd1f80  1 bluestore(/var/lib/ceph/osd/ceph-71) _open_alloc loaded 0 B in 0 extents

tells me that the OSD is unable to load the free list manager properly, i.e. the list of free/allocated blocks is unavailable.

You might want to set 'debug bluestore = 10' and check the additional log output between these two lines:

    -3> 2019-09-12 16:38:15.093 7fcd02fd1f80  1 bluestore(/var/lib/ceph/osd/ceph-71) _open_alloc opening allocation metadata
    -2> 2019-09-12 16:38:15.101 7fcd02fd1f80  1 bluestore(/var/lib/ceph/osd/ceph-71) _open_alloc loaded 0 B in 0 extents

And/or check the RocksDB records with the "b" prefix using ceph-kvstore-tool.
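As a rough sketch of both steps, assuming the osd.71 path from your log and the default config file location:

    # 1) In /etc/ceph/ceph.conf, before restarting the OSD:
    #    [osd]
    #        debug bluestore = 10

    # 2) With the OSD stopped, list the freelist records (prefix "b"):
    ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-71 list b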


Igor


P.S.

Sorry, I might be unresponsive for the next two weeks as I'm going on vacation.


On 9/12/2019 7:04 PM, cephuser2345 user wrote:
Hi,
We have updated Ceph from version 14.2.2 to 14.2.3.
Afterwards one of the OSDs went down:

  -21        76.68713     host osd048                        
 66   hdd  12.78119         osd.66      up  1.00000 1.00000
 67   hdd  12.78119         osd.67      up  1.00000 1.00000
 68   hdd  12.78119         osd.68      up  1.00000 1.00000
 69   hdd  12.78119         osd.69      up  1.00000 1.00000
 70   hdd  12.78119         osd.70      up  1.00000 1.00000
 71   hdd  12.78119         osd.71    down        0 1.00000

We cannot get the OSD back up; it fails with an error, and this is happening on a lot of OSDs.
Can you please assist? :) I have added a txt log; the key lines are:
    -3> 2019-09-12 16:38:15.093 7fcd02fd1f80  1 bluestore(/var/lib/ceph/osd/ceph-71) _open_alloc opening allocation metadata
    -2> 2019-09-12 16:38:15.101 7fcd02fd1f80  1 bluestore(/var/lib/ceph/osd/ceph-71) _open_alloc loaded 0 B in 0 extents
    -1> 2019-09-12 16:38:15.101 7fcd02fd1f80 -1 /build/ceph-14.2.3/src/os/bluestore/fastbmap_allocator_impl.h: In function 'void AllocatorLevel02<T>::_mark_allocated(uint64_t, uint64_t) [with L1 = AllocatorLevel01Loose; uint64_t = long unsigned int]' thread 7fcd02fd1f80 time 2019-09-12 16:38:15.102539

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
