Okay - I've finally got full debug logs from the flapping OSDs. The raw logs are ~100M each - I can email them directly if necessary. (Igor, I've already sent these your way.)

Both flapping OSDs are reporting the same "bluefs _allocate failed to allocate" errors as before. I've also noticed additional errors about corrupt blocks which I hadn't seen previously, e.g.:

2021-09-08T10:42:13.316+0000 7f705c4f2f00  3 rocksdb: [table/block_based_table_reader.cc:1117] Encountered error while reading data from compression dictionary block Corruption: block checksum mismatch: expected 0, got 2324967111  in db/501397.sst offset 18446744073709551615 size 18446744073709551615

FTR (I realised I never posted this before), our osd tree is:

[qs-admin@condor_sc0 ~]$ sudo docker exec fe4eb75fc98b ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME            STATUS  REWEIGHT  PRI-AFF
-1         1.02539  root default
-7         0.34180      host condor_sc0
 1    ssd  0.34180          osd.1          down         0  1.00000
-5         0.34180      host condor_sc1
 0    ssd  0.34180          osd.0            up   1.00000  1.00000
-3         0.34180      host condor_sc2
 2    ssd  0.34180          osd.2          down   1.00000  1.00000

I've still not managed to get the ceph-bluestore-tool output - will get back to you on that.

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
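One side note on that corruption message, in case it helps narrow things down: the reported offset and size are both 18446744073709551615, which is 2**64 - 1, i.e. the bit pattern of a signed -1 stored in an unsigned 64-bit field. That usually indicates a sentinel/uninitialized block handle rather than a real file position. A quick sanity check of the arithmetic (this is just standalone Python, not ceph or RocksDB code):

```python
import struct

# Value printed in the rocksdb corruption message for both offset and size.
reported = 18446744073709551615

# It is exactly the maximum unsigned 64-bit integer...
assert reported == 2**64 - 1

# ...and reinterpreting the same 8 bytes as a signed 64-bit integer gives -1,
# the classic "no valid handle" sentinel.
signed = struct.unpack("<q", struct.pack("<Q", reported))[0]
print(signed)  # -1
```

So the checksum-mismatch line may be a symptom of RocksDB failing to resolve the block handle for db/501397.sst at all, rather than of on-disk data at that offset being bad - worth keeping in mind when reading the full logs.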
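For the ceph-bluestore-tool output, something like the following is what I've been attempting - sketched here with assumptions clearly flagged: the container ID is the one from the `ceph osd tree` command above (which may be an admin/mon container rather than the OSD's own), and the data path /var/lib/ceph/osd/ceph-1 is a guess at the standard layout for osd.1; both need adjusting for the actual deployment, and the OSD daemon must be stopped first:

```shell
# Assumption: run on the host carrying the down OSD, with the OSD stopped.
# fe4eb75fc98b and /var/lib/ceph/osd/ceph-1 are placeholders from the post above.
sudo docker exec fe4eb75fc98b \
    ceph-bluestore-tool bluefs-bdev-sizes --path /var/lib/ceph/osd/ceph-1

sudo docker exec fe4eb75fc98b \
    ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-1
```

bluefs-bdev-sizes should show how much of the device BlueFS thinks it can use, which seems relevant given the "bluefs _allocate failed to allocate" errors.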