Hello,
I've been facing some issues with a single-node Ceph cluster (Mimic). I know an environment like this shouldn't be in production, but the server has ended up handling operational workloads for the last two years.
Some users reported problems in CephFS: some files were not accessible, and listing the contents of the affected folders hung the node.
I also noticed heavy memory pressure on the server: main memory was mostly consumed by cache, and a considerable amount of swap was in use.
The command "ceph health detail" reported some inactive PGs, but those PGs didn't actually exist.
After rebooting the node, I ran an fsck on the three affected OSDs:
ceph-bluestore-tool fsck --deep yes --path /var/lib/ceph/osd/ceph-1/
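The equivalent command was run against the other two affected OSDs as well (osd.0 and osd.7), roughly:
ceph-bluestore-tool fsck --deep yes --path /var/lib/ceph/osd/ceph-0/
ceph-bluestore-tool fsck --deep yes --path /var/lib/ceph/osd/ceph-7/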
Unfortunately, all of them crashed with a core dump, and now the OSDs don't start anymore.
The logs report messages like:
2019-08-28 03:00:12.999 7f21d787c240 4 rocksdb: [/build/ceph-13.2.1/src/rocksdb/db/version_set.cc:3088] Recovering from manifest file: MANIFEST-004059
2019-08-28 03:00:12.999 7f21d787c240 4 rocksdb: [/build/ceph-13.2.1/src/rocksdb/db/db_impl.cc:252] Shutdown: canceling all background work
2019-08-28 03:00:12.999 7f21d787c240 4 rocksdb: [/build/ceph-13.2.1/src/rocksdb/db/db_impl.cc:397] Shutdown complete
2019-08-28 03:00:12.999 7f21d787c240 -1 rocksdb: NotFound:
2019-08-28 03:00:12.999 7f21d787c240 -1 bluestore(/var/lib/ceph/osd/ceph-0) _open_db erroring opening db:
2019-08-28 03:00:12.999 7f21d787c240 1 bluefs umount
2019-08-28 03:00:12.999 7f21d787c240 1 stupidalloc 0x0x5650c5255800 shutdown
2019-08-28 03:00:12.999 7f21d787c240 1 bdev(0x5650c5604a80 /var/lib/ceph/osd/ceph-0/block) close
2019-08-28 03:00:13.247 7f21d787c240 1 bdev(0x5650c5604700 /var/lib/ceph/osd/ceph-0/block) close
2019-08-28 03:00:13.479 7f21d787c240 -1 osd.0 0 OSD:init: unable to mount object store
2019-08-28 03:00:13.479 7f21d787c240 -1 ** ERROR: osd init failed: (5) Input/output error
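If a more verbose startup log would help, I can capture one by starting one of the OSDs in the foreground with higher debug levels, something along these lines (using osd.0 as an example):
ceph-osd -f -i 0 --debug_bluestore 20 --debug_bluefs 20 --debug_rocksdb 20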
I'm not sure if the fsck has introduced additional damage.
After that, I tried to mark the unfound objects as lost with the following commands:
ceph pg 4.1e mark_unfound_lost revert
ceph pg 9.1d mark_unfound_lost revert
ceph pg 13.3 mark_unfound_lost revert
ceph pg 13.e mark_unfound_lost revert
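In case it's useful for diagnosis, the missing objects of any of those PGs can also be listed with something like:
ceph pg 4.1e list_unfound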
Currently, with those 3 OSDs down, there are:
316 unclean PGs
76 inactive PGs
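The stuck PGs behind those counts can be dumped with something like:
ceph pg dump_stuck inactive
ceph pg dump_stuck unclean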
root@ceph-s01:~# ceph osd tree
ID CLASS WEIGHT   TYPE NAME              STATUS REWEIGHT PRI-AFF
-2       0.43599  root ssd
-4       0.43599      disktype ssd_disk
12   ssd 0.43599          osd.12              up  1.00000 1.00000
-1       60.03792 root default
-5       60.03792     disktype hdd_disk
 0   hdd        0         osd.0             down  1.00000 1.00000
 1   hdd  5.45799         osd.1             down        0 1.00000
 2   hdd  5.45799         osd.2               up  1.00000 1.00000
 3   hdd  5.45799         osd.3               up  1.00000 1.00000
 4   hdd  5.45799         osd.4               up  1.00000 1.00000
 5   hdd  5.45799         osd.5               up  1.00000 1.00000
 6   hdd  5.45799         osd.6               up  1.00000 1.00000
 7   hdd  5.45799         osd.7             down        0 1.00000
 8   hdd  5.45799         osd.8               up  1.00000 1.00000
 9   hdd  5.45799         osd.9               up  1.00000 1.00000
10   hdd  5.45799         osd.10              up  1.00000 1.00000
11   hdd  5.45799         osd.11              up  1.00000 1.00000
Running the following command, I noticed that a MANIFEST file shows up in the db/lost folder; I guess the repair moved it there.
# ceph-bluestore-tool bluefs-export --path /var/lib/ceph/osd/ceph-7 --out-dir osd7/
...
db/LOCK
db/MANIFEST-000001
db/OPTIONS-018543
db/OPTIONS-018581
db/lost/
db/lost/MANIFEST-018578
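If it helps with diagnosis, the db/CURRENT file in that export should name the MANIFEST that rocksdb expects to find (which, if I understand the NotFound error correctly, may be the one now sitting in db/lost). Something like:
# cat osd7/db/CURRENT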
Any ideas? Suggestions?
Thank you.
Regards,
Jordi