I have a 3-node Ceph cluster at home that I have been using for a few years now without issue. Each node runs a MON, MGR, and MDS, and has 2-3 OSDs. It has, however, been slow, so I decided to finally move the BlueStore DBs to SSDs. I did one OSD as a test case to make sure everything would go OK: I deleted the OSD, then created a new OSD with the ceph-deploy tool and pointed the DB at an LVM partition on an SSD (a rough sketch of the commands is at the end of this mail). Everything went fine and recovery started.

Later in the day I noticed that my MDS daemon is damaged (PGs are still recovering). I've tried

    cephfs-journal-tool --rank=cephfs:all journal export backup.bin

but it gave me:

    2020-02-23 17:50:03.589 7f7d8b225740 -1 Missing object 200.00c30b6d
    2020-02-23 17:50:07.919 7f7d8b225740 -1 Bad entry start ptr (0x30c2dbb92003) at 0x30c2d3a125ea

(both lines repeat several times) and the export will not complete.

The log file of the MDS that was active at the time shows:

    2020-02-23 17:13:09.091 7fad40029700 0 mds.0.journaler.mdlog(ro) _finish_read got error -2
    2020-02-23 17:13:09.091 7fad40029700 0 mds.0.journaler.mdlog(ro) _finish_read got error -2
    2020-02-23 17:13:09.091 7fad40029700 0 mds.0.journaler.mdlog(ro) _finish_read got error -2
    2020-02-23 17:13:09.091 7fad3e826700 0 mds.0.log _replay journaler got error -2, aborting
    2020-02-23 17:13:09.091 7fad3e826700 -1 log_channel(cluster) log [ERR] : missing journal object

One other thing that happened around the same time: I noticed memory pressure on all the nodes, with only 200 MB of free RAM. I've since tweaked osd_memory_target to try to keep that from happening again (sketch at the end of this mail). Even so, I'm a bit confused how that could cause a catastrophic failure, as I had two other MDSes on standby.

Any help would be appreciated.
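For reference, the delete/recreate step was the standard ceph-deploy flow. The OSD id, device, LV, and host names below are placeholders rather than my exact ones, and the removal commands may have differed slightly from what I actually typed:

    # take the old OSD out and remove it (osd.5 is a placeholder id)
    ceph osd out 5
    ceph osd purge 5 --yes-i-really-mean-it

    # recreate it with the data on the HDD and the BlueStore DB on an LV carved out of the SSD
    ceph-deploy osd create --data /dev/sdb --block-db ssd-vg/db-osd5 node1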
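The osd_memory_target tweak was done through ceph config; the 2 GiB value below is just an example figure, not necessarily the one I settled on:

    # lower the per-OSD memory target from the 4 GiB default to 2 GiB
    ceph config set osd osd_memory_target 2147483648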