My main question is this: is there a way to stop any replay or journaling during OSD startup and bring up the pool/fs in read-only mode?

Here is a description of what I'm seeing. I have a Luminous cluster with CephFS and 16 8TB SSDs, using size=3. I had a problem with one of my SAS controllers, and now I have at least 3 OSDs that refuse to start. The hardware appears to be fine now.

I have my essential data backed up, but there are a few files that I wouldn't mind saving, so I want to use this as disaster recovery practice.

The two problems I am seeing are:

1) On two of the OSDs, there is a startup replay error after successfully replaying quite a few blocks:

2019-07-06 16:08:05.281063 7f6baec66e40 10 bluefs _replay 0x1543000: stop: uuid c366a2d6-e221-98b3-59fe-0f324c9dac8e != super.uuid 263428d5-8963-4339-8815-92ab6067e7a4
2019-07-06 16:08:05.281064 7f6baec66e40 10 bluefs _replay log file size was 0x1543000
2019-07-06 16:08:05.281085 7f6baec66e40 -1 bluefs _replay file with link count 0: file(ino 1485 size 0x15f4c43 mtime 2019-07-04 20:39:39.387601 bdev 1 allocated 1600000 extents [1:0x35771500000+100000,1:0x35771600000+100000,1:0x35771700000+100000,1:0x35771c00000+100000,1:0x35771d00000+100000,1:0x35772200000+100000,1:0x35772300000+100000,1:0x35772800000+100000,1:0x35772900000+100000,1:0x35772a00000+100000,1:0x35772b00000+100000,1:0x35772c00000+100000,1:0x35772d00000+100000,1:0x35772e00000+100000,1:0x35773300000+100000,1:0x35773400000+100000,1:0x35773500000+100000,1:0x35773600000+100000,1:0x35773700000+100000,1:0x35773800000+100000,1:0x35773900000+100000,1:0x35773a00000+100000])
2019-07-06 16:08:05.281093 7f6baec66e40 -1 bluefs mount failed to replay log: (5) Input/output error

2) The following error happens on at least two OSDs:

2019-07-06 15:58:46.621008 7fdcee030e40 -1 bluestore(/var/lib/ceph/osd/ceph-74) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0x147db0c5, expected 0x8f052c9, device location [0x10000~1000], logical extent 0x0~1000, object #-1:7b3f43c4:::osd_superblock:0#

The system was archiving some unimportant files when it went down, so I really don't care about any of the recent writes.

What are my recovery options here? I was thinking that turning off replay and running in read-only mode would be feasible, but maybe there are better options?

Thanks,
Mark
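
P.S. In case it helps anyone reading the first error, here is a small offline sanity check of the "file with link count 0" record. It is only a sketch under my own assumptions about the log format: that each extent is printed as bdev:0xoffset+length, and that the length and the "allocated" value are hexadecimal without a 0x prefix. The helper name parse_bluefs_file is made up for this example and is not part of any Ceph tool.

```python
import re

# The "file with link count 0" record, copied from the bluefs _replay error.
RECORD = (
    "file(ino 1485 size 0x15f4c43 mtime 2019-07-04 20:39:39.387601 bdev 1 "
    "allocated 1600000 extents ["
    "1:0x35771500000+100000,1:0x35771600000+100000,1:0x35771700000+100000,"
    "1:0x35771c00000+100000,1:0x35771d00000+100000,1:0x35772200000+100000,"
    "1:0x35772300000+100000,1:0x35772800000+100000,1:0x35772900000+100000,"
    "1:0x35772a00000+100000,1:0x35772b00000+100000,1:0x35772c00000+100000,"
    "1:0x35772d00000+100000,1:0x35772e00000+100000,1:0x35773300000+100000,"
    "1:0x35773400000+100000,1:0x35773500000+100000,1:0x35773600000+100000,"
    "1:0x35773700000+100000,1:0x35773800000+100000,1:0x35773900000+100000,"
    "1:0x35773a00000+100000])"
)

# Assumed extent format: <bdev>:0x<hex offset>+<hex length>
EXTENT_RE = re.compile(r"(\d+):0x([0-9a-f]+)\+([0-9a-f]+)")


def parse_bluefs_file(record):
    """Hypothetical helper: pull size, allocated bytes and extents from one record."""
    size = int(re.search(r"size 0x([0-9a-f]+)", record).group(1), 16)
    # Assumption: "allocated" is printed in hex without a 0x prefix.
    allocated = int(re.search(r"allocated ([0-9a-f]+)", record).group(1), 16)
    extents = [
        (int(bdev), int(offset, 16), int(length, 16))
        for bdev, offset, length in EXTENT_RE.findall(record)
    ]
    return {"size": size, "allocated": allocated, "extents": extents}


info = parse_bluefs_file(RECORD)
extent_total = sum(length for _, _, length in info["extents"])

print("extents:        ", len(info["extents"]))
print("sum of lengths: ", hex(extent_total))
print("allocated:      ", hex(info["allocated"]))
print("file size:      ", hex(info["size"]))
print("lengths match allocated:", extent_total == info["allocated"])
print("size fits in allocation:", info["size"] <= info["allocated"])
```

Under those assumptions the extent lengths add up to the allocated value and the file size fits inside it, so the record itself looks internally consistent. That would suggest the complaint is about the link count being 0 rather than a mangled extent list, but I may be misreading the format.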