Hello,

> On Wed, 24 May 2017, Łukasz Chrustek wrote:
>> Hi,
>>
>> > On Wed, 24 May 2017, Łukasz Chrustek wrote:
>> >> Hi,
>> >>
>> >> > On Wed, 24 May 2017, Łukasz Chrustek wrote:
>> >> >> Hi,
>> >> >>
>> >> >> > On Tue, 23 May 2017, Łukasz Chrustek wrote:
>> >> >> >> Hi,
>> >> >> >>
>> >> >> >> > On Tue, 23 May 2017, Łukasz Chrustek wrote:
>> >> >> >> >> I haven't slept for over 30 hours and still can't find a solution.
>> >> >> >> >> I did as you wrote, but turning off these osds
>> >> >> >> >> (https://pastebin.com/1npBXeMV) didn't resolve the issue...
>> >> >> >>
>> >> >> >> > The important bit is:
>> >> >> >>
>> >> >> >> >     "blocked": "peering is blocked due to down osds",
>> >> >> >> >     "down_osds_we_would_probe": [
>> >> >> >> >         6,
>> >> >> >> >         10,
>> >> >> >> >         33,
>> >> >> >> >         37,
>> >> >> >> >         72
>> >> >> >> >     ],
>> >> >> >> >     "peering_blocked_by": [
>> >> >> >> >         {
>> >> >> >> >             "osd": 6,
>> >> >> >> >             "current_lost_at": 0,
>> >> >> >> >             "comment": "starting or marking this osd lost may let us proceed"
>> >> >> >> >         },
>> >> >> >> >         {
>> >> >> >> >             "osd": 10,
>> >> >> >> >             "current_lost_at": 0,
>> >> >> >> >             "comment": "starting or marking this osd lost may let us proceed"
>> >> >> >> >         },
>> >> >> >> >         {
>> >> >> >> >             "osd": 37,
>> >> >> >> >             "current_lost_at": 0,
>> >> >> >> >             "comment": "starting or marking this osd lost may let us proceed"
>> >> >> >> >         },
>> >> >> >> >         {
>> >> >> >> >             "osd": 72,
>> >> >> >> >             "current_lost_at": 113771,
>> >> >> >> >             "comment": "starting or marking this osd lost may let us proceed"
>> >
>> > These are the osds (6, 10, 37, 72).
>>
>> >> >> >> >         }
>> >> >> >> >     ]
>> >> >> >> > },
>> >> >> >>
>> >> >> >> > Are any of those OSDs startable?
>> >> >
>> >> > This
>> >>
>> >> osd 6 - isn't startable
>
> Disk completely 100% dead, or just broken enough that ceph-osd won't
> start?  ceph-objectstore-tool can be used to extract a copy of the 2 pgs
> from this osd to recover any important writes on that osd.
2017-05-24 11:21:23.341938 7f6830a36940  0 ceph version 9.2.1 (752b6a3020c3de74e07d2a8b4c5e48dab5a6b6fd), process ceph-osd, pid 1375
2017-05-24 11:21:23.350180 7f6830a36940  0 filestore(/var/lib/ceph/osd/ceph-6) backend btrfs (magic 0x9123683e)
2017-05-24 11:21:23.350610 7f6830a36940  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_features: FIEMAP ioctl is supported and appears to work
2017-05-24 11:21:23.350617 7f6830a36940  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
2017-05-24 11:21:23.350633 7f6830a36940  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_features: splice is supported
2017-05-24 11:21:23.351897 7f6830a36940  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2017-05-24 11:21:23.351951 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: CLONE_RANGE ioctl is supported
2017-05-24 11:21:23.351970 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: failed to create simple subvolume test_subvol: (17) File exists
2017-05-24 11:21:23.351981 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: SNAP_CREATE is supported
2017-05-24 11:21:23.351984 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: SNAP_DESTROY failed: (1) Operation not permitted
2017-05-24 11:21:23.351987 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: failed with EPERM as non-root; remount with -o user_subvol_rm_allowed
2017-05-24 11:21:23.351996 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: snaps enabled, but no SNAP_DESTROY ioctl; DISABLING
2017-05-24 11:21:23.352573 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: START_SYNC is supported (transid 252877)
2017-05-24 11:21:23.353001 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: WAIT_SYNC is supported
2017-05-24 11:21:23.353012 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: removing old async_snap_test
2017-05-24 11:21:23.353016 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: failed to remove old async_snap_test: (1) Operation not permitted
2017-05-24 11:21:23.353021 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: SNAP_CREATE_V2 is supported
2017-05-24 11:21:23.353022 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: SNAP_DESTROY failed: (1) Operation not permitted
2017-05-24 11:21:23.353027 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: failed to remove test_subvol: (1) Operation not permitted
2017-05-24 11:21:23.355156 7f6830a36940  0 filestore(/var/lib/ceph/osd/ceph-6) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2017-05-24 11:21:23.355881 7f6830a36940 -1 filestore(/var/lib/ceph/osd/ceph-6) could not find -1/23c2fcde/osd_superblock/0 in index: (2) No such file or directory
2017-05-24 11:21:23.355891 7f6830a36940 -1 osd.6 0 OSD::init() : unable to read osd superblock
2017-05-24 11:21:23.356411 7f6830a36940 -1  ** ERROR: osd init failed: (22) Invalid argument

That is all I get in the logs for this osd when I try to start it.
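For reference, a minimal sketch of the export step described above, assuming the default FileStore layout visible in this log (/var/lib/ceph/osd/ceph-6) and using pg 1.60 as an example pgid; substitute whichever of the two stuck pgs the query shows blocked on osd.6, and note the output file path is just an example. Whether this works at all depends on how damaged the filestore really is -- the missing osd_superblock above may or may not get in the way:

# run on the host holding osd.6, with the ceph-osd process stopped
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-6 \
    --journal-path /var/lib/ceph/osd/ceph-6/journal \
    --pgid 1.60 --op export --file /root/pg-1.60.export

ceph-objectstore-tool needs exclusive access to the store (osd stopped); it reads the pg directories directly, so it can sometimes pull data off an osd that ceph-osd itself refuses to start.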
>> osd 10, 37, 72 are startable

> With those started, I'd repeat the original sequence and get a fresh pg
> query to confirm that it still wants just osd.6.

You mean the procedure with the loop, taking down the OSDs that the broken
PGs point to?

pg 1.60 is down+remapped+peering, acting [66,40]
pg 1.165 is down+peering, acting [67,88,48]

For pg 1.60 <--> take 66 down, then check pg query in a loop?

> use ceph-objectstore-tool to export the pg from osd.6, stop some other
> random osd (not one of these ones), import the pg into that osd, and start
> it again. once it is up, 'ceph osd lost 6'. the pg *should* peer at that
> point. repeat the same basic process with the other pg.

I have already done 'ceph osd lost 6'; do I need to do it once again?

--
Regards
Łukasz Chrustek
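For reference, a minimal sketch of the import / mark-lost half of the sequence quoted above. osd.80 is a made-up example id for the healthy osd that is stopped to receive the import (any osd other than 6, 10, 33, 37 and 72), and the export file path matches the export sketch earlier in the thread:

# stop the chosen healthy osd, then import the exported pg into it
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-80 \
    --journal-path /var/lib/ceph/osd/ceph-80/journal \
    --op import --file /root/pg-1.60.export

# start that osd again, then declare osd.6 permanently lost
ceph osd lost 6 --yes-i-really-mean-it

# check whether the pg peers now
ceph pg 1.60 query
ceph health detail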