Re: Problem with query and any operation on PGs

Hello,

> On Wed, 24 May 2017, Łukasz Chrustek wrote:

>> Hello,
>> 
>> > On Wed, 24 May 2017, Łukasz Chrustek wrote:
>> >> Hello,
>> >> 
>> >> > On Wed, 24 May 2017, Łukasz Chrustek wrote:
>> >> >> Hello,
>> >> >> 
>> >> >> > On Wed, 24 May 2017, Łukasz Chrustek wrote:
>> >> >> >> Hello,
>> >> >> >> 
>> >> >> >> > On Tue, 23 May 2017, Łukasz Chrustek wrote:
>> >> >> >> >> Hello,
>> >> >> >> >> 
>> >> >> >> >> > On Tue, 23 May 2017, Łukasz Chrustek wrote:
>> >> >> >> >> >> I haven't slept for over 30 hours and still can't find a solution. I
>> >> >> >> >> >> did as you wrote, but turning off these osds
>> >> >> >> >> >> (https://pastebin.com/1npBXeMV) didn't resolve the issue...
>> >> >> >> >> 
>> >> >> >> >> > The important bit is:
>> >> >> >> >> 
>> >> >> >> >> >             "blocked": "peering is blocked due to down osds",
>> >> >> >> >> >             "down_osds_we_would_probe": [
>> >> >> >> >> >                 6,
>> >> >> >> >> >                 10,
>> >> >> >> >> >                 33,
>> >> >> >> >> >                 37,
>> >> >> >> >> >                 72
>> >> >> >> >> >             ],
>> >> >> >> >> >             "peering_blocked_by": [
>> >> >> >> >> >                 {
>> >> >> >> >> >                     "osd": 6,
>> >> >> >> >> >                     "current_lost_at": 0,
>> >> >> >> >> >                     "comment": "starting or marking this osd lost may let us proceed"
>> >> >> >> >> >                 },
>> >> >> >> >> >                 {
>> >> >> >> >> >                     "osd": 10,
>> >> >> >> >> >                     "current_lost_at": 0,
>> >> >> >> >> >                     "comment": "starting or marking this osd lost may let us proceed"
>> >> >> >> >> >                 },
>> >> >> >> >> >                 {
>> >> >> >> >> >                     "osd": 37,
>> >> >> >> >> >                     "current_lost_at": 0,
>> >> >> >> >> >                     "comment": "starting or marking this osd lost may let us proceed"
>> >> >> >> >> >                 },
>> >> >> >> >> >                 {
>> >> >> >> >> >                     "osd": 72,
>> >> >> >> >> >                     "current_lost_at": 113771,
>> >> >> >> >> >                     "comment": "starting or marking this osd lost may let us proceed"
>> >> 
>> >> > These are the osds (6, 10, 37, 72).
>> >> 
>> >> >> >> >> >                 }
>> >> >> >> >> >             ]
>> >> >> >> >> >         },
>> >> >> >> >> 
>> >> >> >> >> > Are any of those OSDs startable?
>> >> 
>> >> > This
>> >> 
>> >> osd 6 - isn't startable
>> 
>> > Disk completely 100% dead, or just broken enough that ceph-osd won't
>> > start?  ceph-objectstore-tool can be used to extract a copy of the 2 pgs
>> > from this osd to recover any important writes on that osd.
>> 
>> 2017-05-24 11:21:23.341938 7f6830a36940  0 ceph version 9.2.1 (752b6a3020c3de74e07d2a8b4c5e48dab5a6b6fd), process ceph-osd, pid 1375
>> 2017-05-24 11:21:23.350180 7f6830a36940  0 filestore(/var/lib/ceph/osd/ceph-6) backend btrfs (magic 0x9123683e)
>> 2017-05-24 11:21:23.350610 7f6830a36940  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_features: FIEMAP ioctl is supported and appears to work
>> 2017-05-24 11:21:23.350617 7f6830a36940  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
>> 2017-05-24 11:21:23.350633 7f6830a36940  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_features: splice is supported
>> 2017-05-24 11:21:23.351897 7f6830a36940  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
>> 2017-05-24 11:21:23.351951 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: CLONE_RANGE ioctl is supported
>> 2017-05-24 11:21:23.351970 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: failed to create simple subvolume test_subvol: (17) File exists
>> 2017-05-24 11:21:23.351981 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: SNAP_CREATE is supported
>> 2017-05-24 11:21:23.351984 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: SNAP_DESTROY failed: (1) Operation not permitted
>> 2017-05-24 11:21:23.351987 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: failed with EPERM as non-root; remount with -o user_subvol_rm_allowed
>> 2017-05-24 11:21:23.351996 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: snaps enabled, but no SNAP_DESTROY ioctl; DISABLING
>> 2017-05-24 11:21:23.352573 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: START_SYNC is supported (transid 252877)
>> 2017-05-24 11:21:23.353001 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: WAIT_SYNC is supported
>> 2017-05-24 11:21:23.353012 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: removing old async_snap_test
>> 2017-05-24 11:21:23.353016 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: failed to remove old async_snap_test: (1) Operation not permitted
>> 2017-05-24 11:21:23.353021 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: SNAP_CREATE_V2 is supported
>> 2017-05-24 11:21:23.353022 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: SNAP_DESTROY failed: (1) Operation not permitted
>> 2017-05-24 11:21:23.353027 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: failed to remove test_subvol: (1) Operation not permitted
>> 2017-05-24 11:21:23.355156 7f6830a36940  0 filestore(/var/lib/ceph/osd/ceph-6) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
>> 2017-05-24 11:21:23.355881 7f6830a36940 -1 filestore(/var/lib/ceph/osd/ceph-6) could not find -1/23c2fcde/osd_superblock/0 in index: (2) No such file or directory
>> 2017-05-24 11:21:23.355891 7f6830a36940 -1 osd.6 0 OSD::init() : unable to read osd superblock
>> 2017-05-24 11:21:23.356411 7f6830a36940 -1  ** ERROR: osd init failed: (22) Invalid argument
>> 
>> This is all I get in the logs for this osd when I try to start it.
>> 
>> >> osd 10, 37, 72 are startable
>> 
>> > With those started, I'd repeat the original sequence and get a fresh pg
>> > query to confirm that it still wants just osd.6.
>> 
>> You mean the procedure with the loop and taking down the OSDs that the
>> broken PGs point to?
>> pg 1.60 is down+remapped+peering, acting [66,40]
>> pg 1.165 is down+peering, acting [67,88,48]
>> 
>> For pg 1.60: take osd.66 down, then check pg query in a loop?

> Right.

And now it is very weird... I brought osd.37 up, and the loop
while true; do ceph tell 1.165 query; done

caught this:

https://pastebin.com/zKu06fJn

Can you tell what is wrong now?
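
To make the loop output easier to read, here is a rough sketch that keeps only the osd ids listed under "peering_blocked_by". It assumes the pretty-printed JSON layout of the pg query excerpt quoted earlier (one key per line) and uses only sed and grep instead of a real JSON parser. `blocking_osds` is a hypothetical helper name, not a Ceph command.

```shell
#!/bin/sh
# Hypothetical helper: read pg query JSON on stdin and print the osd ids
# found under "peering_blocked_by", one per line.
# Assumes pretty-printed output with one key per line; this is a crude
# text filter, not a JSON parser.
blocking_osds() {
    sed -n '/"peering_blocked_by"/,/]/p' \
        | grep -o '"osd": *[0-9][0-9]*' \
        | grep -o '[0-9][0-9]*$'
}

# Usage (needs a running cluster, so it is left commented out):
# while true; do ceph tell 1.165 query | blocking_osds; sleep 5; done
```

If the list shrinks to a single id while the other osds come up, that id is probably the one still blocking peering.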

>> > use ceph-objectstore-tool to export the pg from osd.6, stop some other
>> > random osd (not one of these ones), import the pg into that osd, and start
>> > again.  once it is up, 'ceph osd lost 6'.  the pg *should* peer at that
>> > point.  repeat with the same basic process with the other pg.
>> 
>> I have already done 'ceph osd lost 6'; do I need to do it again?

> Hmm, not sure. If the OSD is empty then there is no harm in doing it again.
> Try that first since it might resolve it.  If not, do the query loop 
> above.

> s
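
For reference, the export/import procedure quoted above might be laid out as follows. This is only a sketch that prints the commands instead of running them. `pg_move_plan` is a hypothetical helper, osd.50 is a made-up destination standing in for "some other random osd", and the export file path is arbitrary. It assumes the filestore data path seen in the log (/var/lib/ceph/osd/ceph-<id>) and a journal at <data-path>/journal, which may not match this cluster.

```shell
#!/bin/sh
# Print (do not run) the commands for moving one pg off an osd that will
# not start. Arguments: pgid, source osd id, destination osd id, export file.
# Assumption: filestore layout /var/lib/ceph/osd/ceph-<id> with the journal
# at <data-path>/journal; adjust both paths if the cluster differs.
pg_move_plan() {
    pgid=$1; src=$2; dst=$3; file=$4
    src_path=/var/lib/ceph/osd/ceph-$src
    dst_path=/var/lib/ceph/osd/ceph-$dst
    echo "ceph-objectstore-tool --data-path $src_path --journal-path $src_path/journal --pgid $pgid --op export --file $file"
    echo "# stop osd.$dst here (it must not be running during the import)"
    echo "ceph-objectstore-tool --data-path $dst_path --journal-path $dst_path/journal --op import --file $file"
    echo "# start osd.$dst again and wait until it is up"
    echo "ceph osd lost $src --yes-i-really-mean-it"
}

# Example: plan for pg 1.165, from osd.6 to the hypothetical osd.50.
pg_move_plan 1.165 6 50 /root/pg1.165.export
```

The same plan would be repeated for pg 1.60. Whether the export step works at all on osd.6, given the superblock error in its log, is an open question.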



-- 
Regards,
 Łukasz Chrustek
