Re: Problem with query and any operation on PGs

Hi,

> On Tue, 23 May 2017, Łukasz Chrustek wrote:
>> Hi,
>> 
>> Hello,
>> 
>> After a terrible outage caused by the failure of a 10Gbit switch, the ceph cluster
>> went to HEALTH_ERR (three whole storage servers went offline at the same time
>> and didn't come back for a while). After cluster recovery, two PGs went into the
>> incomplete state; I can't query them and can't do anything with them,

> The thing where you can't query a PG is because the OSD is throttling 
> incoming work and the throttle is exhausted (the PG can't do work so it
> isn't making progress).  A workaround for jewel is to restart the OSD 
> serving the PG and do the query quickly after that (probably in a loop so
> that you catch it after it starts up but before the throttle is 
> exhausted again).  (In luminous this is fixed.)

Thank you for the clarification.

> Once you have the query output ('ceph tell $pgid query') you'll be able to
> tell what is preventing the PG from peering.

Hm.. what kind of loop do you suggest? When I run ceph tell $pgid query,
it hangs and never returns to the console.
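
Is it something along these lines? Just a sketch of what I understood from
your description; the OSD id 84, the 5-second per-attempt cap, and the 30
retries are my guesses, not values you gave:

```shell
# Sketch of the suggested workaround: restart the OSD serving the PG,
# then keep retrying the query so we catch it after the OSD starts up
# but before the incoming-work throttle is exhausted again.
query_pg_with_retry() {
    pgid=$1
    osd=$2
    systemctl restart "ceph-osd@${osd}" || return 1
    tries=0
    # cap each attempt so a hung query doesn't block the loop forever
    until timeout 5 ceph tell "$pgid" query; do
        tries=$((tries + 1))
        [ "$tries" -ge 30 ] && return 1
        sleep 1
    done
}

# example invocation (osd id is a placeholder):
# query_pg_with_retry 1.165 84 > pg-1.165-query.json
```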

> You can identify the osd(s) hosting the pg with 'ceph pg map $pgid'.

There is something strange here for 1.165: how is it possible that the acting
set is [37], when 37 isn't anywhere in the up set [84,38,48]?:

ceph pg map 1.165
osdmap e114855 pg 1.165 (1.165) -> up [84,38,48] acting [37]

The second one looks OK, but I also can't run pg query on it:

[root@cc1 ~]# ceph pg map 1.60
osdmap e114855 pg 1.60 (1.60) -> up [66,84,40] acting [66,69,40]


Do I need to restart all three OSDs at the same time?

Can you advise how to unblock access to one of the pools for this kind
of command:

[root@cc1 ~]# rbd ls volumes
^C

The strace for this is here: https://pastebin.com/hpbDg6gP - this time it
hangs on a futex call. Are these two cases (the pg query hang and the
rbd ls problem) connected to each other?
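
One check that might connect them (a sketch, based on my understanding that
`rbd ls` on a format-2 pool reads the pool's rbd_directory object, so if that
object maps to one of the stuck PGs, the hang would be the same root cause):

```shell
# Print the pgid that <pool>/<object> maps to. If it reports 1.60 or
# 1.165 for volumes/rbd_directory, the hung `rbd ls volumes` is waiting
# on the same incomplete PGs as the pg query.
pg_for_object() {
    pool=$1
    obj=$2
    # jewel prints something like:
    #   osdmap e114855 pool 'volumes' (1) object 'rbd_directory'
    #     -> pg 1.30a98c1c (1.60) -> up ([...], p66) acting ([...], p66)
    ceph osd map "$pool" "$obj" | sed -n 's/.*-> pg [^ ]* (\([^)]*\)).*/\1/p'
}

# example invocation:
# pg_for_object volumes rbd_directory
```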

If you can help me find a solution for this, you will make my day (and night :) ).


Regards
Lukasz

> HTH!
> sage


>> which would let the cluster get back to work. Here is the strace of
>> this command: https://pastebin.com/HpNFvR8Z. But... this cluster isn't entirely off:
>> 
>> [root@cc1 ~]# rbd ls management-vms
>> os-mongodb1
>> os-mongodb1-database
>> os-gitlab-root
>> os-mongodb1-database2
>> os-wiki-root
>> [root@cc1 ~]# rbd ls volumes
>> ^C
>> [root@cc1 ~]#
>> 
>> and it is the same for all mon hosts (not pasting all three here)
>> 
>> [root@cc1 ~]# rbd -m 192.168.128.1 list management-vms
>> os-mongodb1
>> os-mongodb1-database
>> os-gitlab-root
>> os-mongodb1-database2
>> os-wiki-root
>> [root@cc1 ~]# rbd -m 192.168.128.1 list volumes
>> ^C
>> [root@cc1 ~]#
>> 
>> and for all other pools from the list, except (most importantly) volumes, I can
>> list images.
>> 
>> Funny thing: I can get rbd info for a particular image:
>> 
>> [root@cc1 ~]# rbd info
>> volumes/volume-197602d7-40f9-40ad-b286-cdec688b1497
>> rbd image 'volume-197602d7-40f9-40ad-b286-cdec688b1497':
>>         size 20480 MB in 1280 objects
>>         order 24 (16384 kB objects)
>>         block_name_prefix: rbd_data.64a21a0a9acf52
>>         format: 2
>>         features: layering
>>         flags:
>>         parent: images/37bdf0ca-f1f3-46ce-95b9-c04bb9ac8a53@snap
>>         overlap: 3072 MB
>> 
>> but I can't list the whole content of the volumes pool.
>> 
>> [root@cc1 ~]# ceph osd pool ls
>> volumes
>> images
>> backups
>> volumes-ssd-intel-s3700
>> management-vms
>> .rgw.root
>> .rgw.control
>> .rgw
>> .rgw.gc
>> .log
>> .users.uid
>> .rgw.buckets.index
>> .users
>> .rgw.buckets.extra
>> .rgw.buckets
>> volumes-cached
>> cache-ssd
>> 
>> here is ceph osd tree:
>> 
>> ID  WEIGHT    TYPE NAME            UP/DOWN REWEIGHT PRIMARY-AFFINITY
>>  -7  20.88388 root ssd-intel-s3700
>> -11   3.19995     host ssd-stor1
>>  56   0.79999         osd.56            up  1.00000          1.00000
>>  57   0.79999         osd.57            up  1.00000          1.00000
>>  58   0.79999         osd.58            up  1.00000          1.00000
>>  59   0.79999         osd.59            up  1.00000          1.00000
>>  -9   2.12999     host ssd-stor2
>>  60   0.70999         osd.60            up  1.00000          1.00000
>>  61   0.70999         osd.61            up  1.00000          1.00000
>>  62   0.70999         osd.62            up  1.00000          1.00000
>>  -8   2.12999     host ssd-stor3
>>  63   0.70999         osd.63            up  1.00000          1.00000
>>  64   0.70999         osd.64            up  1.00000          1.00000
>>  65   0.70999         osd.65            up  1.00000          1.00000
>> -10   4.19998     host ssd-stor4
>>  25   0.70000         osd.25            up  1.00000          1.00000
>>  26   0.70000         osd.26            up  1.00000          1.00000
>>  27   0.70000         osd.27            up  1.00000          1.00000
>>  28   0.70000         osd.28            up  1.00000          1.00000
>>  29   0.70000         osd.29            up  1.00000          1.00000
>>  24   0.70000         osd.24            up  1.00000          1.00000
>> -12   3.41199     host ssd-stor5
>>  73   0.85300         osd.73            up  1.00000          1.00000
>>  74   0.85300         osd.74            up  1.00000          1.00000
>>  75   0.85300         osd.75            up  1.00000          1.00000
>>  76   0.85300         osd.76            up  1.00000          1.00000
>> -13   3.41199     host ssd-stor6
>>  77   0.85300         osd.77            up  1.00000          1.00000
>>  78   0.85300         osd.78            up  1.00000          1.00000
>>  79   0.85300         osd.79            up  1.00000          1.00000
>>  80   0.85300         osd.80            up  1.00000          1.00000
>> -15   2.39999     host ssd-stor7
>>  90   0.79999         osd.90            up  1.00000          1.00000
>>  91   0.79999         osd.91            up  1.00000          1.00000
>>  92   0.79999         osd.92            up  1.00000          1.00000
>>  -1 167.69969 root default
>>  -2  33.99994     host stor1
>>   6   3.39999         osd.6           down        0          1.00000
>>   7   3.39999         osd.7             up  1.00000          1.00000
>>   8   3.39999         osd.8             up  1.00000          1.00000
>>   9   3.39999         osd.9             up  1.00000          1.00000
>>  10   3.39999         osd.10          down        0          1.00000
>>  11   3.39999         osd.11          down        0          1.00000
>>  69   3.39999         osd.69            up  1.00000          1.00000
>>  70   3.39999         osd.70            up  1.00000          1.00000
>>  71   3.39999         osd.71          down        0          1.00000
>>  81   3.39999         osd.81            up  1.00000          1.00000
>>  -3  20.99991     host stor2
>>  13   2.09999         osd.13            up  1.00000          1.00000
>>  12   2.09999         osd.12            up  1.00000          1.00000
>>  14   2.09999         osd.14            up  1.00000          1.00000
>>  15   2.09999         osd.15            up  1.00000          1.00000
>>  16   2.09999         osd.16            up  1.00000          1.00000
>>  17   2.09999         osd.17            up  1.00000          1.00000
>>  18   2.09999         osd.18          down        0          1.00000
>>  19   2.09999         osd.19            up  1.00000          1.00000
>>  20   2.09999         osd.20            up  1.00000          1.00000
>>  21   2.09999         osd.21            up  1.00000          1.00000
>>  -4  25.00000     host stor3
>>  30   2.50000         osd.30            up  1.00000          1.00000
>>  31   2.50000         osd.31            up  1.00000          1.00000
>>  32   2.50000         osd.32            up  1.00000          1.00000
>>  33   2.50000         osd.33          down        0          1.00000
>>  34   2.50000         osd.34            up  1.00000          1.00000
>>  35   2.50000         osd.35            up  1.00000          1.00000
>>  66   2.50000         osd.66            up  1.00000          1.00000
>>  67   2.50000         osd.67            up  1.00000          1.00000
>>  68   2.50000         osd.68            up  1.00000          1.00000
>>  72   2.50000         osd.72          down        0          1.00000
>>  -5  25.00000     host stor4
>>  44   2.50000         osd.44            up  1.00000          1.00000
>>  45   2.50000         osd.45            up  1.00000          1.00000
>>  46   2.50000         osd.46          down        0          1.00000
>>  47   2.50000         osd.47            up  1.00000          1.00000
>>   0   2.50000         osd.0             up  1.00000          1.00000
>>   1   2.50000         osd.1             up  1.00000          1.00000
>>   2   2.50000         osd.2             up  1.00000          1.00000
>>   3   2.50000         osd.3             up  1.00000          1.00000
>>   4   2.50000         osd.4             up  1.00000          1.00000
>>   5   2.50000         osd.5             up  1.00000          1.00000
>>  -6  14.19991     host stor5
>>  48   1.79999         osd.48            up  1.00000          1.00000
>>  49   1.59999         osd.49            up  1.00000          1.00000
>>  50   1.79999         osd.50            up  1.00000          1.00000
>>  51   1.79999         osd.51          down        0          1.00000
>>  52   1.79999         osd.52            up  1.00000          1.00000
>>  53   1.79999         osd.53            up  1.00000          1.00000
>>  54   1.79999         osd.54            up  1.00000          1.00000
>>  55   1.79999         osd.55            up  1.00000          1.00000
>> -14  14.39999     host stor6
>>  82   1.79999         osd.82            up  1.00000          1.00000
>>  83   1.79999         osd.83            up  1.00000          1.00000
>>  84   1.79999         osd.84            up  1.00000          1.00000
>>  85   1.79999         osd.85            up  1.00000          1.00000
>>  86   1.79999         osd.86            up  1.00000          1.00000
>>  87   1.79999         osd.87            up  1.00000          1.00000
>>  88   1.79999         osd.88            up  1.00000          1.00000
>>  89   1.79999         osd.89            up  1.00000          1.00000
>> -16  12.59999     host stor7
>>  93   1.79999         osd.93            up  1.00000          1.00000
>>  94   1.79999         osd.94            up  1.00000          1.00000
>>  95   1.79999         osd.95            up  1.00000          1.00000
>>  96   1.79999         osd.96            up  1.00000          1.00000
>>  97   1.79999         osd.97            up  1.00000          1.00000
>>  98   1.79999         osd.98            up  1.00000          1.00000
>>  99   1.79999         osd.99            up  1.00000          1.00000
>> -17  21.49995     host stor8
>>  22   1.59999         osd.22            up  1.00000          1.00000
>>  23   1.59999         osd.23            up  1.00000          1.00000
>>  36   2.09999         osd.36            up  1.00000          1.00000
>>  37   2.09999         osd.37            up  1.00000          1.00000
>>  38   2.50000         osd.38            up  1.00000          1.00000
>>  39   2.50000         osd.39            up  1.00000          1.00000
>>  40   2.50000         osd.40            up  1.00000          1.00000
>>  41   2.50000         osd.41          down        0          1.00000
>>  42   2.50000         osd.42            up  1.00000          1.00000
>>  43   1.59999         osd.43            up  1.00000          1.00000
>> [root@cc1 ~]#
>> 
>> and ceph health detail:
>> 
>> ceph health detail | grep down
>> HEALTH_WARN 23 pgs backfilling; 23 pgs degraded; 2 pgs down; 2 pgs
>> peering; 2 pgs stuck inactive; 25 pgs stuck unclean; 23 pgs
>> undersized; recovery 176211/14148564 objects degraded (1.245%);
>> recovery 238972/14148564 objects misplaced (1.689%); noout flag(s) set
>> pg 1.60 is stuck inactive since forever, current state
>> down+remapped+peering, last acting [66,69,40]
>> pg 1.165 is stuck inactive since forever, current state
>> down+remapped+peering, last acting [37]
>> pg 1.60 is stuck unclean since forever, current state
>> down+remapped+peering, last acting [66,69,40]
>> pg 1.165 is stuck unclean since forever, current state
>> down+remapped+peering, last acting [37]
>> pg 1.165 is down+remapped+peering, acting [37]
>> pg 1.60 is down+remapped+peering, acting [66,69,40]
>> 
>> 
>> problematic pgs are 1.165 and 1.60.
>> 
>> Please advise how to unblock the volumes pool and/or make these two pgs
>> work - over the last night and day, while trying to solve this issue, we
>> verified that these pgs are 100% empty of data.
>> 
>> 
>> 
>> 
>> -- 
>> Regards,
>>  Łukasz Chrustek
>> 
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> 
>> 



-- 
Regards,
 Łukasz Chrustek



