To me this sounds more like either your MONs lost quorum or your clients
didn't have all MONs in their ceph.conf, maybe only the failed one. So is
the issue resolved now?
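
If you want to rule that out, it is quick to check the next time one node
is down. Something along these lines should do (the hostnames below are
just placeholders for your environment):

   # on a surviving node: is there still a quorum, and which MONs are in it?
   ceph quorum_status --format json-pretty
   ceph mon stat

   # on a client that hangs: does ceph.conf list all MONs or only one?
   cat /etc/ceph/ceph.conf
   [global]
       fsid = <cluster fsid>
       mon_host = dcs1,dcs2,dcs3

If mon_host only contains the MON that went down, the client has no way to
reach the remaining MONs and everything appears to hang even though the
cluster itself may be fine.
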
Quoting Murilo Morais <murilo@xxxxxxxxxxxxxx>:
> Unfortunately I can't verify whether ceph reports any inactive PGs. As
> soon as the second host disconnects practically everything locks up, and
> nothing appears even with "ceph -w". The OSDs are only reported as
> offline once dcs2 returns.
>
> Note: apparently there was a recent update. In the test environment this
> behavior was not happening: dcs1 stayed UP with all services, without
> crashing, and kept reading and writing even with dcs2 DOWN, even without
> dcs3 added.
>
> ### COMMANDS ###
> [ceph: root@dcs1 /]# ceph osd tree
> ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
> -1 65.49570 root default
> -3 32.74785 host dcs1
> 0 hdd 2.72899 osd.0 up 1.00000 1.00000
> 1 hdd 2.72899 osd.1 up 1.00000 1.00000
> 2 hdd 2.72899 osd.2 up 1.00000 1.00000
> 3 hdd 2.72899 osd.3 up 1.00000 1.00000
> 4 hdd 2.72899 osd.4 up 1.00000 1.00000
> 5 hdd 2.72899 osd.5 up 1.00000 1.00000
> 6 hdd 2.72899 osd.6 up 1.00000 1.00000
> 7 hdd 2.72899 osd.7 up 1.00000 1.00000
> 8 hdd 2.72899 osd.8 up 1.00000 1.00000
> 9 hdd 2.72899 osd.9 up 1.00000 1.00000
> 10 hdd 2.72899 osd.10 up 1.00000 1.00000
> 11 hdd 2.72899 osd.11 up 1.00000 1.00000
> -5 32.74785 host dcs2
> 12 hdd 2.72899 osd.12 up 1.00000 1.00000
> 13 hdd 2.72899 osd.13 up 1.00000 1.00000
> 14 hdd 2.72899 osd.14 up 1.00000 1.00000
> 15 hdd 2.72899 osd.15 up 1.00000 1.00000
> 16 hdd 2.72899 osd.16 up 1.00000 1.00000
> 17 hdd 2.72899 osd.17 up 1.00000 1.00000
> 18 hdd 2.72899 osd.18 up 1.00000 1.00000
> 19 hdd 2.72899 osd.19 up 1.00000 1.00000
> 20 hdd 2.72899 osd.20 up 1.00000 1.00000
> 21 hdd 2.72899 osd.21 up 1.00000 1.00000
> 22 hdd 2.72899 osd.22 up 1.00000 1.00000
> 23 hdd 2.72899 osd.23 up 1.00000 1.00000
>
>
> [ceph: root@dcs1 /]# ceph osd pool ls detail
> pool 1 '.mgr' replicated size 2 min_size 1 crush_rule 0 object_hash
> rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 26 flags
> hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr
> pool 2 'cephfs.ovirt_hosted_engine.meta' replicated size 2 min_size 1
> crush_rule 0 object_hash rjenkins pg_num 16 pgp_num 16 autoscale_mode on
> last_change 77 lfor 0/0/47 flags hashpspool stripe_width 0
> pg_autoscale_bias 4 pg_num_min 16 recovery_priority 5 application cephfs
> pool 3 'cephfs.ovirt_hosted_engine.data' replicated size 2 min_size 1
> crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on
> last_change 179 lfor 0/0/47 flags hashpspool max_bytes 107374182400
> stripe_width 0 application cephfs
> pool 6 '.nfs' replicated size 2 min_size 1 crush_rule 0 object_hash
> rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 254 lfor
> 0/0/252 flags hashpspool stripe_width 0 application nfs
> pool 7 'cephfs.ovirt_storage_sas.meta' replicated size 2 min_size 1
> crush_rule 0 object_hash rjenkins pg_num 16 pgp_num 16 autoscale_mode on
> last_change 322 lfor 0/0/287 flags hashpspool stripe_width 0
> pg_autoscale_bias 4 pg_num_min 16 recovery_priority 5 application cephfs
> pool 8 'cephfs.ovirt_storage_sas.data' replicated size 2 min_size 1
> crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on
> last_change 291 lfor 0/0/289 flags hashpspool stripe_width 0 application
> cephfs
> pool 9 'cephfs.ovirt_storage_iso.meta' replicated size 2 min_size 1
> crush_rule 0 object_hash rjenkins pg_num 16 pgp_num 16 autoscale_mode on
> last_change 356 lfor 0/0/325 flags hashpspool stripe_width 0
> pg_autoscale_bias 4 pg_num_min 16 recovery_priority 5 application cephfs
> pool 10 'cephfs.ovirt_storage_iso.data' replicated size 2 min_size 1
> crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on
> last_change 329 lfor 0/0/327 flags hashpspool stripe_width 0 application
> cephfs
>
>
> [ceph: root@dcs1 /]# ceph osd crush rule dump replicated_rule
> {
>     "rule_id": 0,
>     "rule_name": "replicated_rule",
>     "type": 1,
>     "steps": [
>         {
>             "op": "take",
>             "item": -1,
>             "item_name": "default"
>         },
>         {
>             "op": "chooseleaf_firstn",
>             "num": 0,
>             "type": "host"
>         },
>         {
>             "op": "emit"
>         }
>     ]
> }
>
>
> [ceph: root@dcs1 /]# ceph pg ls-by-pool cephfs.ovirt_hosted_engine.data
> PG  OBJECTS  DEGRADED  MISPLACED  UNFOUND  BYTES  OMAP_BYTES*  OMAP_KEYS*  LOG  STATE  SINCE  VERSION  REPORTED  UP  ACTING  SCRUB_STAMP  DEEP_SCRUB_STAMP  LAST_SCRUB_DURATION  SCRUB_SCHEDULING
> 3.0  69  0  0  0  285213095  0  0  10057  active+clean  41m  530'20632  530:39461  [1,23]p1  [1,23]p1  2022-10-13T03:19:33.649837+0000  2022-10-10T14:57:31.136809+0000  1  periodic scrub scheduled @ 2022-10-14T07:24:46.314217+0000
> 3.1  58  0  0  0  242319360  0  0  10026  active+clean  41m  530'11926  530:21424  [6,19]p6  [6,19]p6  2022-10-13T02:15:23.395162+0000  2022-10-10T14:57:31.136809+0000  1  periodic scrub scheduled @ 2022-10-14T11:42:17.682881+0000
> 3.2  71  0  0  0  294629376  0  0  10012  active+clean  41m  530'12312  530:25506  [10,16]p10  [10,16]p10  2022-10-13T06:12:48.839013+0000  2022-10-11T21:09:49.405860+0000  1  periodic scrub scheduled @ 2022-10-14T12:35:23.917129+0000
> 3.3  63  0  0  0  262520832  0  0  10073  active+clean  41m  530'20204  530:42834  [13,11]p13  [13,11]p13  2022-10-13T01:16:17.672947+0000  2022-10-11T16:43:27.935298+0000  1  periodic scrub scheduled @ 2022-10-14T11:48:42.643271+0000
> 3.4  59  0  0  0  240611328  0  0  10017  active+clean  41m  530'17883  530:32537  [10,22]p10  [10,22]p10  2022-10-12T22:09:09.376552+0000  2022-10-10T15:00:52.196397+0000  1  periodic scrub scheduled @ 2022-10-14T01:16:35.682204+0000
> 3.5  67  0  0  0  281018368  0  0  10017  active+clean  41m  530'18825  530:31531  [18,3]p18  [18,3]p18  2022-10-12T18:13:50.835870+0000  2022-10-10T14:57:31.136809+0000  1  periodic scrub scheduled @ 2022-10-14T02:17:12.292237+0000
> 3.6  60  0  0  0  239497216  0  0  10079  active+clean  41m  530'22537  530:34790  [0,21]p0  [0,21]p0  2022-10-12T20:38:44.998414+0000  2022-10-10T14:57:31.136809+0000  1  periodic scrub scheduled @ 2022-10-14T08:12:12.106892+0000
> 3.7  54  0  0  0  221261824  0  0  10082  active+clean  41m  530'30718  530:37349  [4,12]p4  [4,12]p4  2022-10-12T20:26:54.091307+0000  2022-10-10T14:57:31.136809+0000  1  periodic scrub scheduled @ 2022-10-13T20:51:54.792643+0000
> 3.8  70  0  0  0  293588992  0  0  4527  active+clean  41m  530'4527  530:16905  [11,21]p11  [11,21]p11  2022-10-13T07:16:50.226814+0000  2022-10-10T14:57:31.136809+0000  1  periodic scrub scheduled @ 2022-10-14T13:02:27.444761+0000
> 3.9  47  0  0  0  192938407  0  0  10065  active+clean  41m  530'11065  530:21345  [19,11]p19  [19,11]p19  2022-10-13T05:05:36.274216+0000  2022-10-10T14:57:31.136809+0000  1  periodic scrub scheduled @ 2022-10-14T08:17:25.165367+0000
> 3.a  60  0  0  0  251658240  0  0  10044  active+clean  41m  530'14744  530:23145  [18,1]p18  [18,1]p18  2022-10-13T04:29:38.891055+0000  2022-10-10T14:57:31.136809+0000  1  periodic scrub scheduled @ 2022-10-14T11:10:38.556482+0000
> 3.b  52  0  0  0  209567744  0  0  4949  active+clean  41m  530'4949  530:26757  [7,23]p7  [7,23]p7  2022-10-12T22:08:45.621201+0000  2022-10-10T15:00:36.799456+0000  1  periodic scrub scheduled @ 2022-10-14T02:28:08.061560+0000
> 3.c  68  0  0  0  276607307  0  0  10003  active+clean  41m  530'18828  530:39884  [18,8]p18  [18,8]p18  2022-10-12T18:25:36.991393+0000  2022-10-10T14:57:31.136809+0000  1  periodic scrub scheduled @ 2022-10-14T00:43:12.804024+0000
> 3.d  67  0  0  0  272621888  0  0  6708  active+clean  41m  530'8359  530:33988  [13,7]p13  [13,7]p13  2022-10-12T21:42:29.600145+0000  2022-10-10T14:57:31.136809+0000  1  periodic scrub scheduled @ 2022-10-13T23:30:29.341646+0000
> 3.e  68  0  0  0  276746240  0  0  5178  active+clean  41m  530'5278  530:16051  [13,1]p13  [13,1]p13  2022-10-13T05:47:06.004714+0000  2022-10-11T21:04:57.978685+0000  1  periodic scrub scheduled @ 2022-10-14T11:45:33.438178+0000
> 3.f  65  0  0  0  269307904  0  0  10056  active+clean  41m  530'34965  530:49963  [23,4]p23  [23,4]p23  2022-10-13T08:58:09.493284+0000  2022-10-10T15:00:36.390467+0000  1  periodic scrub scheduled @ 2022-10-14T12:18:58.610252+0000
> 3.10  66  0  0  0  271626240  0  0  4272  active+clean  41m  530'4431  530:19010  [12,9]p12  [12,9]p12  2022-10-13T03:52:14.952046+0000  2022-10-10T14:57:31.136809+0000  1  periodic scrub scheduled @ 2022-10-14T07:48:12.441144+0000
> 3.11  58  0  0  0  239075657  0  0  6466  active+clean  41m  530'8563  530:24677  [18,0]p18  [18,0]p18  2022-10-12T22:25:17.255090+0000  2022-10-10T15:00:43.412084+0000  1  periodic scrub scheduled @ 2022-10-14T03:25:34.048845+0000
> 3.12  45  0  0  0  186254336  0  0  10084  active+clean  41m  530'16084  530:31273  [6,14]p6  [6,14]p6  2022-10-13T03:05:14.109923+0000  2022-10-10T14:57:31.136809+0000  1  periodic scrub scheduled @ 2022-10-14T03:35:11.159743+0000
> 3.13  68  0  0  0  275124224  0  0  10013  active+clean  41m  530'28676  530:52278  [16,8]p16  [16,8]p16  2022-10-12T21:46:50.747741+0000  2022-10-11T16:48:56.632027+0000  1  periodic scrub scheduled @ 2022-10-14T07:03:49.125496+0000
> 3.14  58  0  0  0  240123904  0  0  7531  active+clean  41m  530'8212  530:26075  [23,4]p23  [23,4]p23  2022-10-13T04:25:39.131070+0000  2022-10-13T04:25:39.131070+0000  4  periodic scrub scheduled @ 2022-10-14T05:36:16.428326+0000
> 3.15  59  0  0  0  247382016  0  0  8890  active+clean  41m  530'8890  530:18892  [23,3]p23  [23,3]p23  2022-10-13T04:45:48.156899+0000  2022-10-10T14:57:31.136809+0000  1  periodic scrub scheduled @ 2022-10-14T14:55:14.651919+0000
> 3.16  57  0  0  0  237285376  0  0  6900  active+clean  41m  530'8766  530:20717  [19,9]p19  [19,9]p19  2022-10-13T00:13:35.716060+0000  2022-10-10T14:57:31.136809+0000  1  periodic scrub scheduled @ 2022-10-14T07:08:16.779024+0000
> 3.17  56  0  0  0  234303488  0  0  10012  active+clean  41m  530'21461  530:31490  [0,13]p0  [0,13]p0  2022-10-13T07:42:57.775955+0000  2022-10-10T14:57:31.136809+0000  1  periodic scrub scheduled @ 2022-10-14T14:52:30.758744+0000
> 3.18  47  0  0  0  197132288  0  0  10001  active+clean  41m  530'14783  530:20829  [10,14]p10  [10,14]p10  2022-10-13T00:41:44.050740+0000  2022-10-10T14:57:31.136809+0000  1  periodic scrub scheduled @ 2022-10-14T09:30:02.438044+0000
> 3.19  50  0  0  0  209715200  0  0  10058  active+clean  41m  499'19880  530:27891  [8,23]p8  [8,23]p8  2022-10-13T10:58:13.948274+0000  2022-10-10T14:57:31.136809+0000  1  periodic scrub scheduled @ 2022-10-14T19:55:12.268345+0000
> 3.1a  58  0  0  0  240123904  0  0  10037  active+clean  41m  530'36799  530:50997  [16,9]p16  [16,9]p16  2022-10-13T02:03:18.026427+0000  2022-10-10T14:57:31.136809+0000  1  periodic scrub scheduled @ 2022-10-14T04:55:58.684437+0000
> 3.1b  53  0  0  0  219996160  0  0  10051  active+clean  41m  530'18388  530:29223  [0,22]p0  [0,22]p0  2022-10-12T19:19:25.675030+0000  2022-10-12T19:19:25.675030+0000  4  periodic scrub scheduled @ 2022-10-14T00:21:49.935082+0000
> 3.1c  66  0  0  0  276762624  0  0  10027  active+clean  41m  530'16327  530:38127  [20,5]p20  [20,5]p20  2022-10-13T00:04:49.227288+0000  2022-10-10T15:00:38.834351+0000  1  periodic scrub scheduled @ 2022-10-14T01:15:26.524544+0000
> 3.1d  49  0  0  0  201327104  0  0  10020  active+clean  41m  530'26433  530:51593  [17,9]p17  [17,9]p17  2022-10-13T03:49:02.466987+0000  2022-10-10T14:57:31.136809+0000  1  periodic scrub scheduled @ 2022-10-14T09:04:39.909179+0000
> 3.1e  61  0  0  0  249098595  0  0  8790  active+clean  41m  530'8790  530:17807  [3,21]p3  [3,21]p3  2022-10-12T22:28:19.417597+0000  2022-10-10T15:00:39.474873+0000  1  periodic scrub scheduled @ 2022-10-13T23:49:55.974786+0000
> 3.1f  53  0  0  0  222056448  0  0  10053  active+clean  41m  530'35776  530:50234  [0,15]p0  [0,15]p0  2022-10-13T07:16:46.787818+0000  2022-10-10T14:57:31.136809+0000  1  periodic scrub scheduled @ 2022-10-14T16:24:45.860894+0000
>
> * NOTE: Omap statistics are gathered during deep scrub and may be
> inaccurate soon afterwards depending on utilization. See
> http://docs.ceph.com/en/latest/dev/placement-group/#omap-statistics for
> further details.
>
> On Thu, 13 Oct 2022 at 13:54, Eugen Block <eblock@xxxxxx> wrote:
>
>> Could you share more details? Does ceph report inactive PGs when one
>> node is down? Please share:
>> ceph osd tree
>> ceph osd pool ls detail
>> ceph osd crush rule dump <rule of affected pool>
>> ceph pg ls-by-pool <affected pool>
>> ceph -s
>>
>> Quoting Murilo Morais <murilo@xxxxxxxxxxxxxx>:
>>
>> > Thanks for answering.
>> > Marc, but is there no mechanism to prevent the I/O pause? At the
>> > moment I'm not worried about data loss.
>> > I understand that setting it to replica x1 can work, but I need it to
>> > be x2.
>> >
>> > On Thu, 13 Oct 2022 at 12:26, Marc <Marc@xxxxxxxxxxxxxxxxx> wrote:
>> >
>> >>
>> >> >
>> >> > I'm seeing strange behavior on a new cluster.
>> >>
>> >> Not strange, by design
>> >>
>> >> > I have 3 machines, two of them have the disks. We can name them
>> >> > like this: dcs1 to dcs3. The dcs1 and dcs2 machines contain the
>> >> > disks.
>> >> >
>> >> > I started bootstrapping through dcs1, added the other hosts and
>> >> > left mgr on dcs3 only.
>> >> >
>> >> > What is happening is that if I take down dcs2 everything hangs and
>> >> > becomes unresponsive, including the mount points that were pointed
>> >> > to dcs1.
>> >>
>> >> You have to have disks in 3 machines. (Or set the replication to 1x)
>> >>
>> > _______________________________________________
>> > ceph-users mailing list -- ceph-users@xxxxxxx
>> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>
>>