Re: ceph Nautilus lost two disk over night everything hangs

Hello,

yes, your assumptions are correct: pxa-rbd is the metadata pool for pxa-ec, which uses an erasure coding 4+2 profile.
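
For completeness, the split can be checked per image; something like the following should show it (the image name below is just a placeholder, any image from the pool will do):

  rbd ls pxa-rbd
  rbd info pxa-rbd/vm-100-disk-0
  # images whose data lives in the EC pool show a "data_pool: pxa-ec" line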

Over the last hours ceph repaired most of the damage. One inactive PG remained, and ceph health detail told me:

---------
HEALTH_WARN Reduced data availability: 1 pg inactive, 1 pg incomplete; 15 daemons have recently crashed; 150 slow ops, oldest one blocked for 26716 sec, daemons [osd.60,osd.67] have slow ops.
PG_AVAILABILITY Reduced data availability: 1 pg inactive, 1 pg incomplete
pg 36.15b is remapped+incomplete, acting [60,2147483647,23,96,2147483647,36] (reducing pool pxa-ec min_size from 5 may help; search ceph.com/docs for 'incomplete')
RECENT_CRASH 15 daemons have recently crashed
    osd.90 crashed on host ceph6 at 2021-03-29 21:14:10.442314Z
    osd.67 crashed on host ceph5 at 2021-03-30 02:21:23.944205Z
    osd.67 crashed on host ceph5 at 2021-03-30 01:39:14.452610Z
    osd.90 crashed on host ceph6 at 2021-03-29 21:14:24.222223Z
    osd.67 crashed on host ceph5 at 2021-03-30 02:35:43.373845Z
    osd.67 crashed on host ceph5 at 2021-03-30 01:19:58.762393Z
    osd.67 crashed on host ceph5 at 2021-03-30 02:09:42.297941Z
    osd.67 crashed on host ceph5 at 2021-03-30 02:28:29.981528Z
    osd.67 crashed on host ceph5 at 2021-03-30 01:50:05.374278Z
    osd.90 crashed on host ceph6 at 2021-03-29 21:13:51.896849Z
    osd.67 crashed on host ceph5 at 2021-03-30 02:00:22.593745Z
    osd.67 crashed on host ceph5 at 2021-03-30 01:29:39.170134Z
    osd.90 crashed on host ceph6 at 2021-03-29 21:14:38.114768Z
    osd.67 crashed on host ceph5 at 2021-03-30 00:54:06.629808Z
    osd.67 crashed on host ceph5 at 2021-03-30 01:10:21.824447Z
---------
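
If it helps anyone else reading this: the peering state of such an incomplete PG can be inspected with a pg query, roughly like this (field names may vary slightly between releases):

  ceph pg 36.15b query
  # look at the "recovery_state" section; entries such as
  # "down_osds_we_would_probe" show which OSDs the PG is still waiting for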

All OSDs except for 67 and 90 are up. I followed the hint in health detail and lowered min_size from 5 to 4 for pxa-ec. Since then ceph has been repairing again, and in the meantime some VMs in the attached Proxmox cluster are working again.
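
For the archives, the change itself boils down to something like:

  ceph osd pool get pxa-ec min_size      # was 5
  ceph osd pool set pxa-ec min_size 4
  # to be reverted once recovery has finished:
  # ceph osd pool set pxa-ec min_size 5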

So I hope that once the repair has finished all PGs will be up again and I can restart all VMs.

Thanks
Rainer

On 30.03.21 at 11:41, Eugen Block wrote:
Hi,

from what you've sent, my conclusion is that the stalled I/O is indeed caused by the min_size of the EC pool. There's only one PG reported as incomplete; I assume it belongs to the EC pool, not the replicated pxa-rbd, right? Both pools are for rbd, so I'm guessing the rbd headers are in pxa-rbd while the data is stored in pxa-ec, could you confirm that?

You could add 'ceph health detail' output to your question to see which PG is incomplete. I assume both down OSDs are in the acting set of the inactive PG, and since the pool's min_size is 5, I/O pauses. If you can't wait for recovery to finish and can't bring up at least one of those OSDs, you could set the min_size of pxa-ec to 4. But if you do, be aware that one more disk failure could mean data loss! So think carefully about it (maybe you could instead speed up recovery?) and don't forget to increase min_size back to 5 when the recovery has finished, that's very important!
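
Roughly, that would look like this (the recovery tuning values are only examples, adjust them to what your disks can handle):

  # temporary workaround, revert after recovery!
  ceph osd pool set pxa-ec min_size 4

  # optionally speed up recovery a bit (example values)
  ceph tell 'osd.*' injectargs '--osd-max-backfills 2 --osd-recovery-max-active 4'

  # once recovery has finished:
  ceph osd pool set pxa-ec min_size 5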

Regards,
Eugen


Quoting Rainer Krienke <krienke@xxxxxxxxxxxxxx>:

Hello,

I run a ceph Nautilus cluster with 9 hosts and 144 OSDs. Last night we lost two disks, so two OSDs (67, 90) are down. The two disks are on two different hosts. A third OSD on a third host reports slow ops. ceph is repairing at the moment.

The affected pools are e.g. these:
 pool 35 'pxa-rbd' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 192082 lfor 0/27841/27845 flags hashpspool,selfmanaged_snaps stripe_width 0 pg_num_min 128 target_size_ratio 0.0001 application rbd

pool 36 'pxa-ec' erasure size 6 min_size 5 crush_rule 7 object_hash rjenkins pg_num 512 pgp_num 512 last_change 192177 lfor 0/172580/172578 flags hashpspool,ec_overwrites,selfmanaged_snaps stripe_width 16384 pg_num_min 512 target_size_ratio 0.15 application rbd
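
(The two dumps above are presumably in the format of 'ceph osd pool ls detail'; the min_size values can also be queried individually with something like:

  ceph osd pool get pxa-rbd min_size
  ceph osd pool get pxa-ec min_size
)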

At the moment the Proxmox cluster using storage from the separate ceph cluster hangs. The pools with data are erasure coded with the following profile:

crush-device-class=
crush-failure-domain=host
crush-root=default
jerasure-per-chunk-alignment=false
k=4
m=2
plugin=jerasure
technique=reed_sol_van
w=8
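
The profile itself should be retrievable with something like this (the profile name is whatever the pool was created with):

  ceph osd pool get pxa-ec erasure_code_profile
  # then, with the name printed above:
  ceph osd erasure-code-profile get <profile-name>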

What I do not understand is why access from the virtualization side seems to block. Could the min_size of the pools be causing this behaviour? How can I find out whether this is true, or what else is causing the blocking behaviour I see?
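
A quick way to check whether it is really the inactive PG of the EC pool (and therefore its min_size) that blocks the clients would be something like:

  ceph health detail                  # shows which PG is incomplete/inactive
  ceph pg dump_stuck inactive         # lists stuck inactive PGs and their acting sets
  ceph osd pool get pxa-ec min_size   # compare against the number of surviving shards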

This is the current status:
    health: HEALTH_WARN
            Reduced data availability: 1 pg inactive, 1 pg incomplete
            Degraded data redundancy: 42384/130014984 objects degraded (0.033%), 4 pgs degraded, 5 pgs undersized
            15 daemons have recently crashed
            150 slow ops, oldest one blocked for 15901 sec, daemons [osd.60,osd.67] have slow ops.

  services:
    mon: 3 daemons, quorum ceph2,ceph5,ceph8 (age 4h)
    mgr: ceph2(active, since 7w), standbys: ceph5, ceph8, ceph-admin
    mds: cephfsrz:1 {0=ceph6=up:active} 2 up:standby
    osd: 144 osds: 142 up (since 4h), 142 in (since 5h); 6 remapped pgs

  task status:
    scrub status:
        mds.ceph6: idle

  data:
    pools:   15 pools, 2632 pgs
    objects: 21.70M objects, 80 TiB
    usage:   139 TiB used, 378 TiB / 517 TiB avail
    pgs:     0.038% pgs not active
             42384/130014984 objects degraded (0.033%)
             2623 active+clean
             3    active+undersized+degraded+remapped+backfilling
             3    active+clean+scrubbing+deep
             1    active+undersized+degraded+remapped+backfill_wait
             1    active+undersized+remapped+backfill_wait
             1    remapped+incomplete

  io:
    client:   2.2 MiB/s rd, 3.6 MiB/s wr, 8 op/s rd, 179 op/s wr
    recovery: 51 MiB/s, 12 objects/s

Thanks a lot
Rainer
--
Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse  1
56070 Koblenz, Web: http://www.uni-koblenz.de/~krienke, Tel: +49261287 1312
PGP: http://www.uni-koblenz.de/~krienke/mypgp.html, Fax: +49261287 1001312

--
Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse  1
56070 Koblenz, Web: http://www.uni-koblenz.de/~krienke, Tel: +49261287 1312
PGP: http://www.uni-koblenz.de/~krienke/mypgp.html, Fax: +49261287 1001312
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



