Hi, this is odd. The problem with recovery when sufficiently many but fewer than min_size shards are present should have been resolved by osd_allow_recovery_below_min_size=true. It is really dangerous to reduce min_size below k+1 and, in fact, doing so should never be necessary for recovery. Can you check whether this option is present and set to true? If it is not working as intended, a tracker ticket might be in order.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Rainer Krienke <krienke@xxxxxxxxxxxxxx>
Sent: 30 March 2021 13:05:56
To: Eugen Block; ceph-users@xxxxxxx
Subject: Re: ceph Nautilus lost two disk over night everything hangs

Hello,

yes, your assumptions are correct: pxa-rbd is the metadata pool for pxa-ec, which uses an erasure coding 4+2 profile.

In the last hours ceph repaired most of the damage. One inactive PG remained, and 'ceph health detail' then told me:

---------
HEALTH_WARN Reduced data availability: 1 pg inactive, 1 pg incomplete; 15 daemons have recently crashed; 150 slow ops, oldest one blocked for 26716 sec, daemons [osd.60,osd.67] have slow ops.
PG_AVAILABILITY Reduced data availability: 1 pg inactive, 1 pg incomplete
    pg 36.15b is remapped+incomplete, acting [60,2147483647,23,96,2147483647,36] (reducing pool pxa-ec min_size from 5 may help; search ceph.com/docs for 'incomplete')
RECENT_CRASH 15 daemons have recently crashed
    osd.90 crashed on host ceph6 at 2021-03-29 21:14:10.442314Z
    osd.67 crashed on host ceph5 at 2021-03-30 02:21:23.944205Z
    osd.67 crashed on host ceph5 at 2021-03-30 01:39:14.452610Z
    osd.90 crashed on host ceph6 at 2021-03-29 21:14:24.222223Z
    osd.67 crashed on host ceph5 at 2021-03-30 02:35:43.373845Z
    osd.67 crashed on host ceph5 at 2021-03-30 01:19:58.762393Z
    osd.67 crashed on host ceph5 at 2021-03-30 02:09:42.297941Z
    osd.67 crashed on host ceph5 at 2021-03-30 02:28:29.981528Z
    osd.67 crashed on host ceph5 at 2021-03-30 01:50:05.374278Z
    osd.90 crashed on host ceph6 at 2021-03-29 21:13:51.896849Z
    osd.67 crashed on host ceph5 at 2021-03-30 02:00:22.593745Z
    osd.67 crashed on host ceph5 at 2021-03-30 01:29:39.170134Z
    osd.90 crashed on host ceph6 at 2021-03-29 21:14:38.114768Z
    osd.67 crashed on host ceph5 at 2021-03-30 00:54:06.629808Z
    osd.67 crashed on host ceph5 at 2021-03-30 01:10:21.824447Z
---------

All OSDs except 67 and 90 are up. I followed the hint in health detail and lowered min_size from 5 to 4 for pxa-ec. Since then ceph is repairing again, and in the meantime some VMs in the attached proxmox cluster are working again. So I hope that after the repair all PGs will be up, so that I can restart all VMs again.

Thanks
Rainer

On 30.03.21 at 11:41, Eugen Block wrote:
> Hi,
>
> from what you've sent, my conclusion about the stalled I/O would indeed
> be the min_size of the EC pool.
> There's only one PG reported as incomplete; I assume that is the EC
> pool, not the replicated pxa-rbd, right? Both pools are for rbd, so I'm
> guessing the rbd headers are in pxa-rbd while the data is stored in
> pxa-ec, could you confirm that?
>
> You could add 'ceph health detail' output to your question to see which
> PG is incomplete.
> I assume that both down OSDs are in the acting set of the inactive PG,
> and since the pool's min_size is 5, the I/O pauses. If you can't wait for
> recovery to finish and can't bring up at least one of those OSDs, you
> could set the min_size of pxa-ec to 4, but if you do, be aware that one
> more disk failure could mean data loss!
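(For reference, a minimal sketch of the min_size change being discussed here, assuming the pool name pxa-ec and the 4+2 profile from this thread, so k+1 = 5; double-check against your own cluster before running anything:

    ceph osd pool get pxa-ec min_size      # confirm the current value (5 for k=4, m=2)
    ceph osd pool set pxa-ec min_size 4    # temporary: lets PGs serve I/O with only k shards
    # ... after recovery has finished and all PGs are active+clean again:
    ceph osd pool set pxa-ec min_size 5    # restore k+1

While min_size sits at k, any PG serving I/O with only four shards has no redundancy margin left, which is why the advice is to revert as soon as recovery completes.)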
> So think carefully about it (maybe you could instead speed up recovery?)
> and don't forget to increase min_size back to 5 when the recovery has
> finished, that's very important!
>
> Regards,
> Eugen
>
>
> Quoting Rainer Krienke <krienke@xxxxxxxxxxxxxx>:
>
>> Hello,
>>
>> I run a ceph Nautilus cluster with 9 hosts and 144 OSDs. Last night we
>> lost two disks, so two OSDs (67, 90) are down. The two disks are on two
>> different hosts. A third OSD on a third host reports slow ops. Ceph
>> is repairing at the moment.
>>
>> Pools affected are e.g. these:
>>
>> pool 35 'pxa-rbd' replicated size 3 min_size 2 crush_rule 0
>> object_hash rjenkins pg_num 256 pgp_num 256 last_change 192082 lfor
>> 0/27841/27845 flags hashpspool,selfmanaged_snaps stripe_width 0
>> pg_num_min 128 target_size_ratio 0.0001 application rbd
>>
>> pool 36 'pxa-ec' erasure size 6 min_size 5 crush_rule 7 object_hash
>> rjenkins pg_num 512 pgp_num 512 last_change 192177 lfor
>> 0/172580/172578 flags hashpspool,ec_overwrites,selfmanaged_snaps
>> stripe_width 16384 pg_num_min 512 target_size_ratio 0.15 application rbd
>>
>> At the moment the proxmox cluster using storage from the separate
>> ceph cluster hangs. The pools with data are erasure coded with the
>> following profile:
>>
>> crush-device-class=
>> crush-failure-domain=host
>> crush-root=default
>> jerasure-per-chunk-alignment=false
>> k=4
>> m=2
>> plugin=jerasure
>> technique=reed_sol_van
>> w=8
>>
>> What I do not understand is why access from the virtualization side
>> seems to block. Could the min_size of the pools be causing this
>> behaviour? How can I find out whether this is the case, or what else is
>> causing the blocking behaviour I see?
>>
>> This is the current status:
>>
>>   health: HEALTH_WARN
>>           Reduced data availability: 1 pg inactive, 1 pg incomplete
>>           Degraded data redundancy: 42384/130014984 objects degraded
>>           (0.033%), 4 pgs degraded, 5 pgs undersized
>>           15 daemons have recently crashed
>>           150 slow ops, oldest one blocked for 15901 sec, daemons
>>           [osd.60,osd.67] have slow ops.
>>
>>   services:
>>     mon: 3 daemons, quorum ceph2,ceph5,ceph8 (age 4h)
>>     mgr: ceph2(active, since 7w), standbys: ceph5, ceph8, ceph-admin
>>     mds: cephfsrz:1 {0=ceph6=up:active} 2 up:standby
>>     osd: 144 osds: 142 up (since 4h), 142 in (since 5h); 6 remapped pgs
>>
>>   task status:
>>     scrub status:
>>       mds.ceph6: idle
>>
>>   data:
>>     pools:   15 pools, 2632 pgs
>>     objects: 21.70M objects, 80 TiB
>>     usage:   139 TiB used, 378 TiB / 517 TiB avail
>>     pgs:     0.038% pgs not active
>>              42384/130014984 objects degraded (0.033%)
>>              2623 active+clean
>>              3    active+undersized+degraded+remapped+backfilling
>>              3    active+clean+scrubbing+deep
>>              1    active+undersized+degraded+remapped+backfill_wait
>>              1    active+undersized+remapped+backfill_wait
>>              1    remapped+incomplete
>>
>>   io:
>>     client:   2.2 MiB/s rd, 3.6 MiB/s wr, 8 op/s rd, 179 op/s wr
>>     recovery: 51 MiB/s, 12 objects/s
>>
>> Thanks a lot
>> Rainer
>> --
>> Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse 1
>> 56070 Koblenz, Web: http://www.uni-koblenz.de/~krienke, Tel: +49261287 1312
>> PGP: http://www.uni-koblenz.de/~krienke/mypgp.html, Fax: +49261287 1001312
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@xxxxxxx
>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx

--
Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse 1
56070 Koblenz, Web: http://www.uni-koblenz.de/~krienke, Tel: +49261287 1312
PGP: http://www.uni-koblenz.de/~krienke/mypgp.html, Fax: +49261287 1001312
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
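A rough sketch of how to check the option Frank mentions at the top of this thread, and how to dig into why the single PG stays incomplete, on a Nautilus cluster. This is a sketch, not a verified procedure: the OSD id (60) and PG id (36.15b) are taken from the output above, and <profile-name> is a placeholder for whatever EC profile pxa-ec actually uses:

    # is osd_allow_recovery_below_min_size known to this build, and is it true?
    ceph config get osd osd_allow_recovery_below_min_size

    # ask a running OSD directly (run on the host where that daemon lives)
    ceph daemon osd.60 config get osd_allow_recovery_below_min_size

    # confirm k and m of the profile backing pxa-ec
    ceph osd pool get pxa-ec erasure_code_profile
    ceph osd erasure-code-profile get <profile-name>

    # see which shards pg 36.15b is missing and what is blocking peering
    ceph pg 36.15b query

If the option is present and true, yet the remapped+incomplete PG still refuses to recover with four of six shards available, that would support opening the tracker ticket suggested above rather than lowering min_size below k+1.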