Re: ceph Nautilus lost two disk over night everything hangs

Ahh, right. I saw it fixed in https://tracker.ceph.com/issues/18749 a long time ago, but it seems the backport never happened.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Josh Baergen <jbaergen@xxxxxxxxxxxxxxxx>
Sent: 30 March 2021 15:23:10
To: Frank Schilder
Cc: Rainer Krienke; Eugen Block; ceph-users@xxxxxxx
Subject: Re:  Re: ceph Nautilus lost two disk over night everything hangs

I thought that recovery below min_size for EC pools wasn't expected to work until Octopus. From the Octopus release notes: "Ceph will allow recovery below min_size for Erasure coded pools, wherever possible."

Josh

On Tue, Mar 30, 2021 at 6:53 AM Frank Schilder <frans@xxxxxx> wrote:
Dear Rainer,

hmm, maybe the option is ignored or not implemented properly. With this option set to true, the effect should be the same as reducing min_size, *except* that new writes will not go to non-redundant storage. When min_size is reduced, a critically degraded PG will accept new writes; that is the data-loss risk mentioned before, which is avoided if only recovery ops are allowed on such PGs.

Can you open a tracker ticket about your observation that reducing min_size was necessary and helped despite osd_allow_recovery_below_min_size=true?
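For anyone following along, here is a rough sketch of how to inspect the values being discussed on a cluster; the pool and profile names are placeholders, not taken from this thread:

```
# Current min_size of the EC pool (placeholder pool name).
ceph osd pool get <ec-pool> min_size

# Which erasure-code profile the pool uses, then read k and m from it.
ceph osd pool get <ec-pool> erasure_code_profile
ceph osd erasure-code-profile get <profile-name>

# Whether recovery below min_size is enabled for OSDs.
ceph config get osd osd_allow_recovery_below_min_size
```

The rule of thumb in this thread is to keep min_size at k+1, so that a PG still has one shard of headroom for new writes while recovery is in progress.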

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Rainer Krienke <krienke@xxxxxxxxxxxxxx>
Sent: 30 March 2021 13:30:00
To: Frank Schilder; Eugen Block; ceph-users@xxxxxxx
Subject: Re:  Re: ceph Nautilus lost two disk over night everything hangs

Hello Frank,

the option is actually set. On one of my monitors:

# ceph daemon /var/run/ceph/ceph-mon.*.asok config show | grep osd_allow_recovery_below_min_size
     "osd_allow_recovery_below_min_size": "true",

Thank you very much
Rainer

Am 30.03.21 um 13:20 schrieb Frank Schilder:
> Hi, this is odd. The problem with recovery when sufficiently many but fewer than min_size shards are present should have been resolved with osd_allow_recovery_below_min_size=true. It is really dangerous to reduce min_size below k+1 and, in fact, it should never be necessary for recovery. Can you check whether this option is present and set to true? If it is not working as intended, a tracker ticket might be in order.
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>

--
Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse 1
56070 Koblenz, Web: http://www.uni-koblenz.de/~krienke, Tel: +49261287 1312
PGP: http://www.uni-koblenz.de/~krienke/mypgp.html, Fax: +49261287 1001312
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
