Hi Stefan,

after you wrote that you issue hundreds of deep-scrub commands per day I was already suspecting something like

> [...] deep_scrub daemon requests a deep-scrub [...]

It's not a Minion you hired who types these commands by hand every few seconds and hopes for the best. My guess is rather that you are actually running a cluster specifically configured for manual scrub scheduling, as was discussed in a thread some time ago, to solve the problem of "not deep scrubbed in time" messages caused by the built-in scrub scheduler not using the last-scrubbed timestamp for priority (among other things). On such a system I would not be surprised that these commands have their desired effect.

To understand why it works for you it would be helpful to disclose the whole story, for example which ceph config parameters are active in the context of the daemon executing your deep-scrub instructions, and which other ceph commands surround them, in the same way that the pg repair is surrounded by injectargs instructions in the script I posted.

It doesn't work like that on a ceph cluster with default config. For example, on our cluster there is a very high likelihood that at least one OSD of any given PG is already part of a scrub at any time. In that case, if a PG is not eligible for scrubbing because one of its OSDs already has max-scrubs (default = 1) scrubs running, the reservation has no observable effect.

Some time ago I had a ceph-users thread discussing exactly that: I wanted to increase the number of concurrent scrubs without increasing max-scrubs. The default scheduler seems to be very poor at ordering scrubs such that a maximum number of PGs is scrubbed at any given time. One of the suggestions was to run manual scheduling, which seems to be exactly what you are doing.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Stefan Kooman <stefan@xxxxxx>
Sent: Wednesday, June 28, 2023 2:17 PM
To: Frank Schilder; Alexander E. Patrakov; Niklas Hambüchen
Cc: ceph-users@xxxxxxx
Subject: Re: Re: 1 pg inconsistent and does not recover

On 6/28/23 10:45, Frank Schilder wrote:
> Hi Stefan,
>
> we run Octopus. The deep-scrub request is (immediately) cancelled if the PG/OSD is already part of another (deep-)scrub or if some peering happens. As far as I understood, the commands osd/pg deep-scrub and pg repair do not create persistent reservations. If you issue this command, when does the PG actually start scrubbing? As soon as another one finishes or when it is its natural turn? Do you monitor the scrub order to confirm it was the manual command that initiated a scrub?

We request a deep-scrub ... a few seconds later it starts deep-scrubbing. We do not verify in this process whether the PG really did start, but they do. See the example from a PG below:

Jun 27 22:59:50 mon1 pg_scrub[2478540]: [27-06-2023 22:59:34] Scrub PG 5.48a (last deep-scrub: 2023-06-16T22:54:58.684038+0200)

^^ the deep_scrub daemon requests a deep-scrub based on the latest deep-scrub timestamp. After a couple of minutes it's deep-scrubbed. See below the new deep-scrub timestamp (info from a PG query of 5.48a):

"last_deep_scrub_stamp": "2023-06-27T23:06:01.823894+0200"

We have been using this on Octopus (actually since Luminous, but in a different way). Now we are on Pacific.

Gr. Stefan
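
For illustration, a minimal sketch of what such a timestamp-driven deep-scrub daemon could look like (not Stefan's actual daemon, which is not shown in this thread; it assumes jq is available and that the JSON layout of "ceph pg dump pgs" matches recent releases):

  #!/bin/bash
  # Hypothetical sketch, not the real daemon: pick the PG with the oldest
  # last_deep_scrub_stamp and request a deep-scrub for it.
  # Assumes jq; the exact JSON layout of "ceph pg dump pgs" varies by release.
  set -euo pipefail

  pgid=$(ceph pg dump pgs --format json 2>/dev/null \
         | jq -r '.pg_stats | sort_by(.last_deep_scrub_stamp) | .[0].pgid')

  echo "$(date '+[%d-%m-%Y %H:%M:%S]') Scrub PG ${pgid}"
  ceph pg deep-scrub "${pgid}"

  # The new timestamp can be checked afterwards with, e.g.:
  #   ceph pg "${pgid}" query | jq -r '.info.stats.last_deep_scrub_stamp'

Run periodically (cron or a loop with a sleep), this reproduces the behaviour described above: always scrubbing the PG that has waited longest since its last deep-scrub.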
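And a similarly minimal sketch of the injectargs-around-repair pattern Frank mentions (his actual script is not quoted here, so the surrounding logic is an assumption; only the ceph and jq calls themselves are standard CLI): temporarily raise osd_max_scrubs on the acting OSDs so the repair reservation is not refused, issue the repair, then restore the default.

  #!/bin/bash
  # Hypothetical sketch, not the script referenced above: wrap "ceph pg repair"
  # in injectargs calls that temporarily raise osd_max_scrubs on the acting set.
  # Usage: ./repair-pg.sh 5.48a   (script name is made up; requires jq)
  set -euo pipefail

  pg="$1"

  # OSDs in the acting set of this PG
  osds=$(ceph pg "$pg" query | jq -r '.acting[]')

  for osd in $osds; do
      ceph tell "osd.${osd}" injectargs '--osd_max_scrubs=3'
  done

  ceph pg repair "$pg"

  # Restore the default value (1) once the repair has been requested
  for osd in $osds; do
      ceph tell "osd.${osd}" injectargs '--osd_max_scrubs=1'
  done

In practice one would probably wait for the repair to actually start, or finish, before lowering the value again.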