Hi Eugen,

Thank you very much! I tried several ways to reduce the IO load and executed these commands:

# ceph osd set nodeep-scrub
# ceph tell osd.* injectargs --osd-scrub-sleep=120 --osd-max-scrubs=1 --osd-deep-scrub-stride=131072

The slow ops eased somewhat, but there were still more than 10 of them. Changing osd-max-scrubs cannot stop repairs on PGs where they have already started, so the IO load did not drop and most OSDs still showed latencies above 20 ms.

Finally I tried running this node by node:

# systemctl restart ceph.target

Surprisingly, because of my mistake, restarting the OSDs/MONs triggered degraded data redundancy: the PGs in the "active+clean+scrubbing+deep+repair" state switched to degraded+remapped+backfilling, IO performance recovered quickly, backfilling completed within a few hours, and all PGs in the repair state disappeared. Then I started to slowly repair the PGs in the "active+clean+scrubbing+deep+inconsistent" state, which does not affect performance. I don't know whether restarting only the OSDs would have achieved the same effect. In any case, it is solved now. Thank you for your help.
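In case it is useful to anyone else who ends up in the same state, this is roughly the procedure I am following for the remaining inconsistent PGs, one PG at a time (the pool name and PG ID below are placeholders; I wait for each PG to return to active+clean before touching the next one):

# rados list-inconsistent-pg <pool>
# rados list-inconsistent-obj <pg_id> --format=json-pretty
# ceph pg repair <pg_id>
# ceph -s

Since ceph pg repair only scrubs and repairs the single PG you give it, the extra IO load stays limited to that one PG.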
Eugen Block <eblock@xxxxxx> wrote on Sat, Oct 29, 2022 at 18:01:

> Hi,
>
> maybe you could try to set 'ceph osd set nodeep-scrub' and wait for
> the cluster to settle. Then you should also reduce osd_max_scrubs to 1
> or 2 to not overload the OSDs, that should resolve the slow requests
> (hopefully). The warning "Too many repaired reads on 1 OSDs" can be
> resolved later, it's probably not critical at the moment.
> If the slow requests resolve you can repair one PG at a time after
> inspecting the output of 'rados -p <POOL> list-inconsistent-obj <PG_ID>'.
>
> Quoting Frank Lee <by.yecao@xxxxxxxxx>:
>
> > Hi again,
> >
> > My Ceph cluster started reporting this a while ago: 3 pgs not deep-scrubbed in time.
> >
> > I found suggestions online to increase osd_scrub_begin_hour and
> > osd_scrub_end_hour, but that did not seem to work.
> >
> > There was a discussion on the Proxmox forum about a similar situation
> > where running "ceph osd repair all" fixed it, but it did not seem to
> > have worked a day after I executed it. Continuing to search, I came
> > across a blog and ran:
> >
> > ceph tell osd.* injectargs --osd_max_scrubs=100
> > ceph tell mon.* injectargs --osd_max_scrubs=100
> >
> > That was the wrong move, and the madness appeared: pg
> > active+clean+scrubbing+deep+repair
> >
> > I lowered this setting immediately, but it was too late. Now:
> >
> >   cluster:
> >     id:     48ff8b6e-1203-4dc8-b16e-d1e89f66e28f
> >     health: HEALTH_ERR
> >             110 scrub errors
> >             Too many repaired reads on 1 OSDs
> >             Possible data damage: 12 pgs inconsistent
> >             16 pgs not deep-scrubbed in time
> >             23 slow ops, oldest one blocked for 183 sec, daemons
> >             [osd.1,osd.13,osd.14,osd.15,osd.16,osd.17,osd.18,osd.19,osd.2,osd.22]...
> >             have slow ops.
> >
> >   services:
> >     mon: 3 daemons, quorum ceph-node-1,ceph-node-2,ceph-node-3 (age 5M)
> >     mgr: ceph-node-2(active, since 7M), standbys: ceph-node-1, ceph-node-3
> >     osd: 32 osds: 32 up (since 21h), 32 in (since 4M)
> >
> >   data:
> >     pools:   2 pools, 1025 pgs
> >     objects: 6.78M objects, 25 TiB
> >     usage:   76 TiB used, 41 TiB / 118 TiB avail
> >     pgs:     624 active+clean
> >              389 active+clean+scrubbing+deep+repair
> >              12  active+clean+scrubbing+deep+inconsistent
> >
> >   io:
> >     client:  6.9 MiB/s rd, 18 MiB/s wr, 648 op/s rd, 1.21k op/s wr
> >
> > ceph osd perf
> > osd  commit_latency(ms)  apply_latency(ms)
> >  31                  11                 11
> >  28                  17                 17
> >  25                   1                  1
> >  24                   5                  5
> >  21                   1                  1
> >  17                   6                  6
> >   7                   0                  0
> >  30                  16                 16
> >  29                  13                 13
> >  26                  37                 37
> >  19                   6                  6
> >   3                  12                 12
> >   2                   4                  4
> >   1                   2                  2
> >   0                  15                 15
> >  13                  27                 27
> >  15                  33                 33
> >  12                  21                 21
> >  14                  36                 36
> >  18                  15                 15
> >   9                  26                 26
> >   8                   5                  5
> >   6                   1                  1
> >   5                   1                  1
> >   4                   6                  6
> >  27                   1                  1
> >  23                   5                  5
> >  10                  11                 11
> >  11                  17                 17
> >  20                   6                  6
> >  16                   6                  6
> >  22                   0                  0
> >
> > And in the past >30 hours, apart from the growing number of inconsistent
> > PGs, the number of active+clean+scrubbing+deep+repair PGs has not changed.
> >
> > The current ceph configuration:
> >
> > ceph tell osd.* injectargs '--osd_scrub_begin_hour 0'
> > ceph tell osd.* injectargs '--osd_scrub_end_hour 0'
> > ceph tell mon.* injectargs '--osd_scrub_begin_hour 0'
> > ceph tell mon.* injectargs '--osd_scrub_end_hour 0'
> >
> > ceph tell osd.* injectargs '--osd_max_scrubs 10'
> > ceph tell osd.* injectargs '--osd_scrub_chunk_min 5'
> > ceph tell osd.* injectargs '--osd_scrub_chunk_max 25'
> > ceph tell osd.* injectargs '--osd_deep_scrub_stride 196608'
> > ceph tell osd.* injectargs '--osd_scrub_priority 5'
> > ceph tell osd.* injectargs '--osd_scrub_load_threshold 10'
> >
> > What should I do? Do I just have to wait for Ceph to finish? Or is there
> > any way to stop the repair? I have heard that restarting the OSDs can
> > help, but I am afraid to do that now because it might make the errors worse.
> >
> > Thanks for any suggestions!
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx