Re: Quincy recovery load

I've also never had problems with backfill / rebalance / recovery, but now
I'm seeing runaway CPU usage even with very conservative recovery settings
after upgrading from Pacific to Quincy.

osd_recovery_sleep_hdd = 0.1
osd_max_backfills = 1
osd_recovery_max_active = 1
osd_recovery_delay_start = 600
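(For anyone wanting to try the same values, they can be pushed to all OSDs
at runtime with something like:

ceph config set osd osd_recovery_sleep_hdd 0.1
ceph config set osd osd_max_backfills 1
ceph config set osd osd_recovery_max_active 1
ceph config set osd osd_recovery_delay_start 600
)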

Tried:
osd_mclock_profile = "high_recovery_ops"
It did not help.
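(For reference, the profile can be switched at runtime, e.g.
"ceph config set osd osd_mclock_profile high_recovery_ops" cluster-wide, or
"ceph config set osd.N osd_mclock_profile high_recovery_ops" to test on a
single OSD.)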

The CPU eventually runs away so badly (regardless of config) that the OSD
starts failing health checks, which causes even more problems, so I set
nodown,noout,noscrub,nodeep-scrub
but none of that helped the recovery make progress either.
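(i.e. the usual flags, set with:

ceph osd set nodown
ceph osd set noout
ceph osd set noscrub
ceph osd set nodeep-scrub
)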

The only way back to a healthy cluster for now seems to be
ceph osd set norebalance
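(and "ceph osd unset norebalance" to clear the flag again afterwards)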

After toggling rebalance off, while the cluster slowly finishes the
rebalances already in progress, I noticed that the whole cluster has almost
no IO on the disks, except that on one of the hosts a single disk sits at
100% utilisation, with the load bouncing around from disk to disk.

Example of the host with the bouncing load:
root@ceph-server-04:~# !dstat
dstat -cd --disk-util --disk-tps --net
----total-usage---- -dsk/total- nvme-sdb--sda--sdc--sdd--sde--sdf--sdg--sdh--sdi--sdj--sdk- -dsk/total- -net/total-
usr sys idl wai stl| read  writ|util:util:util:util:util:util:util:util:util:util:util:util|#read #writ| recv  send
 74  12   9   3   0|2542k  246M|7.49:99.3:   0:   0:   0:27.2:   0:99.3:   0:   0:   0:   0|   9   636 |1251k  829k
 75  11  10   3   0|  29M  254M|7.65: 101:   0:   0:74.1:20.1:   0: 101:   0:   0:   0:   0| 205   686 |4246k 7841k
 61  26   9   3   0|6340k  250M|2.81: 101:   0:   0:12.9:   0:   0:99.7:   0:   0:   0:   0|  45   660 |  35M   35M
 69  20   8   2   0|   0   243M|5.20:98.5:   0:   0:   0:   0:   0:99.7:   0:   0:   0:   0|   0   649 | 650k  442k
 71  20   8   0   0|   0   150M|5.13:87.9:   0:   0:   0:   0:   0:68.2:   0:   0:   0:   0|   0   360 | 703k  443k
 72  16  11  57   0|8168B   51M|5.18:   0:   0:   0:   0:   0:   0:1.99:   0:   0:86.5:   0|   2   129 | 702k  524k
 72  16  11   1   0|   0  5865k|7.28:   0:   0:   0:   0:   0:   0:   0:   0:   0:90.6:   0|   0    36 |1578k 1184k
 71  16  12   0   0|   0  6519k|7.25:   0:   0:   0:   0:   0:   0:   0:   0:   0: 112:   0|   0    38 | 904k  553k
 75  11  11   2   0| 522k   32M|1.96:   0:   0:   0:1.96:   0:   0:   0:   0:   0:98.5:   0|   2    81 |1022k  847k
 72  14  12   1   0|   0    60M|5.72:   0:   0:   0:   0:   0:   0:   0:   0:   0: 102:   0|   0   160 | 826k  550k
 65  19  13   2   0|   0   124M|5.57:   0:   0:99.1:   0:   0:   0:   0:   0:   0:   0:   0|   0   339 | 648k  340k
 69  17  11   2   0|   0   125M|2.82:   0:   0: 101:   0:   0:   0:   0:   0:   0:   0:   0|   0   333 | 694k  482k
 75  15   9   1   0|   0   123M|3.56:   0:   0:99.3:   0:   0:   0:   0:   0:   0:   0:   0|   0   331 |1760k 1368k
 79  10   9   1   0|   0   114M|2.01:   0:   0: 101:   0:   0:   0:   0:   0:   0:   0:   0|   0   335 | 893k  636k
 77  14   8   0   0| 685k   72M|4.41:   0:   0:82.9:   0:   0:   0:   0:   0:1.20:   0:   0|   1   195 |1590k 1482k

You can see that the host doing the "active" IO is not generating much
network traffic.

The weird part is that the OSDs on the idle machines see huge CPU load even
during periods of no IO. There are "some" explanations for that, since the
cluster is entirely jerasure-coded HDDs with k=6, m=3, but it seems strange
that such a small amount of data would be so CPU-intensive to recover when
there is no performance degradation to client operations.
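(For context, by that I mean pools on an erasure-code profile roughly like
the one below; the profile/pool names and failure domain are just
illustrative, not my exact settings:

ceph osd erasure-code-profile set ec-6-3 k=6 m=3 plugin=jerasure crush-failure-domain=host
ceph osd pool create ecpool 256 256 erasure ec-6-3
)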

My best guess is some sort of spin lock (or equivalent) busy-waiting on
contended IO in the OSDs, due to changed behaviour in how queued recovery
operations are handled?
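(I haven't profiled it properly; if it would help, something like
"perf top -p <pid of an affected ceph-osd>" during one of the idle-but-busy
periods should show where the cycles are going.)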


Setting just:
osd_op_queue = "wpq"
fixes my cluster: recovery running at the same speed now averages 3-6% CPU
per OSD, down from 100-300%.
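(For anyone hitting the same thing: that can be applied with something like
"ceph config set osd osd_op_queue wpq", followed by restarting the OSDs,
since as far as I know the op queue is only selected at OSD startup.)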






On Tue, Jul 12, 2022 at 7:56 PM Sridhar Seshasayee <sseshasa@xxxxxxxxxx>
wrote:

> Hi Chris,
>
> While we look into this, I have a couple of questions:
>
> 1. Did the recovery rate stay at 1 object/sec throughout? In our tests we
> have seen that
>     the rate is higher during the starting phase of recovery and eventually
> tapers off due
>     to throttling by mclock.
>
> 2. Can you try speeding up the recovery by changing to "high_recovery_ops"
> profile on
>     all the OSDs to see if it improves things (both CPU load and recovery
> rate)?
>
> 3. On the OSDs that showed high CPU usage, can you run the following
> command and
>     revert back? This just dumps the mclock settings on the OSDs.
>
>     sudo ceph daemon osd.N config show | grep osd_mclock
>
> I will update the tracker with these questions as well so that the
> discussion can
> continue there.
>
> Thanks,
> -Sridhar
>
> On Tue, Jul 12, 2022 at 4:49 PM Chris Palmer <chris.palmer@xxxxxxxxx>
> wrote:
>
> > I've created tracker https://tracker.ceph.com/issues/56530 for this,
> > including info on replicating it on another cluster.
> >
> >
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


