Re: slow "rados ls"

Stefan Kooman <stefan@xxxxxx> · Thu, 10 Sep 2020 17:15:05 +0200

On 2020-09-01 10:51, Marcel Kuiper wrote:
> As a matter of fact we did. We doubled the storage nodes from 25 to 50.
> Total osds now 460.
> 
> You want to share your thoughts on that?

OK, I'm really curious if you observed the following behaviour:

During, or shortly after, the rebalance, did you see high CPU usage on
the OSDs? In particular, on the ones that hosted the PGs before they were
moved to the new nodes? I mean ~300% CPU per OSD (rising from a few
percent to 300%, non-stop). RocksDB is doing housekeeping, and we have
observed before, and again today, on Mimic 13.2.8, that with a lot of
OMAP/META data the OSDs that have to clean up consume a ridiculous
amount of CPU (for hours on end). This triggers loads of slow ops and
latency spikes of sometimes (tens of) seconds.

Are you running Nautilus? If you haven't seen this behaviour, it might
have been fixed in Nautilus. Or your cluster may simply differ from ours. We
will do a PG expansion after we have upgraded to Nautilus, so we'll
definitely know by then.

Thanks,

Stefan
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx