On 2020-10-06 13:05, Igor Fedotov wrote:
>
> On 10/6/2020 1:04 PM, Kristof Coucke wrote:
>> Another strange thing is going on:
>>
>> No client software is using the system any longer, so we would expect
>> that all IOs are related to the recovery (fixing of the degraded PG).
>> However, the disks that are reaching high IO are not a member of the
>> PGs that are being fixed.
>>
>> So, something is heavily using the disk, but I can't find the process
>> immediately. I've read something that there can be old client
>> processes that keep on connecting to an OSD for retrieving data for a
>> specific PG while that PG is no longer available on that disk.
>>
> I bet it's rather PG removal happening in background....

^^ This, and probably the accompanying RocksDB housekeeping that goes with it; removing the PGs by itself shouldn't be too big a deal. Especially with very small files (and a lot of them) you probably have a lot of OMAP / META data (ceph osd df will tell you).

If that's indeed the case, then there is a (way) quicker option to get out of this situation: offline compaction of the OSDs. This runs orders of magnitude faster than compacting while the OSDs are still online.

To check whether this hypothesis is true: are the OSD servers where the PGs were located previously (and not the new hosts) under CPU stress?

Offline compaction per host:

systemctl stop ceph-osd.target
for osd in `ls /var/lib/ceph/osd/`; do (ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/$osd compact &); done

Gr. Stefan
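A minimal way to test the PG-removal hypothesis before taking anything offline, assuming you can reach the OSD admin sockets on the busy hosts (osd.<id> below is a placeholder for one of the OSDs on a busy disk):

# OMAP and META columns show how much RocksDB metadata each OSD carries:
ceph osd df

# On the busy host, check whether that OSD is still deleting PGs in the
# background; a non-zero numpg_removing means PG removal is still in progress:
ceph daemon osd.<id> perf dump | grep numpg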
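And a slightly more defensive sketch of the per-host compaction loop above: it assumes the default /var/lib/ceph/osd/ceph-<id> directory layout, compacts the OSDs one at a time instead of in parallel, and sets noout so the cluster does not start rebalancing while the OSDs are down:

ceph osd set noout
# compaction needs exclusive access to each OSD's RocksDB, so stop all OSDs on this host first
systemctl stop ceph-osd.target
for osd in /var/lib/ceph/osd/ceph-*; do
    echo "compacting ${osd} ..."
    ceph-kvstore-tool bluestore-kv "${osd}" compact
done
systemctl start ceph-osd.target
ceph osd unset noout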