> What is being done is a serial tree walk and copy in 3
> replicas of all objects in the CephFS metadata pool, so it
> depends on both the read and write IOPS rate for the metadata
> pools, but mostly in the write IOPS. [...] Wild guess:
> metadata is on 10x 3.84TB SSDs without persistent cache, data
> is on 48x 8TB devices probably HDDs. Very cost effective :-).

I do not know if those guesses are right, but in general most
Ceph instances I have seen have been designed with the "cost
effective" choice of providing just enough IOPS to run the user
workload (and often not even that), with no headroom left over
to run the admin workload quickly (checking, scanning,
scrubbing, migrating, 'fsck' or 'resilvering' of the underlying
filesystem). There is often a similar situation for non-HPC
filesystem types, but the scale and pressure on instances of
those are usually much lower than for HPC filesystem instances,
so the consequences are less obvious.
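
To make the headroom point concrete, here is a rough back-of-envelope sketch. It is not a model of how Ceph actually schedules I/O. It assumes that an admin pass has to write every object once per replica, and that it can only use whatever write IOPS the user workload leaves free. The function name, its parameters and all the numbers are hypothetical, chosen only for illustration; reads and per-operation latency are ignored.

```python
def admin_pass_hours(objects, replicas, pool_write_iops, user_write_iops):
    """Crude estimate of how long an admin pass takes, in hours.

    Assumes one write per object per replica, served only from the
    write IOPS left over after the user workload. Ignores reads,
    latency and any throttling, so treat the result as a lower bound.
    """
    spare_iops = pool_write_iops - user_write_iops
    if spare_iops <= 0:
        # No headroom at all: the pass never finishes at this rate.
        return float("inf")
    total_writes = objects * replicas
    return total_writes / spare_iops / 3600


if __name__ == "__main__":
    # Hypothetical figures, not measurements from any real cluster.
    objects = 50_000_000
    for user_iops in (1_000, 3_000, 3_900):
        hours = admin_pass_hours(objects, 3, 4_000, user_iops)
        print(f"user load {user_iops} IOPS: about {hours:.1f} hours")
```

The only point of the sketch is that the duration grows as 1 / (spare IOPS), so a pool sized to "just fit" the user workload turns an admin pass that could take hours into one that takes days or never completes.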
