Hoping someone may be able to help point out where my bottleneck(s) may be.

I have an 80TB kRBD image on an EC8:2 pool, with an XFS filesystem on top of that. This was not an ideal scenario; rather, it was a rescue mission to dump a large, aging RAID array before it was too late, so I'm working with the hand I was dealt. To further complicate matters, the main directory structure consists of lots and lots of small files and deep directories.

My goal is to rsync (or otherwise copy) the data from the RBD to CephFS, but it's just unbearably slow and will take ~150 days to transfer ~35TB, which is far from ideal.

> 15.41G 79% 4.36MB/s 0:56:09 (xfr#23165, ir-chk=4061/27259)

> avg-cpu: %user %nice %system %iowait %steal %idle
> 0.17 0.00 1.34 13.23 0.00 85.26
>
> Device r/s rMB/s rrqm/s %rrqm r_await rareq-sz w/s wMB/s wrqm/s %wrqm w_await wareq-sz d/s dMB/s drqm/s %drqm d_await dareq-sz aqu-sz %util
> rbd0 124.00 0.66 0.00 0.00 17.30 5.48 50.00 0.17 0.00 0.00 31.70 3.49 0.00 0.00 0.00 0.00 0.00 0.00 3.39 96.40

The above is the rsync progress and the iostat output (captured during the rsync) for a copy from the RBD to a local SSD, to rule out any bottleneck from writing back to CephFS. About 16G in 1h, not exactly blazing, and this is just 5 of the 7000 directories I'm looking to offload to CephFS.

Currently running 15.2.11, and the host is Ubuntu 20.04 (5.4.0-72-generic) with a single E5-2620, 64GB of memory, and a 4x10GbT bond talking to Ceph; iperf confirms the bandwidth.

The pool is EC8:2 across about 16 hosts and 240 OSDs, with 24 of those being 8TB 7.2K SAS and the other 216 being 2TB 7.2K SATA. So there are quite a few spindles in play here. There are only 128 PGs in this pool, but it's the only RBD image in this pool. The autoscaler recommends going to 512, but I was hoping to avoid the performance overhead of the PG splits if possible, given perf is bad enough as is.

Examining the main directory structure, it looks like there are 7000 files per directory, about 60% of which are <1MiB, totaling nearly 5GiB per directory.
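
As a rough sanity check on the figures above, here is a minimal Python sketch (not a diagnosis). It assumes decimal units for "35TB" and uses Little's law as an approximation of how many requests are in flight on rbd0; the function names are illustrative only.

```python
# Back-of-the-envelope checks using only the figures quoted in this post.
# Assumptions: "35TB" is decimal terabytes, and Little's law
# (in-flight = rate x latency) is a fair approximation for the iostat sample.

SECONDS_PER_DAY = 86_400


def inflight_requests(iops: float, await_ms: float) -> float:
    """Average number of requests in flight, per Little's law."""
    return iops * (await_ms / 1000.0)


def required_rate_mb_s(total_tb: float, days: float) -> float:
    """Sustained rate in decimal MB/s needed to move total_tb in the given days."""
    return (total_tb * 1e6) / (days * SECONDS_PER_DAY)


# iostat sample for rbd0: r/s with r_await, and w/s with w_await.
reads_inflight = inflight_requests(124.00, 17.30)
writes_inflight = inflight_requests(50.00, 31.70)

# 7000 directories at roughly 5 GiB each, expressed in TiB.
dataset_tib = 7000 * 5 / 1024

# Average rate implied by "~35TB in ~150 days".
implied_rate = required_rate_mb_s(35, 150)

print(f"reads in flight  ~ {reads_inflight:.2f}")
print(f"writes in flight ~ {writes_inflight:.2f}")
print(f"dataset size     ~ {dataset_tib:.1f} TiB")
print(f"implied rate     ~ {implied_rate:.2f} MB/s")
```

If this approximation holds, only a handful of small requests are outstanding at any moment, which would suggest the copy is bound by per-request latency and low concurrency rather than by raw bandwidth.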

My fstab for this is:

> xfs _netdev,noatime 0 0

I tried increasing read_ahead_kb from 128K to 4M at /sys/block/rbd0/queue/read_ahead_kb to match the object/stripe size of the EC pool, but that doesn't appear to have had much of an impact.

The only other change I can think of trying is to increase the queue depth in the rbdmap up from 128, so that's my next bullet to fire.

Attaching xfs_info in case there are any useful nuggets:

> meta-data=/dev/rbd0              isize=256    agcount=81, agsize=268435455 blks
>          =                       sectsz=512   attr=2, projid32bit=0
>          =                       crc=0        finobt=0, sparse=0, rmapbt=0
>          =                       reflink=0
> data     =                       bsize=4096   blocks=21483470848, imaxpct=5
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0, ftype=0
> log      =internal log           bsize=4096   blocks=32768, version=2
>          =                       sectsz=512   sunit=0 blks, lazy-count=0
> realtime =none                   extsz=4096   blocks=0, rtextents=0

And rbd info:

> rbd image 'rbd-image-name':
>     size 85 TiB in 22282240 objects
>     order 22 (4 MiB objects)
>     snapshot_count: 0
>     id: a09cac2b772af5
>     data_pool: rbd-ec82-pool
>     block_name_prefix: rbd_data.29.a09cac2b772af5
>     format: 2
>     features: layering, exclusive-lock, object-map, fast-diff, deep-flatten, data-pool
>     op_features:
>     flags:
>     create_timestamp: Mon Apr 12 18:44:38 2021
>     access_timestamp: Mon Apr 12 18:44:38 2021
>     modify_timestamp: Mon Apr 12 18:44:38 2021

Any other ideas or hints are greatly appreciated.

Thanks,
Reed