Re: why does 3 copies take so much more time than 2?

Hi Charles,


Going from 40s to 4.5m seems excessive to me at least.  Can you tell if the drives or OSDs are hitting their limits?  Tools like iostat, sar, or collectl might help.
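
For example, while the untar is running you could watch the data drives on each OSD host; sustained ~100% utilization or high await on the HDDs would point at the drives themselves (just an illustrative invocation, adjust to taste):

# extended per-device statistics, refreshed every 2 seconds
iostat -x 2
# or, with sar from the sysstat package, one sample per second for 60 seconds
sar -d 1 60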


Longer answer: There are a couple of potential issues.  One is that you are bound by the latency of writing the slowest copy of the data.  I.e., let's say that you have a 25% chance of a slow write each time you write a copy of the data.  Depending on the replication factor, that raises the chance that some replica write slows down the whole write (there's a one-liner reproducing these numbers after the list):


1x: 25%

2x: 100% - (100%-25%)^2 = 43.75%

3x: 100% - (100%-25%)^3 = 57.8%
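
More generally, with a per-copy slow-write probability p and replication factor n, the chance that at least one copy is slow is 1 - (1-p)^n.  A purely illustrative one-liner reproducing the numbers above:

awk 'BEGIN { p = 0.25; for (n = 1; n <= 3; n++) printf "%dx: %.2f%%\n", n, (1 - (1 - p)^n) * 100 }'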


That only tells part of the story though.  In the 2x and 3x cases, you are not just dealing with a potentially higher probability of hitting a high-latency event, you are also working the system harder at the same time.  There's more work for the drives, more metadata for RocksDB, more network traffic, and more work for the async msgr threads.  If you are using multiple active/active MDSes, the behavior of the dynamic subtree partitioning can be somewhat volatile as well.  The trick is likely going to be figuring out what is holding you back and whether it's a local phenomenon (a slow drive or node) or a global one.
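
To narrow that down, a few standard commands (nothing specific to your cluster) can help spot whether one OSD is the outlier or everything is uniformly busy:

# per-OSD commit/apply latency; one OSD standing out suggests a slow drive or node
ceph osd perf
# per-OSD utilization and PG counts
ceph osd df tree
# any slow-ops or other warnings the cluster is currently reporting
ceph health detail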


Mark


On 1/4/23 13:32, Charles Hedrick wrote:
I'm testing cephfs. I have 3 nodes, with 2 hard disks and one ssd on each. cephfs is set to put metadata on ssd and data on hdd.

With the two pools set to size = 3, untar'ing a 19 GB tar file containing 90K files takes 4.5 minutes.
With size = 2, it takes 40 sec.  (The tar file itself is stored on an in-memory file system.)

Is that expected?

This is the current version of ceph, deployed with cephadm. The only non-default setup is allocating metadata to ssd and data to hdd.

   data_devices:
     rotational: 1
   db_devices:
     rotational: 0

ceph osd crush rule create-replicated replicated_hdd default host hdd
ceph osd crush rule create-replicated replicated_ssd default host ssd
ceph osd pool set cephfs.main.data crush_rule replicated_hdd
ceph osd pool set cephfs.main.meta crush_rule replicated_ssd
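
The assignments can be double-checked afterwards with, for example:

ceph osd pool get cephfs.main.data crush_rule
ceph osd pool get cephfs.main.data size
ceph osd pool get cephfs.main.meta crush_rule
ceph osd pool get cephfs.main.meta size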





_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx

