Also, I am able to reproduce the network read amplification when I try to
do very small reads from larger files, e.g.:

for i in $(seq 1 10000); do
  dd if=test_${i} of=/dev/null bs=5k count=10
done

This loop generates about 3.3 GB of network traffic while it actually reads
only about 500 MB of data. (Rough measurement sketches for the traffic and
the page cache growth are appended below.)

Thanks and Regards,
Ashu Pachauri

On Fri, Mar 10, 2023 at 9:22 PM Ashu Pachauri <ashu210890@xxxxxxxxx> wrote:

> We have an internal use case where we back the storage of a proprietary
> database with a shared file system. We noticed something very odd when
> testing a workload on a local block-device-backed file system vs. CephFS:
> the amount of network I/O done by CephFS is almost double the I/O done by
> a local file system backed by an attached block device.
>
> We also noticed that CephFS thrashes through the page cache very quickly
> compared to the amount of data being read, and we think the two issues
> might be related. So I wrote a simple test:
>
> 1. I wrote 10k files of 400 KB each using dd (approx 4 GB of data).
> 2. I dropped the page cache completely.
> 3. I then read these files serially, again using dd. The page cache usage
>    shot up to 39 GB for reading such a small amount of data.
>
> Following is the bash code used to reproduce this:
>
> for i in $(seq 1 10000); do
>   dd if=/dev/zero of=test_${i} bs=4k count=100
> done
>
> sync; echo 1 > /proc/sys/vm/drop_caches
>
> for i in $(seq 1 10000); do
>   dd if=test_${i} of=/dev/null bs=4k count=100
> done
>
> The Ceph version being used is:
> ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable)
>
> The Ceph configs being overridden:
>
> WHO     MASK  LEVEL     OPTION                                  VALUE        RO
> mon           advanced  auth_allow_insecure_global_id_reclaim   false
> mgr           advanced  mgr/balancer/mode                       upmap
> mgr           advanced  mgr/dashboard/server_addr               127.0.0.1    *
> mgr           advanced  mgr/dashboard/server_port               8443         *
> mgr           advanced  mgr/dashboard/ssl                       false        *
> mgr           advanced  mgr/prometheus/server_addr              0.0.0.0      *
> mgr           advanced  mgr/prometheus/server_port              9283         *
> osd           advanced  bluestore_compression_algorithm         lz4
> osd           advanced  bluestore_compression_mode              aggressive
> osd           advanced  bluestore_throttle_bytes                536870912
> osd           advanced  osd_max_backfills                       3
> osd           advanced  osd_op_num_threads_per_shard_ssd        8            *
> osd           advanced  osd_scrub_auto_repair                   true
> mds           advanced  client_oc                               false
> mds           advanced  client_readahead_max_bytes              4096
> mds           advanced  client_readahead_max_periods            1
> mds           advanced  client_readahead_min                    0
> mds           basic     mds_cache_memory_limit                  21474836480
> client        advanced  client_oc                               false
> client        advanced  client_readahead_max_bytes              4096
> client        advanced  client_readahead_max_periods            1
> client        advanced  client_readahead_min                    0
> client        advanced  fuse_disable_pagecache                  false
>
> The CephFS mount options (note that readahead was disabled for this test):
> /mnt/cephfs type ceph (rw,relatime,name=cephfs,secret=<hidden>,acl,rasize=0)
>
> Any help or pointers are appreciated; this is a major performance issue
> for us.
>
> Thanks and Regards,
> Ashu Pachauri

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
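For anyone trying to reproduce the numbers above, here is a minimal sketch
of how the network amplification can be quantified by sampling the client's
interface counters around the small-read loop. The interface name eth0 is an
assumption (substitute whichever NIC carries the CephFS traffic), and
status=none is only there to keep dd quiet:

# Assumption: CephFS traffic reaches this client via eth0; adjust IFACE.
IFACE=eth0
rx_before=$(cat /sys/class/net/${IFACE}/statistics/rx_bytes)

for i in $(seq 1 10000); do
  dd if=test_${i} of=/dev/null bs=5k count=10 status=none
done

rx_after=$(cat /sys/class/net/${IFACE}/statistics/rx_bytes)
# 10000 files x 10 blocks x 5 KiB actually read from the files.
echo "bytes received over the network: $(( rx_after - rx_before ))"
echo "bytes actually read from files:  $(( 10000 * 10 * 5 * 1024 ))"

On a setup like the one described above, the first number should come out
far larger than the second (roughly the 3.3 GB vs. ~500 MB gap reported).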
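Similarly, a sketch for measuring the page cache growth during the serial
read pass from the quoted message, assuming it is run as root on an
otherwise idle client so the delta is attributable to this test:

# Read the "Cached" value (in kB) from /proc/meminfo.
cached_kb() { awk '/^Cached:/ {print $2}' /proc/meminfo; }

sync; echo 1 > /proc/sys/vm/drop_caches
before=$(cached_kb)

for i in $(seq 1 10000); do
  dd if=test_${i} of=/dev/null bs=4k count=100 status=none
done

after=$(cached_kb)
echo "page cache growth: $(( (after - before) / 1024 )) MB for ~4 GB of file data"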
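For completeness, the mount line in the quoted message was presumably
produced by something along these lines; the monitor addresses and the
secret file path are placeholders, not values from the original setup.
The rasize=0 option is what disables client readahead for the kernel
CephFS client:

# Placeholders: mon1,mon2,mon3 and the secretfile path are assumptions.
mount -t ceph mon1,mon2,mon3:/ /mnt/cephfs \
  -o name=cephfs,secretfile=/etc/ceph/cephfs.secret,acl,rasize=0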