We have an internal use case where we back the storage of a proprietary database with a shared file system. While testing a workload on CephFS versus a local file system backed by an attached block device, we noticed something very odd: the amount of network I/O done by CephFS is almost double the I/O done in the local-block-device case. We also noticed that CephFS thrashes the page cache very quickly relative to the amount of data being read, and we suspect the two issues are related.

So I wrote a simple test:

1. Write 10,000 files of 400 KB each using dd (about 4 GB of data).
2. Drop the page cache completely.
3. Read these files back serially, again using dd.

Page cache usage shot up to 39 GB for reading such a small amount of data. Here is the bash used to reproduce this:

    for i in $(seq 1 10000); do
        dd if=/dev/zero of=test_${i} bs=4k count=100
    done

    sync; echo 1 > /proc/sys/vm/drop_caches

    for i in $(seq 1 10000); do
        dd if=test_${i} of=/dev/null bs=4k count=100
    done

The Ceph version being used:

    ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable)

The Ceph config options being overridden:

    WHO     MASK  LEVEL     OPTION                                  VALUE        RO
    mon           advanced  auth_allow_insecure_global_id_reclaim   false
    mgr           advanced  mgr/balancer/mode                       upmap
    mgr           advanced  mgr/dashboard/server_addr               127.0.0.1    *
    mgr           advanced  mgr/dashboard/server_port               8443         *
    mgr           advanced  mgr/dashboard/ssl                       false        *
    mgr           advanced  mgr/prometheus/server_addr              0.0.0.0      *
    mgr           advanced  mgr/prometheus/server_port              9283         *
    osd           advanced  bluestore_compression_algorithm         lz4
    osd           advanced  bluestore_compression_mode              aggressive
    osd           advanced  bluestore_throttle_bytes                536870912
    osd           advanced  osd_max_backfills                       3
    osd           advanced  osd_op_num_threads_per_shard_ssd        8            *
    osd           advanced  osd_scrub_auto_repair                   true
    mds           advanced  client_oc                               false
    mds           advanced  client_readahead_max_bytes              4096
    mds           advanced  client_readahead_max_periods            1
    mds           advanced  client_readahead_min                    0
    mds           basic     mds_cache_memory_limit                  21474836480
    client        advanced  client_oc                               false
    client        advanced  client_readahead_max_bytes              4096
    client        advanced  client_readahead_max_periods            1
    client        advanced  client_readahead_min                    0
    client        advanced  fuse_disable_pagecache                  false

The CephFS mount options (note that readahead was disabled for this test):

    /mnt/cephfs type ceph (rw,relatime,name=cephfs,secret=<hidden>,acl,rasize=0)

Any help or pointers are appreciated; this is a major performance issue for us.

Thanks and Regards,
Ashu Pachauri
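
P.S. For anyone trying to reproduce the numbers above, here is one way to sample the page-cache size and NIC receive counters around the read loop. This is only a sketch, not the exact tooling from our test; the interface name (eth0) is a placeholder and the script assumes it runs as root from the test directory on the CephFS mount.

    #!/bin/bash
    # Sample page cache size and NIC receive bytes before and after the
    # serial read pass. IFACE is illustrative; adjust to your NIC.
    IFACE=eth0

    cached_kb() { awk '/^Cached:/ {print $2}' /proc/meminfo; }
    rx_bytes()  { cat /sys/class/net/${IFACE}/statistics/rx_bytes; }

    # Start from a cold page cache (needs root).
    sync; echo 1 > /proc/sys/vm/drop_caches

    cached_before=$(cached_kb)
    rx_before=$(rx_bytes)

    # Serial read pass over the previously written files.
    for i in $(seq 1 10000); do
        dd if=test_${i} of=/dev/null bs=4k count=100 status=none
    done

    cached_after=$(cached_kb)
    rx_after=$(rx_bytes)

    echo "page cache grew by $(( (cached_after - cached_before) / 1024 )) MB"
    echo "received $(( (rx_after - rx_before) / 1024 / 1024 )) MB on ${IFACE}"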
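
For completeness, an illustrative example of how the readahead-related client overrides from the config dump can be applied with "ceph config set", and how the rasize=0 mount option fits into a kernel-client mount command. The monitor address and secret-file path below are placeholders, not values from our cluster.

    # Apply the client readahead overrides shown in the config dump.
    ceph config set client client_oc false
    ceph config set client client_readahead_max_bytes 4096
    ceph config set client client_readahead_max_periods 1
    ceph config set client client_readahead_min 0

    # Mount with kernel-client readahead disabled (rasize=0).
    mount -t ceph 10.0.0.1:6789:/ /mnt/cephfs \
        -o name=cephfs,secretfile=/etc/ceph/cephfs.secret,rasize=0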