Also, I am able to reproduce the network read amplification when I try to
do very small reads from larger files, e.g.:

for i in $(seq 1 10000); do
  dd if=test_${i} of=/dev/null bs=5k count=10
done

This loop generates about 3.3 GB of network traffic while it actually reads
only about 500 MB of data. (Rough measurement sketches for the traffic and
the page cache growth are appended below.)

Thanks and Regards,
Ashu Pachauri

On Fri, Mar 10, 2023 at 9:22 PM Ashu Pachauri <ashu210890@xxxxxxxxx> wrote:

> We have an internal use case where we back the storage of a proprietary
> database with a shared file system. We noticed something very odd when
> testing a workload on a local block-device-backed file system vs. CephFS:
> the amount of network I/O done by CephFS is almost double the I/O done by
> a local file system backed by an attached block device.
>
> We also noticed that CephFS thrashes through the page cache very quickly
> compared to the amount of data being read, and we think the two issues
> might be related. So I wrote a simple test:
>
> 1. I wrote 10k files of 400 KB each using dd (approx 4 GB of data).
> 2. I dropped the page cache completely.
> 3. I then read these files serially, again using dd. The page cache usage
>    shot up to 39 GB for reading such a small amount of data.
>
> Following is the bash code used to reproduce this:
>
> for i in $(seq 1 10000); do
>   dd if=/dev/zero of=test_${i} bs=4k count=100
> done
>
> sync; echo 1 > /proc/sys/vm/drop_caches
>
> for i in $(seq 1 10000); do
>   dd if=test_${i} of=/dev/null bs=4k count=100
> done
>
> The Ceph version being used is:
> ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable)
>
> The Ceph configs being overridden:
>
> WHO     MASK  LEVEL     OPTION                                  VALUE        RO
> mon           advanced  auth_allow_insecure_global_id_reclaim   false
> mgr           advanced  mgr/balancer/mode                       upmap
> mgr           advanced  mgr/dashboard/server_addr               127.0.0.1    *
> mgr           advanced  mgr/dashboard/server_port               8443         *
> mgr           advanced  mgr/dashboard/ssl                       false        *
> mgr           advanced  mgr/prometheus/server_addr              0.0.0.0      *
> mgr           advanced  mgr/prometheus/server_port              9283         *
> osd           advanced  bluestore_compression_algorithm         lz4
> osd           advanced  bluestore_compression_mode              aggressive
> osd           advanced  bluestore_throttle_bytes                536870912
> osd           advanced  osd_max_backfills                       3
> osd           advanced  osd_op_num_threads_per_shard_ssd        8            *
> osd           advanced  osd_scrub_auto_repair                   true
> mds           advanced  client_oc                               false
> mds           advanced  client_readahead_max_bytes              4096
> mds           advanced  client_readahead_max_periods            1
> mds           advanced  client_readahead_min                    0
> mds           basic     mds_cache_memory_limit                  21474836480
> client        advanced  client_oc                               false
> client        advanced  client_readahead_max_bytes              4096
> client        advanced  client_readahead_max_periods            1
> client        advanced  client_readahead_min                    0
> client        advanced  fuse_disable_pagecache                  false
>
> The CephFS mount options (note that readahead was disabled for this test):
> /mnt/cephfs type ceph (rw,relatime,name=cephfs,secret=<hidden>,acl,rasize=0)
>
> Any help or pointers are appreciated; this is a major performance issue
> for us.
>
> Thanks and Regards,
> Ashu Pachauri

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
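For anyone trying to reproduce the numbers above, here is a minimal sketch
of how the network amplification can be quantified by sampling the client's
interface counters around the small-read loop. The interface name eth0 is an
assumption (substitute whichever NIC carries the CephFS traffic), and
status=none is only there to keep dd quiet:

# Assumption: CephFS traffic reaches this client via eth0; adjust IFACE.
IFACE=eth0
rx_before=$(cat /sys/class/net/${IFACE}/statistics/rx_bytes)

for i in $(seq 1 10000); do
  dd if=test_${i} of=/dev/null bs=5k count=10 status=none
done

rx_after=$(cat /sys/class/net/${IFACE}/statistics/rx_bytes)
# 10000 files x 10 blocks x 5 KiB actually read from the files.
echo "bytes received over the network: $(( rx_after - rx_before ))"
echo "bytes actually read from files:  $(( 10000 * 10 * 5 * 1024 ))"

On a setup like the one described above, the first number should come out
far larger than the second (roughly the 3.3 GB vs. ~500 MB gap reported).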
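Similarly, a sketch for measuring the page cache growth during the serial
read pass from the quoted message, assuming it is run as root on an
otherwise idle client so the delta is attributable to this test:

# Read the "Cached" value (in kB) from /proc/meminfo.
cached_kb() { awk '/^Cached:/ {print $2}' /proc/meminfo; }

sync; echo 1 > /proc/sys/vm/drop_caches
before=$(cached_kb)

for i in $(seq 1 10000); do
  dd if=test_${i} of=/dev/null bs=4k count=100 status=none
done

after=$(cached_kb)
echo "page cache growth: $(( (after - before) / 1024 )) MB for ~4 GB of file data"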
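For completeness, the mount line in the quoted message was presumably
produced by something along these lines; the monitor addresses and the
secret file path are placeholders, not values from the original setup.
The rasize=0 option is what disables client readahead for the kernel
CephFS client:

# Placeholders: mon1,mon2,mon3 and the secretfile path are assumptions.
mount -t ceph mon1,mon2,mon3:/ /mnt/cephfs \
  -o name=cephfs,secretfile=/etc/ceph/cephfs.secret,acl,rasize=0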