On Thu, Oct 22, 2015 at 4:47 AM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> On Tue, Oct 13, 2015 at 10:09 PM, Goncalo Borges
> <goncalo@xxxxxxxxxxxxxxxxxxx> wrote:
>> Hi all...
>>
>> Thank you for the feedback, and I am sorry for my delay in replying.
>>
>> 1./ Just to recall the problem: I was testing cephfs using fio on two
>> ceph-fuse clients:
>>
>> - Client A is in the same data center as all OSDs, connected at 1 GbE.
>> - Client B is in a different data center (in another city), also connected
>> at 1 GbE. However, I've seen that the connection is problematic, and
>> sometimes the network performance is well below the theoretical 1 Gbps
>> limit.
>> - Client A has 24 GB RAM + 98 GB of swap; Client B has 48 GB of RAM +
>> 98 GB of swap.
>>
>> I was seeing that Client B gave much better fio throughput because it was
>> hitting the cache much more than Client A.
>>
>> --- * ---
>>
>> 2./ I suspected that Client B was hitting the cache because it had bad
>> connectivity to the Ceph cluster. I tried to sort that out and was able to
>> nail the problem down to a bad switch. However, after fixing that, I still
>> see the same behaviour, which I can reproduce in a systematic way.
>>
>> --- * ---
>>
>> 3./ In a new round of tests on Client B, I applied the following
>> procedure:
>>
>> 3.1/ These are the network statistics right before starting my fio test:
>>
>> * Printing network statistics:
>> * /sys/class/net/eth0/statistics/collisions: 0
>> * /sys/class/net/eth0/statistics/multicast: 453650
>> * /sys/class/net/eth0/statistics/rx_bytes: 437704562785
>> * /sys/class/net/eth0/statistics/rx_compressed: 0
>> * /sys/class/net/eth0/statistics/rx_crc_errors: 0
>> * /sys/class/net/eth0/statistics/rx_dropped: 0
>> * /sys/class/net/eth0/statistics/rx_errors: 0
>> * /sys/class/net/eth0/statistics/rx_fifo_errors: 0
>> * /sys/class/net/eth0/statistics/rx_frame_errors: 0
>> * /sys/class/net/eth0/statistics/rx_length_errors: 0
>> * /sys/class/net/eth0/statistics/rx_missed_errors: 0
>> * /sys/class/net/eth0/statistics/rx_over_errors: 0
>> * /sys/class/net/eth0/statistics/rx_packets: 387690140
>> * /sys/class/net/eth0/statistics/tx_aborted_errors: 0
>> * /sys/class/net/eth0/statistics/tx_bytes: 149206610455
>> * /sys/class/net/eth0/statistics/tx_carrier_errors: 0
>> * /sys/class/net/eth0/statistics/tx_compressed: 0
>> * /sys/class/net/eth0/statistics/tx_dropped: 0
>> * /sys/class/net/eth0/statistics/tx_errors: 0
>> * /sys/class/net/eth0/statistics/tx_fifo_errors: 0
>> * /sys/class/net/eth0/statistics/tx_heartbeat_errors: 0
>> * /sys/class/net/eth0/statistics/tx_packets: 241698327
>> * /sys/class/net/eth0/statistics/tx_window_errors: 0
>>
>> 3.2/ I then launched my fio test. Please note that I am dropping caches
>> before starting the test (sync; echo 3 > /proc/sys/vm/drop_caches). My
>> current fio test has nothing fancy. Here are the options:
>>
>> # cat fio128write_ioenginelibaio_iodepth64_direct1_bs512K_20151013041036.in
>> [fio128write_ioenginelibaio_iodepth64_direct1_bs512K_20151013041036]
>> ioengine=libaio
>> iodepth=64
>> rw=write
>> bs=512K
>> direct=1
>> size=8192m
>> numjobs=128
>> filename=fio128write_ioenginelibaio_iodepth64_direct1_bs512K_20151013041036.data
>
> Oh right, so you're only using 8GB of data to write over (and you're
> hitting it a bunch of times). So if not for the direct IO flag this
> would sort of make sense.
>
> But with that, I'm very confused.
> There can be some annoying little
> pieces of making direct IO get passed correctly through all the FUSE
> interfaces, but I *thought* we were going through the hoops and making
> things work. Perhaps I am incorrect. Zheng, do you know anything about
> this?
>

Direct IO only bypasses the kernel page cache; the data can still be cached
in ceph-fuse. If I'm reading the test correctly, it repeatedly writes over
the same 8 GB of file data, so the cache lets multiple writes be merged into
a single OSD write.

Regards
Yan, Zheng

>>
>> I am not sure if it matters, but the layout of my dir is the following:
>>
>> # getfattr -n ceph.dir.layout /cephfs/sydney
>> getfattr: Removing leading '/' from absolute path names
>> # file: cephfs/sydney
>> ceph.dir.layout="stripe_unit=524288 stripe_count=8 object_size=4194304
>> pool=cephfs_dt"
>>
>> 3.3/ fio produced the following result for the aggregated bandwidth. If I
>> translate that number to Gbps, I get more than 3 Gbps, which is impossible.
>>
>> # grep aggrb
>> fio128write_ioenginelibaio_iodepth64_direct1_bs512K_20151013041036.out
>> WRITE: io=1024.0GB, aggrb=403101KB/s, minb=3149KB/s, maxb=3154KB/s,
>> mint=2659304msec, maxt=2663699msec
>>
>> 3.4/ These are the network statistics immediately after the test:
>>
>> * Printing network statistics:
>> * /sys/class/net/eth0/statistics/collisions: 0
>> * /sys/class/net/eth0/statistics/multicast: 454539
>> * /sys/class/net/eth0/statistics/rx_bytes: 440300506875
>> * /sys/class/net/eth0/statistics/rx_compressed: 0
>> * /sys/class/net/eth0/statistics/rx_crc_errors: 0
>> * /sys/class/net/eth0/statistics/rx_dropped: 0
>> * /sys/class/net/eth0/statistics/rx_errors: 0
>> * /sys/class/net/eth0/statistics/rx_fifo_errors: 0
>> * /sys/class/net/eth0/statistics/rx_frame_errors: 0
>> * /sys/class/net/eth0/statistics/rx_length_errors: 0
>> * /sys/class/net/eth0/statistics/rx_missed_errors: 0
>> * /sys/class/net/eth0/statistics/rx_over_errors: 0
>> * /sys/class/net/eth0/statistics/rx_packets: 423468075
>> * /sys/class/net/eth0/statistics/tx_aborted_errors: 0
>> * /sys/class/net/eth0/statistics/tx_bytes: 425580907716
>> * /sys/class/net/eth0/statistics/tx_carrier_errors: 0
>> * /sys/class/net/eth0/statistics/tx_compressed: 0
>> * /sys/class/net/eth0/statistics/tx_dropped: 0
>> * /sys/class/net/eth0/statistics/tx_errors: 0
>> * /sys/class/net/eth0/statistics/tx_fifo_errors: 0
>> * /sys/class/net/eth0/statistics/tx_heartbeat_errors: 0
>> * /sys/class/net/eth0/statistics/tx_packets: 423973681
>> * /sys/class/net/eth0/statistics/tx_window_errors: 0
>>
>> If I just compare tx_bytes before and after the fio test, I get
>> (425580907716 - 149206610455) ~ 260 GB. The whole test is supposed to use
>> 128 threads writing 8192 MB each, giving a total of 1024 GB. I am not sure
>> I understand those numbers, but they differ by a factor of about 4, and I
>> also do not understand how caching could compensate for that difference.
>>
>> --- * ---
>>
>> 4./ During the whole process I have been monitoring ceph-fuse memory
>> usage, and this is what I get at the beginning and end of the test:
>>
>> START: 4577 root 20 0 5861m  54m 4380 S 97.5 0.1  0:05.93 ceph-fuse
>> END:   4577 root 20 0 10.1g 4.5g 4412 S  0.0 9.5 30:48.27 ceph-fuse
>>
>> --- * ---
>>
>> 5./ I've tried to manipulate the ceph-fuse behaviour via client_cache_size
>> and client_oc_size (by the way, are these values given in bytes?). The
>> defaults are
>>
>> client cache size = 16384
>> client oc size = 209715200
>
> Cache size is in number of inodes. "oc size" (objectcacher size) is in
> bytes, yes.
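
As a quick cross-check of the numbers: fio reports 1024 GB of logical writes
(128 jobs x 8192 MB, all against a single shared file), while the tx_bytes
delta is only ~260 GB. That is the factor of ~4 mentioned above, and it is
consistent with ceph-fuse merging overlapping writes before they reach the
OSDs.

If you want to confirm that the ObjectCacher is responsible, one thing to
try is disabling it (or shrinking it well below its 200 MB default) on the
client and re-running the same fio job. A minimal sketch of the [client]
section of ceph.conf follows; the values are only examples, and I am quoting
the client_oc option from memory, so please check it against the config
reference for your release:

[client]
    # disable the ceph-fuse object cache entirely; writes go straight to RADOS
    client oc = false
    # alternatively, keep the cache but shrink it (size is in bytes):
    # client oc size = 33554432
    # metadata cache, in number of inodes:
    # client cache size = 16384

With the object cache out of the picture, the aggregate bandwidth that fio
reports should drop back towards what the 1 GbE link can actually carry.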
> -Greg

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com