On Tue, Oct 13, 2015 at 10:09 PM, Goncalo Borges
<goncalo@xxxxxxxxxxxxxxxxxxx> wrote:
> Hi all...
>
> Thank you for the feedback, and I am sorry for my delay in replying.
>
> 1./ Just to recall the problem, I was testing cephfs using fio in two
> ceph-fuse clients:
>
> - Client A is in the same data center as all OSDs, connected at 1 GbE.
> - Client B is in a different data center (in another city), also connected
> at 1 GbE. However, I've seen that the connection is problematic and, at
> times, the network performance is well below the theoretical 1 Gbps limit.
> - Client A has 24 GB RAM + 98 GB of swap, and Client B has 48 GB of RAM +
> 98 GB of swap.
>
> I was seeing that Client B gave much better fio throughput than Client A
> because it was hitting the cache much more.
>
> --- * ---
>
> 2./ I suspected that Client B was hitting the cache because it had bad
> connectivity to the Ceph cluster. I tried to sort that out and was able to
> nail the problem down to a bad switch. However, after fixing it, I still
> see the same behaviour, which I can reproduce systematically.
>
> --- * ---
>
> 3./ In a new round of tests on Client B, I applied the following
> procedure:
>
> 3.1/ These are the network statistics right before starting my fio test:
>
> * Printing network statistics:
> * /sys/class/net/eth0/statistics/collisions: 0
> * /sys/class/net/eth0/statistics/multicast: 453650
> * /sys/class/net/eth0/statistics/rx_bytes: 437704562785
> * /sys/class/net/eth0/statistics/rx_compressed: 0
> * /sys/class/net/eth0/statistics/rx_crc_errors: 0
> * /sys/class/net/eth0/statistics/rx_dropped: 0
> * /sys/class/net/eth0/statistics/rx_errors: 0
> * /sys/class/net/eth0/statistics/rx_fifo_errors: 0
> * /sys/class/net/eth0/statistics/rx_frame_errors: 0
> * /sys/class/net/eth0/statistics/rx_length_errors: 0
> * /sys/class/net/eth0/statistics/rx_missed_errors: 0
> * /sys/class/net/eth0/statistics/rx_over_errors: 0
> * /sys/class/net/eth0/statistics/rx_packets: 387690140
> * /sys/class/net/eth0/statistics/tx_aborted_errors: 0
> * /sys/class/net/eth0/statistics/tx_bytes: 149206610455
> * /sys/class/net/eth0/statistics/tx_carrier_errors: 0
> * /sys/class/net/eth0/statistics/tx_compressed: 0
> * /sys/class/net/eth0/statistics/tx_dropped: 0
> * /sys/class/net/eth0/statistics/tx_errors: 0
> * /sys/class/net/eth0/statistics/tx_fifo_errors: 0
> * /sys/class/net/eth0/statistics/tx_heartbeat_errors: 0
> * /sys/class/net/eth0/statistics/tx_packets: 241698327
> * /sys/class/net/eth0/statistics/tx_window_errors: 0
>
> 3.2/ I then launched my fio test. Please note that I am dropping caches
> before starting the test (sync; echo 3 > /proc/sys/vm/drop_caches). My
> current fio test has nothing fancy. Here are the options:
>
> # cat fio128write_ioenginelibaio_iodepth64_direct1_bs512K_20151013041036.in
> [fio128write_ioenginelibaio_iodepth64_direct1_bs512K_20151013041036]
> ioengine=libaio
> iodepth=64
> rw=write
> bs=512K
> direct=1
> size=8192m
> numjobs=128
> filename=fio128write_ioenginelibaio_iodepth64_direct1_bs512K_20151013041036.data

Oh right, so you're only using 8 GB of data to write over (and you're
hitting it a bunch of times). If it weren't for the direct IO flag, this
would sort of make sense; but with it, I'm very confused. There can be some
annoying little pieces to making direct IO get passed correctly through all
the FUSE interfaces, but I *thought* we were jumping through the hoops and
making things work. Perhaps I am incorrect.

Zheng, do you know anything about this?
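One quick way to check on the client (an untested sketch; the interface and
mount point are taken from your output above, and the scratch file name is
just an example) is to bracket a single direct write with the NIC counters
and see whether roughly that many bytes actually leave the box:

# TX0=$(cat /sys/class/net/eth0/statistics/tx_bytes)
# dd if=/dev/zero of=/cephfs/sydney/odirect_probe bs=4M count=64 oflag=direct conv=fsync
# TX1=$(cat /sys/class/net/eth0/statistics/tx_bytes)
# echo "tx delta: $(( (TX1 - TX0) / 1024 / 1024 )) MiB for 256 MiB written"

If direct IO is really bypassing the client cache, the delta should be
close to the 256 MiB written (plus protocol overhead; the client only sends
to the primary OSDs, so there is no replication multiplier on this side).
If it comes back much smaller, the O_DIRECT flag is getting lost somewhere
on the way through FUSE.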
>
> I am not sure if it matters, but the layout of my dir is the following:
>
> # getfattr -n ceph.dir.layout /cephfs/sydney
> getfattr: Removing leading '/' from absolute path names
> # file: cephfs/sydney
> ceph.dir.layout="stripe_unit=524288 stripe_count=8 object_size=4194304
> pool=cephfs_dt"
>
> 3.3/ fio produced the following result for the aggregated bandwidth. If I
> translate that number to Gbps, I get almost 3 Gbps, which is impossible on
> a 1 GbE link.
>
> # grep aggrb
> fio128write_ioenginelibaio_iodepth64_direct1_bs512K_20151013041036.out
> WRITE: io=1024.0GB, aggrb=403101KB/s, minb=3149KB/s, maxb=3154KB/s,
> mint=2659304msec, maxt=2663699msec
>
> 3.4/ These are the network statistics immediately after the test:
>
> * Printing network statistics:
> * /sys/class/net/eth0/statistics/collisions: 0
> * /sys/class/net/eth0/statistics/multicast: 454539
> * /sys/class/net/eth0/statistics/rx_bytes: 440300506875
> * /sys/class/net/eth0/statistics/rx_compressed: 0
> * /sys/class/net/eth0/statistics/rx_crc_errors: 0
> * /sys/class/net/eth0/statistics/rx_dropped: 0
> * /sys/class/net/eth0/statistics/rx_errors: 0
> * /sys/class/net/eth0/statistics/rx_fifo_errors: 0
> * /sys/class/net/eth0/statistics/rx_frame_errors: 0
> * /sys/class/net/eth0/statistics/rx_length_errors: 0
> * /sys/class/net/eth0/statistics/rx_missed_errors: 0
> * /sys/class/net/eth0/statistics/rx_over_errors: 0
> * /sys/class/net/eth0/statistics/rx_packets: 423468075
> * /sys/class/net/eth0/statistics/tx_aborted_errors: 0
> * /sys/class/net/eth0/statistics/tx_bytes: 425580907716
> * /sys/class/net/eth0/statistics/tx_carrier_errors: 0
> * /sys/class/net/eth0/statistics/tx_compressed: 0
> * /sys/class/net/eth0/statistics/tx_dropped: 0
> * /sys/class/net/eth0/statistics/tx_errors: 0
> * /sys/class/net/eth0/statistics/tx_fifo_errors: 0
> * /sys/class/net/eth0/statistics/tx_heartbeat_errors: 0
> * /sys/class/net/eth0/statistics/tx_packets: 423973681
> * /sys/class/net/eth0/statistics/tx_window_errors: 0
>
> If I just compare tx_bytes before and after the fio test, I get
> (425580907716 − 149206610455) ~ 260 GB. The whole test is supposed to have
> 128 threads writing 8 GB each, for a total of 1024 GB. I do not really
> understand those numbers: they differ by a factor of about 4, and I also
> do not understand how caching could compensate for that difference.
>
> --- * ---
>
> 4./ During the whole process I have been monitoring ceph-fuse memory
> usage, and this is what I get at the beginning and at the end of the test:
>
> START: 4577 root 20 0 5861m  54m 4380 S 97.5 0.1  0:05.93 ceph-fuse
> END:   4577 root 20 0 10.1g 4.5g 4412 S  0.0 9.5 30:48.27 ceph-fuse
>
> --- * ---
>
> 5./ I've tried to manipulate the ceph-fuse behaviour via client_cache_size
> and client_oc_size (by the way, are these values given in bytes?). The
> defaults are
>
> client cache size = 16384
> client oc size = 209715200

Cache size is in number of inodes. "oc size" (objectcacher size) is in
bytes, yes.
-Greg
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
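For reference, a minimal sketch of how those two options look in the
[client] section of ceph.conf (the values shown are just the defaults
quoted above; whether raising them helps depends on how much RAM you can
spare on the client):

[client]
    # metadata cache: number of inodes ceph-fuse keeps cached
    client cache size = 16384
    # objectcacher: size of cached file data, in bytes (200 MiB here)
    client oc size = 209715200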