On Thu, Oct 22, 2015 at 4:47 AM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> On Tue, Oct 13, 2015 at 10:09 PM, Goncalo Borges
> <goncalo@xxxxxxxxxxxxxxxxxxx> wrote:
>> Hi all...
>>
>> Thank you for the feedback, and I am sorry for my delay in replying.
>>
>> 1./ Just to recall the problem: I was testing cephfs using fio on two
>> ceph-fuse clients:
>>
>> - Client A is in the same data center as all OSDs, connected at 1 GbE.
>> - Client B is in a different data center (in another city), also connected
>> at 1 GbE. However, I've seen that the connection is problematic, and
>> sometimes the network performance is well below the theoretical 1 Gbps
>> limit.
>> - Client A has 24 GB RAM + 98 GB of swap; Client B has 48 GB of RAM +
>> 98 GB of swap.
>>
>> I was seeing that Client B gave much better fio throughput because it was
>> hitting the cache much more than Client A.
>>
>> --- * ---
>>
>> 2./ I suspected that Client B was hitting the cache because it had bad
>> connectivity to the Ceph cluster. I tried to sort that out and was able to
>> nail the problem down to a bad switch. However, after fixing that, I still
>> see the same behaviour, which I can reproduce in a systematic way.
>>
>> --- * ---
>>
>> 3./ In a new round of tests on Client B, I applied the following
>> procedure:
>>
>> 3.1/ These are the network statistics right before starting my fio test:
>>
>> * Printing network statistics:
>> * /sys/class/net/eth0/statistics/collisions: 0
>> * /sys/class/net/eth0/statistics/multicast: 453650
>> * /sys/class/net/eth0/statistics/rx_bytes: 437704562785
>> * /sys/class/net/eth0/statistics/rx_compressed: 0
>> * /sys/class/net/eth0/statistics/rx_crc_errors: 0
>> * /sys/class/net/eth0/statistics/rx_dropped: 0
>> * /sys/class/net/eth0/statistics/rx_errors: 0
>> * /sys/class/net/eth0/statistics/rx_fifo_errors: 0
>> * /sys/class/net/eth0/statistics/rx_frame_errors: 0
>> * /sys/class/net/eth0/statistics/rx_length_errors: 0
>> * /sys/class/net/eth0/statistics/rx_missed_errors: 0
>> * /sys/class/net/eth0/statistics/rx_over_errors: 0
>> * /sys/class/net/eth0/statistics/rx_packets: 387690140
>> * /sys/class/net/eth0/statistics/tx_aborted_errors: 0
>> * /sys/class/net/eth0/statistics/tx_bytes: 149206610455
>> * /sys/class/net/eth0/statistics/tx_carrier_errors: 0
>> * /sys/class/net/eth0/statistics/tx_compressed: 0
>> * /sys/class/net/eth0/statistics/tx_dropped: 0
>> * /sys/class/net/eth0/statistics/tx_errors: 0
>> * /sys/class/net/eth0/statistics/tx_fifo_errors: 0
>> * /sys/class/net/eth0/statistics/tx_heartbeat_errors: 0
>> * /sys/class/net/eth0/statistics/tx_packets: 241698327
>> * /sys/class/net/eth0/statistics/tx_window_errors: 0
>>
>> 3.2/ I then launched my fio test. Please note that I am dropping caches
>> before starting the test (sync; echo 3 > /proc/sys/vm/drop_caches). My
>> current fio test has nothing fancy. Here are the options:
>>
>> # cat fio128write_ioenginelibaio_iodepth64_direct1_bs512K_20151013041036.in
>> [fio128write_ioenginelibaio_iodepth64_direct1_bs512K_20151013041036]
>> ioengine=libaio
>> iodepth=64
>> rw=write
>> bs=512K
>> direct=1
>> size=8192m
>> numjobs=128
>> filename=fio128write_ioenginelibaio_iodepth64_direct1_bs512K_20151013041036.data
>
> Oh right, so you're only using 8GB of data to write over (and you're
> hitting it a bunch of times). So if not for the direct IO flag this
> would sort of make sense.
>
> But with that, I'm very confused.
> There can be some annoying little
> pieces of making direct IO get passed correctly through all the FUSE
> interfaces, but I *thought* we were going through the hoops and making
> things work. Perhaps I am incorrect. Zheng, do you know anything about
> this?
>

Direct IO only bypasses the kernel page cache; the data can still be cached
in ceph-fuse. If I'm reading the test correctly, it repeatedly writes over
the same 8 GB of file data, so the cache lets multiple writes be merged into
a single OSD write.

Regards
Yan, Zheng

>>
>> I am not sure if it matters, but the layout of my dir is the following:
>>
>> # getfattr -n ceph.dir.layout /cephfs/sydney
>> getfattr: Removing leading '/' from absolute path names
>> # file: cephfs/sydney
>> ceph.dir.layout="stripe_unit=524288 stripe_count=8 object_size=4194304
>> pool=cephfs_dt"
>>
>> 3.3/ fio produced the following result for the aggregated bandwidth. If I
>> translate that number to Gbps, I get more than 3 Gbps, which is impossible.
>>
>> # grep aggrb
>> fio128write_ioenginelibaio_iodepth64_direct1_bs512K_20151013041036.out
>> WRITE: io=1024.0GB, aggrb=403101KB/s, minb=3149KB/s, maxb=3154KB/s,
>> mint=2659304msec, maxt=2663699msec
>>
>> 3.4/ These are the network statistics immediately after the test:
>>
>> * Printing network statistics:
>> * /sys/class/net/eth0/statistics/collisions: 0
>> * /sys/class/net/eth0/statistics/multicast: 454539
>> * /sys/class/net/eth0/statistics/rx_bytes: 440300506875
>> * /sys/class/net/eth0/statistics/rx_compressed: 0
>> * /sys/class/net/eth0/statistics/rx_crc_errors: 0
>> * /sys/class/net/eth0/statistics/rx_dropped: 0
>> * /sys/class/net/eth0/statistics/rx_errors: 0
>> * /sys/class/net/eth0/statistics/rx_fifo_errors: 0
>> * /sys/class/net/eth0/statistics/rx_frame_errors: 0
>> * /sys/class/net/eth0/statistics/rx_length_errors: 0
>> * /sys/class/net/eth0/statistics/rx_missed_errors: 0
>> * /sys/class/net/eth0/statistics/rx_over_errors: 0
>> * /sys/class/net/eth0/statistics/rx_packets: 423468075
>> * /sys/class/net/eth0/statistics/tx_aborted_errors: 0
>> * /sys/class/net/eth0/statistics/tx_bytes: 425580907716
>> * /sys/class/net/eth0/statistics/tx_carrier_errors: 0
>> * /sys/class/net/eth0/statistics/tx_compressed: 0
>> * /sys/class/net/eth0/statistics/tx_dropped: 0
>> * /sys/class/net/eth0/statistics/tx_errors: 0
>> * /sys/class/net/eth0/statistics/tx_fifo_errors: 0
>> * /sys/class/net/eth0/statistics/tx_heartbeat_errors: 0
>> * /sys/class/net/eth0/statistics/tx_packets: 423973681
>> * /sys/class/net/eth0/statistics/tx_window_errors: 0
>>
>> If I just compare tx_bytes before and after the fio test, I get
>> (425580907716 - 149206610455) ~ 260 GB. The whole test is supposed to use
>> 128 threads writing 8192 MB each, giving a total of 1024 GB. I am not sure
>> I understand those numbers, but they differ by a factor of about 4, and I
>> also do not understand how caching could compensate for that difference.
>>
>> --- * ---
>>
>> 4./ During the whole process I have been monitoring ceph-fuse memory
>> usage, and this is what I get at the beginning and end of the test:
>>
>> START: 4577 root 20 0 5861m  54m 4380 S 97.5 0.1  0:05.93 ceph-fuse
>> END:   4577 root 20 0 10.1g 4.5g 4412 S  0.0 9.5 30:48.27 ceph-fuse
>>
>> --- * ---
>>
>> 5./ I've tried to manipulate the ceph-fuse behaviour via client_cache_size
>> and client_oc_size (by the way, are these values given in bytes?). The
>> defaults are
>>
>> client cache size = 16384
>> client oc size = 209715200
>
> Cache size is in number of inodes. "oc size" (objectcacher size) is in
> bytes, yes.
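
As a quick cross-check of the numbers: fio reports 1024 GB of logical writes
(128 jobs x 8192 MB, all against a single shared file), while the tx_bytes
delta is only ~260 GB. That is the factor of ~4 mentioned above, and it is
consistent with ceph-fuse merging overlapping writes before they reach the
OSDs.

If you want to confirm that the ObjectCacher is responsible, one thing to
try is disabling it (or shrinking it well below its 200 MB default) on the
client and re-running the same fio job. A minimal sketch of the [client]
section of ceph.conf follows; the values are only examples, and I am quoting
the client_oc option from memory, so please check it against the config
reference for your release:

[client]
    # disable the ceph-fuse object cache entirely; writes go straight to RADOS
    client oc = false
    # alternatively, keep the cache but shrink it (size is in bytes):
    # client oc size = 33554432
    # metadata cache, in number of inodes:
    # client cache size = 16384

With the object cache out of the picture, the aggregate bandwidth that fio
reports should drop back towards what the 1 GbE link can actually carry.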
> -Greg

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com