Re: ceph-fuse and its memory usage

Hi all...

Thank you for the feedback, and I am sorry for my delay in replying.

1./ Just to recall the problem, I was testing cephfs using fio in two ceph-fuse clients:
- Client A is in the same data center as all OSDs connected at 1 GbE
- Client B is in a different data center (in another city), also connected at 1 GbE. However, I've seen that the connection is problematic and, at times, the network performance is well below the theoretical 1 Gbps limit.
- Client A has 24 GB of RAM + 98 GB of swap and Client B has 48 GB of RAM + 98 GB of swap.
    I was seeing that Client B gave much better fio throughput because it was hitting the cache much more than Client A.

--- * ---

2./ I was suspecting that Client B was hitting the cache because it had bad connectivity to the Ceph cluster. I tried to sort that out and was able to track the problem down to a faulty switch. However, after fixing that, I still see the same behaviour, which I can reproduce systematically.

--- * ---

3./ In a new round of tests on Client B, I applied the following procedure:

    3.1/ These are the network statistics right before starting my fio test:
* Printing network statistics:
* /sys/class/net/eth0/statistics/collisions: 0
* /sys/class/net/eth0/statistics/multicast: 453650
* /sys/class/net/eth0/statistics/rx_bytes: 437704562785
* /sys/class/net/eth0/statistics/rx_compressed: 0
* /sys/class/net/eth0/statistics/rx_crc_errors: 0
* /sys/class/net/eth0/statistics/rx_dropped: 0
* /sys/class/net/eth0/statistics/rx_errors: 0
* /sys/class/net/eth0/statistics/rx_fifo_errors: 0
* /sys/class/net/eth0/statistics/rx_frame_errors: 0
* /sys/class/net/eth0/statistics/rx_length_errors: 0
* /sys/class/net/eth0/statistics/rx_missed_errors: 0
* /sys/class/net/eth0/statistics/rx_over_errors: 0
* /sys/class/net/eth0/statistics/rx_packets: 387690140
* /sys/class/net/eth0/statistics/tx_aborted_errors: 0
* /sys/class/net/eth0/statistics/tx_bytes: 149206610455
* /sys/class/net/eth0/statistics/tx_carrier_errors: 0
* /sys/class/net/eth0/statistics/tx_compressed: 0
* /sys/class/net/eth0/statistics/tx_dropped: 0
* /sys/class/net/eth0/statistics/tx_errors: 0
* /sys/class/net/eth0/statistics/tx_fifo_errors: 0
* /sys/class/net/eth0/statistics/tx_heartbeat_errors: 0
* /sys/class/net/eth0/statistics/tx_packets: 241698327
* /sys/class/net/eth0/statistics/tx_window_errors: 0
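    (For reference, a snapshot like the one above can be produced with something as simple as the loop below; this is just a minimal sketch, any equivalent command works.)

for f in /sys/class/net/eth0/statistics/*; do
    echo "* $f: $(cat $f)"        # one line per counter, same format as above
done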
    3.2/ I then launched my fio test. Please note that I drop caches before starting the test (sync; echo 3 > /proc/sys/vm/drop_caches); the full sequence is sketched right after the job file below. My current fio test has nothing fancy. Here are the options:
# cat fio128write_ioenginelibaio_iodepth64_direct1_bs512K_20151013041036.in
[fio128write_ioenginelibaio_iodepth64_direct1_bs512K_20151013041036]
ioengine=libaio
iodepth=64
rw=write
bs=512K
direct=1
size=8192m
numjobs=128
filename=fio128write_ioenginelibaio_iodepth64_direct1_bs512K_20151013041036.data
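    For completeness, the sequence described above looks roughly like the sketch below (the cd path and the output file name are illustrative; any equivalent invocation works):

# drop page cache, dentries and inodes so the test starts cold
sync; echo 3 > /proc/sys/vm/drop_caches
# run the job file from the CephFS directory and keep fio's output
cd /cephfs/sydney
fio fio128write_ioenginelibaio_iodepth64_direct1_bs512K_20151013041036.in \
    > fio128write_ioenginelibaio_iodepth64_direct1_bs512K_20151013041036.out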
    I am not sure if it matters, but the layout of my directory is the following:
# getfattr -n ceph.dir.layout /cephfs/sydney
getfattr: Removing leading '/' from absolute path names
# file: cephfs/sydney
ceph.dir.layout="stripe_unit=524288 stripe_count=8 object_size=4194304 pool=cephfs_dt"
    3.3/ fio reported the following aggregated bandwidth. If I translate that number into a line rate, I get over 3 Gbps, which is impossible on a 1 GbE link (a quick conversion is shown right after the fio output):
# grep aggrb fio128write_ioenginelibaio_iodepth64_direct1_bs512K_20151013041036.out
  WRITE: io=1024.0GB, aggrb=403101KB/s, minb=3149KB/s, maxb=3154KB/s, mint=2659304msec, maxt=2663699msec
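    Making the conversion explicit (assuming fio's KB/s means KiB/s, but the conclusion is the same either way):

echo "403101 * 1024 * 8 / 10^9" | bc -l     # ~3.3 Gbit/s on a link that tops out at 1 Gbit/s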
    3.4/ These are the network statistics immediately after the test:
* Printing network statistics:
* /sys/class/net/eth0/statistics/collisions: 0
* /sys/class/net/eth0/statistics/multicast: 454539
* /sys/class/net/eth0/statistics/rx_bytes: 440300506875
* /sys/class/net/eth0/statistics/rx_compressed: 0
* /sys/class/net/eth0/statistics/rx_crc_errors: 0
* /sys/class/net/eth0/statistics/rx_dropped: 0
* /sys/class/net/eth0/statistics/rx_errors: 0
* /sys/class/net/eth0/statistics/rx_fifo_errors: 0
* /sys/class/net/eth0/statistics/rx_frame_errors: 0
* /sys/class/net/eth0/statistics/rx_length_errors: 0
* /sys/class/net/eth0/statistics/rx_missed_errors: 0
* /sys/class/net/eth0/statistics/rx_over_errors: 0
* /sys/class/net/eth0/statistics/rx_packets: 423468075
* /sys/class/net/eth0/statistics/tx_aborted_errors: 0
* /sys/class/net/eth0/statistics/tx_bytes: 425580907716
* /sys/class/net/eth0/statistics/tx_carrier_errors: 0
* /sys/class/net/eth0/statistics/tx_compressed: 0
* /sys/class/net/eth0/statistics/tx_dropped: 0
* /sys/class/net/eth0/statistics/tx_errors: 0
* /sys/class/net/eth0/statistics/tx_fifo_errors: 0
* /sys/class/net/eth0/statistics/tx_heartbeat_errors: 0
* /sys/class/net/eth0/statistics/tx_packets: 423973681
* /sys/class/net/eth0/statistics/tx_window_errors: 0
    If I just compare tx_bytes before and after the fio test, I get (425580907716 − 149206610455) ≈ 257 GiB (see the quick calculation below). The whole test is supposed to have 128 jobs writing 8 GiB each, giving a total of 1024 GiB. I do not understand how those numbers can differ by roughly a factor of 4, and I also do not understand how caching could compensate for that difference.
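    The arithmetic, for reference:

echo "(425580907716 - 149206610455) / 1024^3" | bc -l    # ~257 GiB actually sent on the wire
echo "128 * 8192 / 1024" | bc -l                         # 1024 GiB that fio claims to have written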

--- * ---

4./ During the whole process I have been monitoring ceph-fuse memory usage, and this is what I get at the beginning and at the end of the test:

START: 4577 root      20   0 5861m  54m 4380 S 97.5  0.1   0:05.93 ceph-fuse     
END: 4577 root      20   0 10.1g 4.5g 4412 S  0.0  9.5  30:48.27 ceph-fuse
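    (The two lines above are top samples. For tracking this over time, a loop along the following lines is enough; 4577 is the ceph-fuse pid from the top output, and the sampling interval is arbitrary.)

while sleep 30; do
    # VSZ and RSS are reported in KiB
    echo "$(date '+%F %T') $(ps -o vsz=,rss= -p 4577)"
done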
--- * ---

5./ I've tried to manipulate the ceph-fuse behaviour via client_cache_size and client_oc_size (by the way, are these values given in bytes?). The defaults are
client cache size = 16384
client oc size = 209715200
and I've decreased both by a factor of 4, but I kept seeing the same behaviour.
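    Concretely, what I changed looks like the sketch below (assuming the options belong in the [client] section of ceph.conf; the units in the comments are my reading of the defaults, and confirming them is part of my question):

[client]
# default 16384; apparently a count of metadata cache entries, not bytes
client cache size = 4096
# default 209715200; apparently a size in bytes (~200 MB reduced to ~50 MB)
client oc size = 52428800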

At this point, I do not have a clear idea why this is happening.

Cheers
Goncalo


On 10/03/2015 04:03 AM, Gregory Farnum wrote:
On Fri, Oct 2, 2015 at 1:57 AM, John Spray <jspray@xxxxxxxxxx> wrote:
On Fri, Oct 2, 2015 at 2:42 AM, Goncalo Borges
<goncalo@xxxxxxxxxxxxxxxxxxx> wrote:
Dear CephFS Gurus...

I have a question regarding ceph-fuse and its memory usage.

1./ My Ceph and CephFS setups are the following:

Ceph:
a. ceph 9.0.3
b. 32 OSDs distributed in 4 servers (8 OSD per server).
c. 'osd pool default size = 3' and 'osd pool default min size = 2'
d. All servers running Centos6.7

CephFS:
e. a single mds
f. dedicated pools for data and metadata
g. clients in different locations / sites mounting CephFS via ceph-fuse
h. All servers and clients running Centos6.7

2./ I have been running fio tests in two CephFS clients:
    - Client A is in the same data center as all OSDs connected at 1 GbE
    - Client B is in a different data center (in another city) also
connected at 1 GbE. However, I've seen that the connection is problematic,
and sometimes, the network performance is well below the theoretical 1 Gbps
limit.
    - Client A has 24 GB RAM + 98 GB of SWAP and client B has 48 GB of RAM +
98 GB of SWAP

3./ I have been running some fio write tests (with 128 threads) in both
clients, and surprisingly, the results show that the aggregated throughput
is better for client B than client A.

CLIENT A results:
# grep agg
fio128threadsALL/fio128write_ioenginelibaio_iodepth64_direct1_bs512K_20151001015558.out
WRITE: io=1024.0GB, aggrb=114878KB/s, minb=897KB/s, maxb=1785KB/s,
mint=4697347msec, maxt=9346754msec

CLIENT B results:
#  grep agg
fio128threadsALL/fio128write_ioenginelibaio_iodepth64_direct1_bs512K_20151001015555.out
WRITE: io=1024.0GB, aggrb=483254KB/s, minb=3775KB/s, maxb=3782KB/s,
mint=2217808msec, maxt=2221896msec

4./ If I actually monitor the memory usage of ceph-fuse during the I/O
tests, I see that

CLIENT A: ceph-fuse does not seem to go beyond 7 GB of VMEM and 1 GB of RMEM.
CLIENT B: ceph-fuse uses 11 GB of VMEM and 7 GB of RMEM.

5./ These results make me think that caching is playing a critical role here.

My questions are the following:

a./ Why does CLIENT B use more memory than CLIENT A? My guess is that there is a
network bottleneck between CLIENT B and the Ceph Cluster, and memory is used
more heavily because of that.
This is weird, and I don't have an explanation.  I would be surprised
if network latency was influencing timing enough to create such a
dramatic difference in caching behaviour.

Are both clients running the same version of ceph-fuse and the same
version of the kernel?
Yeah at 483254KB/s (~480MB/s!) you're clearly exceeding how much the
network can actually support and are just writing into RAM. Something
in the stack is not actually forcing the data to go out to the OSDs
before being acknowledged. Check that all your fio settings are the
same and that directIO is actually doing what you expect, that you
haven't disabled any of that stuff in the kernel, etc.
-Greg
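(A rough way to sanity-check that last point: run a single direct write on the CephFS mount and watch whether the page cache grows anyway. Sketch only, with illustrative paths and sizes.)

dd if=/dev/zero of=/cephfs/sydney/directio_probe bs=512K count=2048 oflag=direct &
while kill -0 $! 2>/dev/null; do
    grep -E '^(Cached|Dirty):' /proc/meminfo     # should stay roughly flat if O_DIRECT is honoured
    sleep 5
done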


b./ Is the better fio write performance on CLIENT B a consequence of the fact
that it is using more memory than CLIENT A?
Seems a reasonable inference, but it's still all very weird!

c./ Is there a parameter we can set for the CephFS clients to limit the
amount of memory they can use?
You can limit the caching inside ceph-fuse by setting
client_cache_size (metadata cache entries) and client_oc_size (max
data cache).  However, there'll also be some caching inside the kernel
(which you can probably control somehow but I don't know off the top
of my head).
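(For what it's worth, the kernel-side knobs John alludes to are presumably the generic writeback tunables; this is only a sketch with illustrative values, and whether they apply to a FUSE mount at all is unverified.)

sysctl -w vm.dirty_background_bytes=268435456    # start background writeback once 256 MiB is dirty
sysctl -w vm.dirty_bytes=536870912               # block writers once 512 MiB is dirty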

Cheers,
John

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
