Re: ceph-fuse CPU and Memory usage vs CephFS kclient

On Tue, Apr 10, 2018 at 6:32 AM Wido den Hollander <wido@xxxxxxxx> wrote:
Hi,

There have been numerous threads about this in the past, but I wanted to
bring this up again in a new situation.

Running with Luminous v12.2.4, I'm seeing some odd memory and CPU usage
when using the ceph-fuse client to mount a multi-MDS CephFS filesystem.

    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum luvil,sanomat,tide
    mgr: luvil(active), standbys: tide, sanomat
    mds: svw-2/2/2 up  {0=luvil=up:active,1=tide=up:active}, 1 up:standby
    osd: 112 osds: 111 up, 111 in

  data:
    pools:   2 pools, 4352 pgs
    objects: 85549k objects, 4415 GB
    usage:   50348 GB used, 772 TB / 821 TB avail
    pgs:     4352 active+clean

After running an rsync with millions of files (and some directories
having 1M files), a ceph-fuse process was using 44GB of RSS and between
100% and 200% CPU.
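
For reference, the ceph-fuse caches can be capped from ceph.conf; these are
the knobs I'd look at first (the values below are only examples, not a
recommendation):

    [client]
        # max number of inodes kept in the client metadata cache
        client cache size = 16384
        # max bytes kept in the client object (data) cache
        client oc size = 209715200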

Looking at this FUSE client through the admin socket, the objecter was
one of my first suspects, but it claimed to only hold ~300M of data in
its cache, spread out over tens of thousands of files.
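
This is roughly how I looked at it; the asok path is whatever admin_socket
points to on your client (the path below is just an example):

    # in-flight RADOS operations held by the objecter
    ceph --admin-daemon /var/run/ceph/ceph-client.admin.asok objecter_requests
    # client-side performance counters (cache sizes, op counts, latencies)
    ceph --admin-daemon /var/run/ceph/ceph-client.admin.asok perf dump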

After unmounting and mounting again the memory usage was gone. We tried
the rsync again, but the memory problem wasn't reproducible.

The CPU usage, however, is reproducible: a "simple" rsync causes ceph-fuse
to use up to 100% CPU.

Switching to the kernel client (4.16 kernel) seems to solve this, but the
reasons for using ceph-fuse here are the lack of a recent kernel in
Debian 9 and the ease of upgrading the FUSE client.
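
For completeness, the kernel client was mounted roughly like this (monitor
names and the secret file are placeholders):

    mount -t ceph mon1,mon2,mon3:/ /mnt/cephfs \
        -o name=admin,secretfile=/etc/ceph/admin.secret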

I've tried to disable all logging inside the FUSE client, but that
didn't help.
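
That was done from ceph.conf on the client, along these lines:

    [client]
        debug client = 0/0
        debug objecter = 0/0
        debug ms = 0/0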

When checking the FUSE client's admin socket I saw that rename() operations
were hanging, and renames are something rsync does a lot of.

At the same time I saw a getfattr() being done to the same inode by the
FUSE client, but to a different MDS:

rename(): mds rank 0
getfattr: mds rank 1
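
Those show up through the client's admin socket, e.g. (again, the asok path
depends on your setup):

    # in-flight MDS requests from this client
    ceph --admin-daemon /var/run/ceph/ceph-client.admin.asok mds_requests
    # which MDS ranks the client has sessions with
    ceph --admin-daemon /var/run/ceph/ceph-client.admin.asok mds_sessions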

Although the kernel client seems to perform better, it shows the same
behavior when looking at the mdsc file in /sys:

216729  mds0    create  (unsafe)
#100021abbd9/.ddd.010236269.mpeg21.a0065.folia.xml.gz.AuxBQj
(reddata2/.ddd.010236269.mpeg21.a0065.folia.xml.gz.AuxBQj)

216731  mds1    rename   #100021abbd9/ddd.010236269.mpeg21.a0065.folia.xml.gz
(reddata2/ddd.010236269.mpeg21.a0065.folia.xml.gz)
#100021abbd9/.ddd.010236269.mpeg21.a0065.folia.xml.gz.AuxBQj
(reddata2/.ddd.010236269.mpeg21.a0065.folia.xml.gz.AuxBQj)
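
That output comes from the kernel client's debugfs entry, i.e. something like:

    # debugfs is normally mounted at /sys/kernel/debug
    cat /sys/kernel/debug/ceph/*/mdsc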

So this is rsync talking to two MDSes: one for the create and one for the rename.

Is this normal? Is this expected behavior?

If the directory got large enough to be sharded across MDSes, yes, it's expected behavior. There are filesystems that attempt to recognize rsync and change their normal behavior specifically to deal with this case, but CephFS isn't one of them (yet, anyway).

Not sure about the specifics of the client memory or CPU usage; I think you'd have to profile. rsync is a pretty pessimal CephFS workload though and I think I've heard about this before...
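
If you want a quick first look, something like this should show where
ceph-fuse is burning CPU (assuming perf is installed; the pid lookup is just
an example):

    # sample the running ceph-fuse process during an rsync
    perf top -p $(pidof ceph-fuse)
    # or record for a minute and inspect afterwards
    perf record -g -p $(pidof ceph-fuse) -- sleep 60
    perf report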
-Greg
 

To me it seems like the subtree partitioning might be interfering here,
but I wanted to double check.
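
One way to check would be to pin the rsync target to a single rank and see
whether the behavior changes; the path below is just an example mount point:

    # pin this directory tree to MDS rank 0
    setfattr -n ceph.dir.pin -v 0 /mnt/cephfs/reddata2
    # setting -v -1 removes the pin again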

Apart from that, the CPU and memory usage of the FUSE client seem very
high, and that might be related to this.

Any ideas?

Thanks,

Wido
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
