Here's some more information on this issue.
I found that the MDS host has no load issues, but other clients that
have the FS mounted cannot execute statfs/fstatfs on the mount, since
the call never returns while my rsync job is running. Other syscalls
like fstat work without problems. Thus, I can run `ls` on any folder
with no problem at all, but I cannot execute `find`, which only lists
the first 3-5 files and then hangs in an fstatfs call. My graphical file
manager also hangs in statfs while the cephfs is mounted.
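To illustrate, this is roughly how the hang can be observed on an
affected client (the mount point and PID are placeholders, not my
actual paths):

    # Trace only the statfs family of syscalls; find prints a few
    # entries, then blocks inside fstatfs and never returns
    strace -f -e trace=statfs,fstatfs find /mnt/cephfs >/dev/null

    # The stuck process sits in uninterruptible sleep ("D" state)
    ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /^D/'

    # Kernel stack of the hung task shows where it blocks (run as root)
    cat /proc/<pid>/stack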
On 13.09.19 10:16, Janek Bevendorff wrote:
Hi,
There have been various stability issues with the MDS that I reported
a while ago and most of them have been addressed and fixes will be
available in upcoming patch releases. However, there also seem to be
problems on the client side, which I have not reported so far.
Note: This report is in part inspired by a previous mail to the list
about CephFS deletion performance
(http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-September/036842.html
), but since I am not quite sure if we are actually talking about the
very same issue, I decided to start a new thread.
I tried copying 70TB of data (mostly small files, among them about 1TB
of Git repositories) using parallel rsync jobs. I first did this using
fpsync, but after a while the client started locking up under
ever-increasing load. No IO from or to the mount was possible anymore and
`sync` hung indefinitely. Even after force-unmounting the FS, I still
had kernel processes using 100% of about half my CPU cores. Remounting
the FS was not possible until forcefully rebooting the entire node.
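For context, the fpsync run was along these lines; the paths, worker
count and chunk size here are placeholders rather than the exact values
I used:

    # ~16 parallel rsync workers, each syncing chunks of ~2000 files
    fpsync -n 16 -f 2000 -o "-a --numeric-ids" /ceph/src/ /ceph/dst/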
I then tried parsyncfp, which is more considerate regarding the load,
and I was able to sync the whole tree without issues after setting
`vm.dirty_background_bytes` and `vm.dirty_bytes` via `sysctl` to 1GB
and 4GB respectively (the defaults of 10% and 20% of total RAM are way
too much for a machine with 128GB of memory and write-heavy workloads).
Right now, I am running another single rsync pass, since the parallel
versions cannot do `--delete`. To ensure this one isn't locking up my
system either, I use the same sysctl settings and periodically run
`sync` in the background. So far the job has been running for a day
with an average 15m load of 2.5 on a 32-thread machine.
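Concretely, that setup looks roughly like this (the rsync flags, paths
and the sync interval are illustrative, not the exact values):

    # Cap dirty page cache at absolute values instead of 10%/20% of 128GB RAM
    sysctl -w vm.dirty_background_bytes=1073741824   # 1 GB
    sysctl -w vm.dirty_bytes=4294967296              # 4 GB

    # Flush dirty pages periodically while the job runs
    while true; do sync; sleep 60; done &

    # Single cleanup pass; the parallel wrappers cannot do --delete
    rsync -a --delete /ceph/src/ /ceph/dst/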
I am not entirely sure if this is a general kernel bug or a cephfs
bug. I believe it may be possible to produce similar issues with other
kernel-space remote file systems like NFS (I had that in the past),
but generally, it seems to be much more of an issue with the cephfs
kernel driver (at least from my experience).
I am using Nautilus 14.2.3 and a single MDS with optimized recall and
cache trimming settings to avoid the cache inflation issues caused by
the housekeeping thread not being able to catch up (fixed in future
releases). Switching to multiple MDSs does not seem to have an impact
on the problem.
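For reference, the recall/trimming tuning is along these lines. The
option names exist in Nautilus, but the values below are only examples;
the exact numbers depend on the cluster:

    # Recall client caps more aggressively (values are examples)
    ceph config set mds mds_recall_max_caps 30000
    ceph config set mds mds_recall_max_decay_rate 1.5

    # Let the MDS trim its cache in larger batches (values are examples)
    ceph config set mds mds_cache_trim_threshold 262144
    ceph config set mds mds_cache_trim_decay_rate 1.0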
Cheers
Janek
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com