On Fri, 2020-12-04 at 20:49 +0100, Stefan Kooman wrote:
> On 12/3/20 5:46 PM, Jeff Layton wrote:
> > On Thu, 2020-12-03 at 12:01 +0100, Stefan Kooman wrote:
> > > Hi,
> > >
> > > We have a cephfs Linux kernel (5.4.0-53-generic) workload (rsync)
> > > that seems to be limited by a single ceph-msgr thread (using close
> > > to 100% CPU). We would like to investigate what this thread is so
> > > busy with. What would be the easiest way to do this? On a related
> > > note: what would be the best way to scale cephfs client performance
> > > for a single process (if at all possible)?
> > >
> > > Thanks for any pointers.
> > >
> >
> > Usually kernel profiling (a la perf) is the way to go about this. You
> > may want to consider trying more recent kernels and see if they fare
> > any better. With a new enough MDS and kernel, you can try enabling
> > async creates as well, and see whether that helps performance any.
>
> The thread is mostly busy with "build_snap_context":
>
> > +   94.39%   94.23%  kworker/4:1-cep  [kernel.kallsyms]  [k] build_snap_context
>
> Do I understand correctly that this code checks for any potential
> snapshots? Grepping through the Linux cephfs code gives a hit in
> snap.c.
>
> Our cephfs filesystem was created on Luminous and upgraded through
> Mimic to Nautilus. We have never enabled snapshot support (ceph fs set
> cephfs allow_new_snaps true), but the filesystem does seem to support
> it (.snap dirs are present). The data rsync is processing contains a
> lot of directories, which might explain the amount of time spent in
> this code path.
>
> Would this be a plausible explanation?
>
> Thanks,
>
> Stefan

Yes, that sounds plausible. You probably want to stop rsync from
recursing down into .snap/ directories altogether if you have it doing
that.
-- 
Jeff Layton <jlayton@xxxxxxxxxx>
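
A minimal sketch of the two suggestions above (the kworker PID, mount
point and destination path are placeholders, not values taken from this
thread):

    # Profile the busy kworker thread for 30 seconds, then inspect the
    # call graph (1234 is a placeholder PID; take the real one from top/ps):
    perf record -g -p 1234 -- sleep 30
    perf report

    # Keep rsync from descending into cephfs .snap/ directories:
    rsync -a --exclude='.snap/' /mnt/cephfs/source/ /backup/destination/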