Re: Investigate busy ceph-msgr worker thread

On 12/3/20 5:46 PM, Jeff Layton wrote:
> On Thu, 2020-12-03 at 12:01 +0100, Stefan Kooman wrote:
> > Hi,
> >
> > We have a cephfs linux kernel (5.4.0-53-generic) workload (rsync) that
> > seems to be limited by a single ceph-msgr thread (doing close to 100%
> > cpu). We would like to investigate what this thread is so busy with.
> > What would be the easiest way to do this? On a related note: what would
> > be the best way to scale cephfs client performance for a single process
> > (if at all possible)?
> >
> > Thanks for any pointers.
>
> Usually kernel profiling (a'la perf) is the way to go about this. You
> may want to consider trying more recent kernels and see if they fare any
> better. With a new enough MDS and kernel, you can try enabling async
> creates as well, and see whether that helps performance any.

The thread is mostly busy with "build_snap_context":


+ 94.39% 94.23% kworker/4:1-cep [kernel.kallsyms] [k] build_snap_context
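
For anyone reading the line above: assuming it is `perf report` output with the Children column enabled, the fields are children %, self %, command, shared object, symbol type and symbol. A minimal Python sketch that splits such a line follows; the regex and the helper name `parse_perf_line` are illustrative assumptions, not part of perf.

```python
import re

# Assumed layout of a `perf report` line with the Children column enabled:
#   [+] children%  self%  command  shared-object  [type]  symbol
LINE_RE = re.compile(
    r"^\s*\+?\s*(?P<children>\d+(?:\.\d+)?)%\s+"
    r"(?P<self>\d+(?:\.\d+)?)%\s+"
    r"(?P<comm>\S+)\s+(?P<dso>\S+)\s+"
    r"\[(?P<kind>.)\]\s+(?P<symbol>.+?)\s*$"
)


def parse_perf_line(line):
    """Hypothetical helper: return the fields of one report line, or None."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return {
        "children_pct": float(m.group("children")),  # time incl. callees
        "self_pct": float(m.group("self")),          # time in the symbol itself
        "comm": m.group("comm"),                     # task name (truncated by the kernel)
        "dso": m.group("dso"),                       # e.g. [kernel.kallsyms]
        "kernel": m.group("kind") == "k",            # [k] marks a kernel symbol
        "symbol": m.group("symbol"),
    }


if __name__ == "__main__":
    sample = ("+ 94.39% 94.23% kworker/4:1-cep "
              "[kernel.kallsyms] [k] build_snap_context")
    print(parse_perf_line(sample))
```

Since the self percentage is almost equal to the children percentage, nearly all of the time is spent inside build_snap_context itself rather than in functions it calls.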

Do I understand correctly that this code is checking for any potential snapshots? Grepping through the Linux cephfs code gives a hit in snap.c.

Our cephfs filesystem was created under Luminous and upgraded through Mimic to Nautilus. We have never enabled snapshot support (ceph fs set cephfs allow_new_snaps true), but the filesystem does seem to support it (.snap dirs are present). The data rsync is processing contains a lot of directories, which might explain the amount of time spent in this code path.

Would this be a plausible explanation?

Thanks,

Stefan


