Ceph file system hang

We are running Ceph 10.2.7, and after adding a new multi-threaded
writer application we are seeing hangs when accessing metadata from
kernel-mounted CephFS clients.  I have a "du -ah /cephfs" process that
has been stuck for over 12 hours on one CephFS client system.  We
started seeing hung "du -ah" processes two days ago, so yesterday we
upgraded the whole cluster from v10.2.5 to v10.2.7, but the problem
occurred again last night.  Rebooting the client clears the hang.
"ceph -s" reports HEALTH_OK.

We have four Ceph file system clients, each kernel-mounting our single
Ceph file system at /cephfs.  The "du -ah /cephfs" runs hourly from a
cron-controlled test script; if it does not complete within an hour,
emails are sent to the admin group as part of our monitoring process
(rough sketch of the check below).  This command normally takes less
than a minute to run, and we have just over 3.6M files in this file
system.  The du -ah hangs while accessing sub-directories where the
new multi-threaded writer application is writing.
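
For reference, the hourly check does roughly this (a minimal Python 3
sketch, not our exact cron script; the admin address is a placeholder):

#!/usr/bin/env python3
# Hypothetical sketch of the hourly check: run "du -ah /cephfs" with a
# one-hour timeout and mail the admin group if it times out or fails.
import smtplib
import subprocess
from email.mime.text import MIMEText

ADMIN_ADDR = "ceph-admins@example.com"   # placeholder, not our real alias

def notify(body):
    msg = MIMEText(body)
    msg["Subject"] = "cephfs du check failed"
    msg["From"] = ADMIN_ADDR
    msg["To"] = ADMIN_ADDR
    with smtplib.SMTP("localhost") as smtp:
        smtp.send_message(msg)

try:
    subprocess.run(["du", "-ah", "/cephfs"],
                   stdout=subprocess.DEVNULL,
                   timeout=3600,     # give up after one hour
                   check=True)
except subprocess.TimeoutExpired:
    notify("du -ah /cephfs did not finish within an hour (possible MDS hang)")
except subprocess.CalledProcessError as e:
    notify("du -ah /cephfs exited with status %d" % e.returncode)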

About the application: on one Ceph client we download external data
over the network and write it as files into the Ceph file system with
a Python program.  The Python script can write up to 100 files in
parallel (simplified sketch below).  The metadata hangs we are seeing
can occur on one or more client systems, but right now only one system
is hung, and it is not the node writing the data.
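
The write path looks roughly like this (simplified; the download and
retry logic are omitted, and the target directory name is made up):

# Simplified sketch of how the downloader writes files in parallel.
import concurrent.futures
import os

CEPHFS_DIR = "/cephfs/incoming"   # hypothetical target directory

def write_file(name, data):
    """Write one downloaded payload into the Ceph file system."""
    path = os.path.join(CEPHFS_DIR, name)
    with open(path, "wb") as f:
        f.write(data)
    return path

def write_batch(items):
    """items: iterable of (filename, bytes) pairs; up to 100 writes at once."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=100) as pool:
        futures = [pool.submit(write_file, name, data) for name, data in items]
        for fut in concurrent.futures.as_completed(futures):
            fut.result()   # re-raise any write error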

System info:

ceph -s
    cluster ba0c94fc-1168-11e6-aaea-000c290cc2d4
     health HEALTH_OK
     monmap e1: 3 mons at
{mon01=10.16.51.21:6789/0,mon02=10.16.51.22:6789/0,mon03=10.16.51.23:6789/0}
            election epoch 138, quorum 0,1,2 mon01,mon02,mon03
      fsmap e3210: 1/1/1 up {0=mds02=up:active}, 2 up:standby
     osdmap e33046: 85 osds: 85 up, 85 in
            flags sortbitwise,require_jewel_osds
      pgmap v27679236: 16192 pgs, 12 pools, 7655 GB data, 6591 kobjects
            24345 GB used, 217 TB / 241 TB avail
               16188 active+clean
                   3 active+clean+scrubbing
                   1 active+clean+scrubbing+deep
  client io 0 B/s rd, 15341 kB/s wr, 0 op/s rd, 21 op/s wr


On the hung client node, we see one entry in the mdsc debug file:

cat /sys/kernel/debug/ceph/*/mdsc
163925513 mds0 readdir #100003be2b1 kplr009658474_dr25_window.fits

I am not seeing this on the other three client nodes.
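
This is the quick check I run on each client to look for stuck
requests (a small sketch; it just reads the debug file shown above and
needs root with debugfs mounted):

# Report any pending MDS requests on this client.
import glob

for path in glob.glob("/sys/kernel/debug/ceph/*/mdsc"):
    with open(path) as f:
        entries = [line.strip() for line in f if line.strip()]
    if entries:
        print("%s: %d pending MDS request(s)" % (path, len(entries)))
        for entry in entries:
            print("  " + entry)
    else:
        print("%s: no pending requests" % path)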

On the active metadata server, I ran:

ceph daemon mds.mds02 dump_ops_in_flight

every 2 seconds, as the output kept changing.  Part of the output is at:
https://paste.fedoraproject.org/paste/OizCowo3oGzZo-cJWV5R~Q
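
I captured that with a small polling loop along these lines (a sketch,
assuming the Jewel JSON layout with "ops", "description" and "age"
fields; run on the active MDS host):

# Poll the MDS admin socket every two seconds and summarize ops in flight.
import json
import subprocess
import time

while True:
    raw = subprocess.check_output(
        ["ceph", "daemon", "mds.mds02", "dump_ops_in_flight"])
    data = json.loads(raw.decode())
    ops = data.get("ops", [])
    print(time.strftime("%H:%M:%S"), "ops in flight:", len(ops))
    for op in ops:
        print("   ", op.get("description", ""), "age:", op.get("age", "?"))
    time.sleep(2)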

More info about the system:

OS: Ubuntu Trusty

CephFS snapshots are enabled and are being created hourly.

Ceph Version
ceph -v
ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)

Kernel: Ceph Servers:
uname -a
Linux mon01 4.2.0-27-generic #32~14.04.1-Ubuntu SMP Fri Jan 22
15:32:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Kernel Cephfs clients:
uname -a
Linux dfgw02 4.9.21-040921-generic #201704080434 SMP Sat Apr 8
08:35:57 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Let me know if I should write up a ticket on this.

Thanks

Eric


