Re: Ceph file system hang

On Thu, Jun 15, 2017 at 11:45 AM, David Turner <drakonstein@xxxxxxxxx> wrote:
> Have you compared performance to mounting cephfs using ceph-fuse instead of
> the kernel client?

We have tested both, and with our applications the kernel-mounted file
systems have been much faster than the ceph-fuse mounted ones.

> A very interesting thing that ceph-fuse does is that an ls -lhd of a
> directory shows the directory structure's size.  It's a drastically
> faster response than a du for the size of a folder.

The "du -ah" is run to scan for hangs.  We only look at the output
when there is a problem. A while ago we had a 4.9 kernel issue that
was causing hangs, so we put in the du -ah to walk the file system
hourly to report if it was hung, and left it in after we installed the
4.9.21 kernel that had the fix. Until we started running the new
application, the system had been very stable.
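
For what it's worth, the hourly check is roughly equivalent to this
sketch (the timeout, mail command, and address are placeholders, not
our exact cron script):

#!/usr/bin/env python3
# Hourly cephfs hang check: run "du -ah" against the mount and alert
# the admin group if it does not finish within the hour.  Minimal
# sketch only -- the timeout and mail command are placeholders.
import subprocess
import sys

MOUNT_POINT = "/cephfs"        # kernel-mounted cephfs
TIMEOUT_SECONDS = 3600         # one hour, matching the cron interval

try:
    subprocess.run(
        ["du", "-ah", MOUNT_POINT],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        timeout=TIMEOUT_SECONDS,
    )
except subprocess.TimeoutExpired:
    # du did not complete within an hour -- report a likely hang
    subprocess.run(
        ["mail", "-s", "cephfs du hang on " + MOUNT_POINT,
         "admins@example.com"],
        input=b"du -ah did not complete within an hour\n",
    )
    sys.exit(1)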

> If you're deleting snapshots each hour as well, that might be a place to look for odd cluster happenings as well.

Currently the file system is only 10% full, so we are not deleting any
snapshots.

Even if our application is not properly architected for a shared file
system, the file system should not hang.

Thanks,
Eric

>
> On Thu, Jun 15, 2017 at 12:39 PM Eric Eastman <eric.eastman@xxxxxxxxxxxxxx>
> wrote:
>>
>> We are running Ceph 10.2.7, and after adding a new multi-threaded
>> writer application we are seeing hangs accessing metadata from
>> kernel-mounted Ceph file system clients.  I have a "du -ah /cephfs"
>> process that has been stuck for over 12 hours on one cephfs client
>> system.  We started seeing hung "du -ah" processes two days ago, so
>> yesterday we upgraded the whole cluster from v10.2.5 to v10.2.7, but
>> the problem occurred again last night.  Rebooting the client clears
>> the hang.  The ceph -s command is showing HEALTH_OK.
>>
>> We have four Ceph file system clients, each kernel-mounting our single
>> Ceph file system at /cephfs. The "du -ah /cephfs" runs hourly within a
>> cron-controlled test script.  If the du -ah /cephfs does not complete
>> within an hour, emails are sent to the admin group as part of our
>> monitoring process. This command normally takes less than a minute to
>> run, and we have just over 3.6M files in this file system.  The du -ah
>> is hanging while accessing sub-directories where the new
>> multi-threaded writer application is writing.
>>
>> About the application: on one Ceph client we are downloading external
>> data over the network and writing it as files into the Ceph file
>> system with a Python program. The Python script can write up to 100
>> files in parallel. The metadata hangs we are seeing can occur on one
>> or more client systems, but right now only one system is hung, and it
>> is not the node writing the data.
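>>
>> For reference, the writer's structure is roughly along these lines (a
>> simplified sketch, not our actual code; the output directory and
>> function names are placeholders, and 100 matches the parallel-write
>> limit above):
>>
>> # Rough sketch of a parallel writer like the one described above (not
>> # the actual application): worker threads each write one downloaded
>> # payload as a file under the kernel-mounted cephfs tree.
>> import os
>> from concurrent.futures import ThreadPoolExecutor
>>
>> OUT_DIR = "/cephfs/incoming"   # placeholder target directory
>>
>> def write_one(name, payload):
>>     """Write a single downloaded payload to the cephfs mount."""
>>     path = os.path.join(OUT_DIR, name)
>>     with open(path, "wb") as f:
>>         f.write(payload)
>>
>> def write_batch(items):
>>     """items: iterable of (name, payload) pairs to write in parallel."""
>>     with ThreadPoolExecutor(max_workers=100) as pool:
>>         for name, payload in items:
>>             pool.submit(write_one, name, payload)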
>>
>> System info:
>>
>> ceph -s
>>     cluster ba0c94fc-1168-11e6-aaea-000c290cc2d4
>>      health HEALTH_OK
>>      monmap e1: 3 mons at
>>
>> {mon01=10.16.51.21:6789/0,mon02=10.16.51.22:6789/0,mon03=10.16.51.23:6789/0}
>>             election epoch 138, quorum 0,1,2 mon01,mon02,mon03
>>       fsmap e3210: 1/1/1 up {0=mds02=up:active}, 2 up:standby
>>      osdmap e33046: 85 osds: 85 up, 85 in
>>             flags sortbitwise,require_jewel_osds
>>       pgmap v27679236: 16192 pgs, 12 pools, 7655 GB data, 6591 kobjects
>>             24345 GB used, 217 TB / 241 TB avail
>>                16188 active+clean
>>                    3 active+clean+scrubbing
>>                    1 active+clean+scrubbing+deep
>>   client io 0 B/s rd, 15341 kB/s wr, 0 op/s rd, 21 op/s wr
>>
>>
>> On the hung client node, we are seeing an entry in the mdsc debug file:
>> cat /sys/kernel/debug/ceph/*/mdsc
>> 163925513 mds0 readdir #100003be2b1 kplr009658474_dr25_window.fits
>>
>> I am not seeing this on the other 3 client nodes.
>>
>> On the active metadata server, I ran:
>>
>> ceph daemon mds.mds02 dump_ops_in_flight
>>
>> every 2 seconds, as it kept changing.  Part of the output is at:
>> https://paste.fedoraproject.org/paste/OizCowo3oGzZo-cJWV5R~Q
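>>
>> For reference, a minimal loop to capture that sampling looks like this
>> (a sketch; the log path is a placeholder):
>>
>> # Poll the active MDS admin socket every 2 seconds and append the
>> # dump_ops_in_flight output to a log for later review.
>> import subprocess
>> import time
>>
>> while True:
>>     out = subprocess.check_output(
>>         ["ceph", "daemon", "mds.mds02", "dump_ops_in_flight"])
>>     with open("/tmp/mds_ops_in_flight.log", "ab") as log:
>>         log.write(out)
>>     time.sleep(2)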
>>
>> Info about the system
>>
>> OS: Ubuntu Trusty
>>
>> Cephfs snapshots are turned on and being created hourly
>>
>> Ceph Version
>> ceph -v
>> ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
>>
>> Kernel (Ceph servers):
>> uname -a
>> Linux mon01 4.2.0-27-generic #32~14.04.1-Ubuntu SMP Fri Jan 22
>> 15:32:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
>>
>> Kernel (CephFS clients):
>> uname -a
>> Linux dfgw02 4.9.21-040921-generic #201704080434 SMP Sat Apr 8
>> 08:35:57 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
>>
>> Let me know if I should write up a ticket on this.
>>
>> Thanks
>>
>> Eric
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


