I have created a ticket on this issue:
http://tracker.ceph.com/issues/20329

On Thu, Jun 15, 2017 at 12:14 PM, Eric Eastman
<eric.eastman@xxxxxxxxxxxxxx> wrote:
> On Thu, Jun 15, 2017 at 11:45 AM, David Turner <drakonstein@xxxxxxxxx> wrote:
>> Have you compared performance to mounting cephfs using ceph-fuse instead
>> of the kernel client?
>
> We have tested both, and with our applications the kernel mounted file
> systems have been much faster than the fuse mounted tests.
>
>> A very interesting thing that ceph-fuse does is that an ls -lhd of a
>> directory shows the directory structure's size. It's a drastically
>> faster response than a du for the size of a folder.
>
> The "du -ah" is run to scan for hangs. We only look at the output
> when there is a problem. A while ago we had a 4.9 kernel issue that
> was causing hangs, so we put in the du -ah to walk the file system
> hourly and report if it was hung, and we left it in after we installed
> the 4.9.21 kernel that had the fix. Until we started running the new
> application, the system had been very stable.
>
>> If you're deleting snapshots each hour as well, that might be a place
>> to look for odd cluster happenings as well.
>
> Currently the file system is only 10% full, so we are not deleting any
> snapshots.
>
> Even if our application is not properly architected for a shared file
> system, the file system should not hang.
>
> Thanks,
> Eric
>
>> On Thu, Jun 15, 2017 at 12:39 PM Eric Eastman <eric.eastman@xxxxxxxxxxxxxx>
>> wrote:
>>>
>>> We are running Ceph 10.2.7, and after adding a new multi-threaded
>>> writer application we are seeing hangs accessing metadata from ceph
>>> file system kernel mounted clients. I have a "du -ah /cephfs" process
>>> that has been stuck for over 12 hours on one cephfs client system. We
>>> started seeing hung "du -ah" processes two days ago, so yesterday we
>>> upgraded the whole cluster from v10.2.5 to v10.2.7, but the problem
>>> occurred again last night. Rebooting the client fixes the problem.
>>> The ceph -s command is showing HEALTH_OK.
>>>
>>> We have four ceph file system clients, each kernel mounting our one
>>> ceph file system at /cephfs. The "du -ah /cephfs" runs hourly within
>>> a test script that is cron controlled. If the du -ah /cephfs does not
>>> complete within an hour, emails are sent to the admin group as part of
>>> our monitoring process. This command normally takes less than a minute
>>> to run, and we have just over 3.6M files in this file system. The du
>>> -ah is hanging while accessing sub-directories where the new
>>> multi-threaded writer application is writing.
>>>
>>> About the application: On one ceph client we are downloading external
>>> data via the network and writing the data as files with a python
>>> program into the ceph file system. The python script can write up to
>>> 100 files in parallel. The metadata hangs we are seeing can occur on
>>> one or more client systems, but right now it is only hung on one
>>> system, which is not the node writing the data.
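
[Editor's note: for illustration, a minimal sketch of what a parallel writer along the lines described above might look like. The worker count of 100 matches the post; fetch_object(), the object names, and the /cephfs/incoming path are hypothetical placeholders, not the poster's actual script.]

import os
from concurrent.futures import ThreadPoolExecutor

DEST_DIR = "/cephfs/incoming"   # hypothetical subdirectory on the kernel-mounted cephfs

def fetch_object(name):
    """Placeholder for the network download step described in the post."""
    return b"..." * 1024

def download_and_write(name):
    data = fetch_object(name)
    path = os.path.join(DEST_DIR, name)
    # Write to a temporary name first, then rename, so readers never see
    # partially written files. (The post does not say whether the real
    # script does this; it is just one reasonable choice.)
    tmp = path + ".part"
    with open(tmp, "wb") as f:
        f.write(data)
    os.rename(tmp, path)
    return path

if __name__ == "__main__":
    names = ["object_%06d.fits" % i for i in range(1000)]  # hypothetical work list
    with ThreadPoolExecutor(max_workers=100) as pool:
        for written in pool.map(download_and_write, names):
            pass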
>>>
>>> System info:
>>>
>>> ceph -s
>>>     cluster ba0c94fc-1168-11e6-aaea-000c290cc2d4
>>>      health HEALTH_OK
>>>      monmap e1: 3 mons at
>>> {mon01=10.16.51.21:6789/0,mon02=10.16.51.22:6789/0,mon03=10.16.51.23:6789/0}
>>>             election epoch 138, quorum 0,1,2 mon01,mon02,mon03
>>>       fsmap e3210: 1/1/1 up {0=mds02=up:active}, 2 up:standby
>>>      osdmap e33046: 85 osds: 85 up, 85 in
>>>             flags sortbitwise,require_jewel_osds
>>>       pgmap v27679236: 16192 pgs, 12 pools, 7655 GB data, 6591 kobjects
>>>             24345 GB used, 217 TB / 241 TB avail
>>>                16188 active+clean
>>>                    3 active+clean+scrubbing
>>>                    1 active+clean+scrubbing+deep
>>>   client io 0 B/s rd, 15341 kB/s wr, 0 op/s rd, 21 op/s wr
>>>
>>> On the hung client node, we are seeing an entry in mdsc:
>>> cat /sys/kernel/debug/ceph/*/mdsc
>>> 163925513  mds0  readdir  #100003be2b1 kplr009658474_dr25_window.fits
>>>
>>> I am not seeing this on the other 3 client nodes.
>>>
>>> On the active metadata server, I ran:
>>>
>>> ceph daemon mds.mds02 dump_ops_in_flight
>>>
>>> every 2 seconds, as it kept changing. Part of the output is at:
>>> https://paste.fedoraproject.org/paste/OizCowo3oGzZo-cJWV5R~Q
>>>
>>> Info about the system:
>>>
>>> OS: Ubuntu Trusty
>>>
>>> Cephfs snapshots are turned on and being created hourly.
>>>
>>> Ceph version:
>>> ceph -v
>>> ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
>>>
>>> Kernel, Ceph servers:
>>> uname -a
>>> Linux mon01 4.2.0-27-generic #32~14.04.1-Ubuntu SMP Fri Jan 22
>>> 15:32:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
>>>
>>> Kernel, Cephfs clients:
>>> uname -a
>>> Linux dfgw02 4.9.21-040921-generic #201704080434 SMP Sat Apr 8
>>> 08:35:57 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
>>>
>>> Let me know if I should write up a ticket on this.
>>>
>>> Thanks
>>>
>>> Eric
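
[Editor's note: the repeated "ceph daemon mds.mds02 dump_ops_in_flight" capture described above can be scripted. A minimal sketch follows, assuming it runs on the host where the mds02 admin socket lives; the 2-second interval and daemon name come from the post, while the log file name is a hypothetical choice.]

import json
import subprocess
import time

def dump_ops_in_flight(mds_name="mds02"):
    # Query the MDS admin socket via the ceph CLI; output is JSON.
    out = subprocess.check_output(
        ["ceph", "daemon", "mds.%s" % mds_name, "dump_ops_in_flight"])
    return json.loads(out)

if __name__ == "__main__":
    with open("mds_ops_in_flight.log", "a") as log:
        while True:
            ops = dump_ops_in_flight()
            # Record a timestamp and the op count, then the full dump.
            log.write("%s %d ops\n" % (time.strftime("%F %T"), ops.get("num_ops", 0)))
            json.dump(ops, log)
            log.write("\n")
            log.flush()
            time.sleep(2)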