Re: Cephfs: large files hang

On Fri, Jan 1, 2016 at 9:14 AM, Bryan Wright <bkw1a@xxxxxxxxxxxx> wrote:
> Gregory Farnum <gfarnum@...> writes:
>
>> Or maybe it's 0.9a, or maybe I just don't remember at all. I'm sure
>> somebody recalls...
>>
>
> I'm still struggling with this.  When I copy some files from the ceph
> file system, the copy hangs forever.  Here's some more data:
>
>
> * Attempt to copy file.  ceph --watch-warn shows:
>
> 2016-01-01 11:16:12.637932 osd.405 [WRN] slow request 480.160153 seconds
> old, received at 2016-01-01 11:08:12.477509: osd_op(client.46686461.1:11
> 10000006479.00000004 [read 2097152~2097152 [1@-1]] 0.ca710b7 read e367378)
> currently waiting for replay end
>
> * Look for client's entry in "ceph daemon mds.0 session ls".  Here it is:
>
>     {
>         "id": 46686461,
>         "num_leases": 0,
>         "num_caps": 10332,
>         "state": "open",
>         "replay_requests": 0,
>         "reconnecting": false,
>         "inst": "client.46686461 192.168.1.180:0\/2512587758",
>         "client_metadata": {
>             "entity_id": "",
>             "hostname": "node80.galileo",
>             "kernel_version": "4.3.3-1.el6.elrepo.i686"
>         }
>     },
>
> * Look for messages in /var/log/ceph/ceph.log referring to this client:
>
> 2016-01-01 11:16:12.637917 osd.405 192.168.1.23:6823/30938 142 : cluster
> [WRN] slow request 480.184693 seconds old, received at 2016-01-01
> 11:08:12.452970: osd_op(client.46686461.1:10 10000006479.00000004 [read
> 0~2097152 [1@-1]] 0.ca710b7 read e367378) currently waiting for replay end
> 2016-01-01 11:16:12.637932 osd.405 192.168.1.23:6823/30938 143 : cluster
> [WRN] slow request 480.160153 seconds old, received at 2016-01-01
> 11:08:12.477509: osd_op(client.46686461.1:11 10000006479.00000004 [read
> 2097152~2097152 [1@-1]] 0.ca710b7 read e367378) currently waiting for replay
> end
> 2016-01-01 11:23:11.298786 mds.0 192.168.1.31:6800/19945 64 : cluster [WRN]
> slow request 7683.077077 seconds old, received at 2016-01-01
> 09:15:08.221671: client_request(client.46686461:758 readdir #1000001913d
> 2016-01-01 09:15:08.222194) currently acquired locks
> 2016-01-01 11:24:12.728794 osd.405 192.168.1.23:6823/30938 145 : cluster
> [WRN] slow request 960.275521 seconds old, received at 2016-01-01
> 11:08:12.452970: osd_op(client.46686461.1:10 10000006479.00000004 [read
> 0~2097152 [1@-1]] 0.ca710b7 read e367378) currently waiting for replay end
> 2016-01-01 11:24:12.728814 osd.405 192.168.1.23:6823/30938 146 : cluster
> [WRN] slow request 960.250982 seconds old, received at 2016-01-01
> 11:08:12.477509: osd_op(client.46686461.1:11 10000006479.00000004 [read
> 2097152~2097152 [1@-1]] 0.ca710b7 read e367378) currently waiting for replay
> end
>
>
> * Seems to refer to "0.ca710b7", which I'm guessing is either pg 0.ca,
> 0.ca7, 0.7b, 0.7b0, 0.b7 or 0.0b7.  Look for these in "ceph health detail":
>
> ceph health detail | egrep '0\.ca|0\.7b|0\.b7|0\.0b'
> pg 0.7b2 is stuck inactive since forever, current state incomplete, last
> acting [307,206]
> pg 0.7b2 is stuck unclean since forever, current state incomplete, last
> acting [307,206]
> pg 0.7b2 is incomplete, acting [307,206]
>
> OK, so no "7b" or "7b0", but is "7b2" close enough?
>
> * Take a look at osd 307 and 206.  These are both online and show no errors
> in their logs.  Why then the "stuck"?
>
> * Look at filesystem on other OSDs for "7b".  Find this:
>
> osd 102 (defunct, offline OSD disk, appears as "DNE" in "ceph osd tree"):
> drwxr-xr-x  3 root root  4096 Dec 13 12:58 0.7b_head
> drwxr-xr-x  2 root root     6 Dec 13 12:43 0.7b_TEMP
>
> osd 103:
> drwxr-xr-x  3 root root  4096 Dec 18 12:04 0.7b0_head
>
> osd 110:
> drwxr-xr-x  3 root root  4096 Dec 20 09:06 0.7b_head
>
> osd 402:
> drwxr-xr-x  3 root root  4096 Jul  1  2014 0.7b_head
>
> All of these OSDs except 102 are up and healthy.
>
>
> Where do I go from here?

What's the output of "ceph pg query" on that PG? Do the OSDs agree with
the monitors that it's incomplete? If so, they should have info about
why (e.g., a known missing log).
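If you want to confirm which PG those hung reads are actually hitting,
rather than guessing from the "0.ca710b7" locator, something like
    ceph osd map <data-pool> 10000006479.00000004
should print the exact PG and acting set for that object (substitute
the name of your CephFS data pool for <data-pool>). Then query whatever
PG the health output flags, e.g.
    ceph pg 0.7b2 query
and look at the recovery_state section for the peering history and the
reason it won't go active.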
Based on the slow request messages from osd.405 ("currently waiting for
replay end"), those reads are stuck on a PG which is still in the replay
period. See http://tracker.ceph.com/issues/13116, which was fixed for
infernalis and backported to hammer but I think not released yet. If you
restart one of the OSDs in that PG it should recover (this is often a
good band-aid when something is busted in peering/recovery).
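For instance, on the host carrying the OSD that's logging the slow
requests (osd.405 here), something like
    service ceph restart osd.405
(or "systemctl restart ceph-osd@405" on systemd boxes, depending on how
your OSDs are deployed) bounces just that daemon; simply marking it down
with "ceph osd down 405" and letting it re-peer may also be enough. Any
OSD in the stuck PG's acting set should do.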
-Greg
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



