On Fri, Feb 5, 2016 at 10:19 PM, Michael Metz-Martini | SpeedPartner GmbH <metz@xxxxxxxxxxxxxxx> wrote:
> Hi,
>
> On 06.02.2016 at 07:15, Yan, Zheng wrote:
>>> On Feb 6, 2016, at 13:41, Michael Metz-Martini | SpeedPartner GmbH <metz@xxxxxxxxxxxxxxx> wrote:
>>> On 04.02.2016 at 15:38, Yan, Zheng wrote:
>>>>> On Feb 4, 2016, at 17:00, Michael Metz-Martini | SpeedPartner GmbH <metz@xxxxxxxxxxxxxxx> wrote:
>>>>> On 04.02.2016 at 09:43, Yan, Zheng wrote:
>>>>>> On Thu, Feb 4, 2016 at 4:36 PM, Michael Metz-Martini | SpeedPartner GmbH <metz@xxxxxxxxxxxxxxx> wrote:
>>>>>>> On 03.02.2016 at 15:55, Yan, Zheng wrote:
>>>>>>>>> On Feb 3, 2016, at 21:50, Michael Metz-Martini | SpeedPartner GmbH <metz@xxxxxxxxxxxxxxx> wrote:
>>>>>>>>> On 03.02.2016 at 12:11, Yan, Zheng wrote:
>>>>>>>>>>> On Feb 3, 2016, at 17:39, Michael Metz-Martini | SpeedPartner GmbH <metz@xxxxxxxxxxxxxxx> wrote:
>>>>>>>>>>> On 03.02.2016 at 10:26, Gregory Farnum wrote:
>>>>>>>>>>>> On Tue, Feb 2, 2016 at 10:09 PM, Michael Metz-Martini | SpeedPartner
>>>>>>>>> 2016-02-03 14:42:25.581840 7fadfd280700 0 log_channel(default) log [WRN] : 7 slow requests, 6 included below; oldest blocked for > 62.125785 secs
>>>>>>>>> 2016-02-03 14:42:25.581849 7fadfd280700 0 log_channel(default) log [WRN] : slow request 62.125785 seconds old, received at 2016-02-03 14:41:23.455812: client_request(client.10199855:1313157 getattr pAsLsXsFs #100815bd349 2016-02-03 14:41:23.452386) currently failed to rdlock, waiting
>>>>>>>> This seems like dirty page writeback is too slow. Are there any hung OSD requests in /sys/kernel/debug/ceph/xxx/osdc?
>>>>> Got it. http://www.michael-metz.de/osdc.txt.gz (about 500 KB uncompressed)
>>>> That's quite a lot of requests. Could you pick some requests in osdc and check how long they last?
>>> After stopping all load/access to CephFS, a few requests are left:
>>> 330 osd87 5.72c3bf71 100826d5cdc.00000002 write
>>> 508 osd87 5.569ad068 100826d5d18.00000000 write
>>> 668 osd87 5.3db54b00 100826d5d4d.00000001 write
>>> 799 osd87 5.65f8c4e0 100826d5d79.00000000 write
>>> 874 osd87 5.d238da71 100826d5d98.00000000 write
>>> 1023 osd87 5.705950e0 100826d5e2d.00000000 write
>>> 1277 osd87 5.33673f71 100826d5f2a.00000000 write
>>> 1329 osd87 5.e81ab868 100826d5f5e.00000000 write
>>> 1392 osd87 5.aea1c771 100826d5f9c.00000000 write
>>>
>>> osd.87 is near full and currently has some PGs with backfill_toofull,
>>> but can this be the reason?
>>
>> Yes, it's likely.
> But "why"?
> I thought that reads/writes are still possible, just not replicated,
> with the objects left degraded.

As long as all the PGs are "active" they'll still accept reads/writes,
but it's possible that osd 87 is just so busy that the clients are all
stuck waiting for it.
-Greg
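For reference, one way to sanity-check the "one busy OSD" theory from the client side is to count the kernel client's in-flight requests per OSD and then look at that OSD's utilization and latency. This is only a sketch, not taken from the thread: it assumes the osdc line format quoted above (second column = osdNN), the default debugfs mount under /sys/kernel/debug/ceph/, and osd.87 as the suspect from this thread.

    # Count in-flight requests per OSD as seen by the kernel client
    awk '{print $2}' /sys/kernel/debug/ceph/*/osdc | sort | uniq -c | sort -rn | head

    # Utilization and per-OSD latency for the suspect OSD
    ceph osd df | grep -E '^ *87 '
    ceph osd perf | grep -E '^ *87 '

    # Ops currently queued on osd.87 (run on the node hosting that OSD, via its admin socket)
    ceph daemon osd.87 dump_ops_in_flight | grep -c '"description"'

If nearly all of the stuck writes map to osd.87 and its latencies stand out, that supports the explanation above that the clients are simply waiting on one overloaded OSD.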