On Fri, Feb 5, 2016 at 10:19 PM, Michael Metz-Martini | SpeedPartner GmbH <metz@xxxxxxxxxxxxxxx> wrote:
> Hi,
>
> On 06.02.2016 at 07:15, Yan, Zheng wrote:
>>> On Feb 6, 2016, at 13:41, Michael Metz-Martini | SpeedPartner GmbH <metz@xxxxxxxxxxxxxxx> wrote:
>>> On 04.02.2016 at 15:38, Yan, Zheng wrote:
>>>>> On Feb 4, 2016, at 17:00, Michael Metz-Martini | SpeedPartner GmbH <metz@xxxxxxxxxxxxxxx> wrote:
>>>>> On 04.02.2016 at 09:43, Yan, Zheng wrote:
>>>>>> On Thu, Feb 4, 2016 at 4:36 PM, Michael Metz-Martini | SpeedPartner GmbH <metz@xxxxxxxxxxxxxxx> wrote:
>>>>>>> On 03.02.2016 at 15:55, Yan, Zheng wrote:
>>>>>>>>> On Feb 3, 2016, at 21:50, Michael Metz-Martini | SpeedPartner GmbH <metz@xxxxxxxxxxxxxxx> wrote:
>>>>>>>>> On 03.02.2016 at 12:11, Yan, Zheng wrote:
>>>>>>>>>>> On Feb 3, 2016, at 17:39, Michael Metz-Martini | SpeedPartner GmbH <metz@xxxxxxxxxxxxxxx> wrote:
>>>>>>>>>>> On 03.02.2016 at 10:26, Gregory Farnum wrote:
>>>>>>>>>>>> On Tue, Feb 2, 2016 at 10:09 PM, Michael Metz-Martini | SpeedPartner
>>>>>>>>> 2016-02-03 14:42:25.581840 7fadfd280700 0 log_channel(default) log [WRN] : 7 slow requests, 6 included below; oldest blocked for > 62.125785 secs
>>>>>>>>> 2016-02-03 14:42:25.581849 7fadfd280700 0 log_channel(default) log [WRN] : slow request 62.125785 seconds old, received at 2016-02-03 14:41:23.455812: client_request(client.10199855:1313157 getattr pAsLsXsFs #100815bd349 2016-02-03 14:41:23.452386) currently failed to rdlock, waiting
>>>>>>>> This seems like dirty page writeback is too slow. Are there any hung OSD requests in /sys/kernel/debug/ceph/xxx/osdc?
>>>>> Got it. http://www.michael-metz.de/osdc.txt.gz (about 500 KB uncompressed)
>>>> That's quite a lot of requests. Could you pick some requests in osdc and check how long they last?
>>> After stopping all load/access to CephFS, a few requests are left:
>>> 330 osd87 5.72c3bf71 100826d5cdc.00000002 write
>>> 508 osd87 5.569ad068 100826d5d18.00000000 write
>>> 668 osd87 5.3db54b00 100826d5d4d.00000001 write
>>> 799 osd87 5.65f8c4e0 100826d5d79.00000000 write
>>> 874 osd87 5.d238da71 100826d5d98.00000000 write
>>> 1023 osd87 5.705950e0 100826d5e2d.00000000 write
>>> 1277 osd87 5.33673f71 100826d5f2a.00000000 write
>>> 1329 osd87 5.e81ab868 100826d5f5e.00000000 write
>>> 1392 osd87 5.aea1c771 100826d5f9c.00000000 write
>>>
>>> osd.87 is near full and currently has some PGs with backfill_toofull,
>>> but can this be the reason?
>>
>> Yes, it's likely.
> But "why"?
> I thought that reads/writes are still possible, just not replicated,
> with the objects left degraded.

As long as all the PGs are "active" they'll still accept reads/writes,
but it's possible that osd 87 is just so busy that the clients are all
stuck waiting for it.
-Greg
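For reference, one way to sanity-check the "one busy OSD" theory from the client side is to count the kernel client's in-flight requests per OSD and then look at that OSD's utilization and latency. This is only a sketch, not taken from the thread: it assumes the osdc line format quoted above (second column = osdNN), the default debugfs mount under /sys/kernel/debug/ceph/, and osd.87 as the suspect from this thread.

    # Count in-flight requests per OSD as seen by the kernel client
    awk '{print $2}' /sys/kernel/debug/ceph/*/osdc | sort | uniq -c | sort -rn | head

    # Utilization and per-OSD latency for the suspect OSD
    ceph osd df | grep -E '^ *87 '
    ceph osd perf | grep -E '^ *87 '

    # Ops currently queued on osd.87 (run on the node hosting that OSD, via its admin socket)
    ceph daemon osd.87 dump_ops_in_flight | grep -c '"description"'

If nearly all of the stuck writes map to osd.87 and its latencies stand out, that supports the explanation above that the clients are simply waiting on one overloaded OSD.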