Re: Unexplainable slow request

Christian Balzer <chibi@xxxxxxx> · Tue, 9 Dec 2014 13:23:43 +0900

Hello,

On Mon, 8 Dec 2014 19:51:00 -0800 Gregory Farnum wrote:

> On Mon, Dec 8, 2014 at 6:39 PM, Christian Balzer <chibi@xxxxxxx> wrote:
> >
> > Hello,
> >
> > Debian Jessie cluster, thus kernel 3.16, ceph 0.80.7.
> > 3 storage nodes with 8 OSDs (journals on 4 SSDs) each, 3 mons.
> > 2 compute nodes, everything connected via Infiniband.
> >
> > This is pre-production, currently there are only 3 VMs and 2 of them
> > were idle at the time. The non-idle one was having 600GB of maildirs
> > copied onto it, which stresses things but not Ceph as those millions
> > of small files coalesce nicely and result in rather few Ceph ops.
> >
> > A couple of hours into that copy marathon (the source FS and machine
> > are slow and rsync isn't particular speedy with this kind of operation
> > either) this happened:
> > ---
> > 2014-12-06 19:20:57.023974 osd.23 10.0.8.23:6815/3552 77 : [WRN] slow
> > request 30 .673939 seconds old, received at 2014-12-06
> > 19:20:26.346746: osd_op(client.33776 .0:743596
> > rb.0.819b.238e1f29.00000003f52f [set-alloc-hint object_size 4194304 wr
> > ite_size 4194304,write 1748992~4096] 3.efa97e35 ack+ondisk+write e380)
> > v4 curren tly waiting for subops from 4,8 2014-12-06 19:20:57.023991
> > osd.23 10.0.8.23:6815/3552 78 : [WRN] slow request 30 .673886 seconds
> > old, received at 2014-12-06 19:20:26.346799:
> > osd_op(client.33776 .0:743597 rb.0.819b.238e1f29.00000003f52f
> > [set-alloc-hint object_size 4194304 wr ite_size 4194304,write
> > 1945600~4096] 3.efa97e35 ack+ondisk+write e380) v4 curren tly waiting
> > for subops from 4,8 2014-12-06 19:20:57.323976 osd.1
> > 10.0.8.21:6815/4868 123 : [WRN] slow request 30 .910821 seconds old,
> > received at 2014-12-06 19:20:26.413051: osd_op(client.33776 .0:743604
> > rb.0.819b.238e1f29.00000003e628 [set-alloc-hint object_size 4194304 wr
> > ite_size 4194304,write 1794048~1835008] 3.5e76b8ba ack+ondisk+write
> > e380) v4 cur rently waiting for subops from 8,17 ---
> >
> > There were a few more later, but they all involved OSD 8 as common
> > factor.
> >
> > Alas there's nothing in the osd-8.log indicating why:
> > ---
> > 2014-12-06 19:13:13.933636 7fce85552700  0 -- 10.0.8.22:6835/5389 >>
> > 10.0.8.6:0/ 716350435 pipe(0x7fcec3c25900 sd=23 :6835 s=0 pgs=0 cs=0
> > l=0 c=0x7fcebfad03c0).a ccept peer addr is really 10.0.8.6:0/716350435
> > (socket is 10.0.8.6:50592/0) 2014-12-06 19:20:56.595773 7fceac82f700
> > 0 log [WRN] : 3 slow requests, 3 included below; oldest blocked for >
> > 30.241397 secs 2014-12-06 19:20:56.595796 7fceac82f700  0 log [WRN] :
> > slow request 30.241397 seconds old, received at 2014-12-06
> > 19:20:26.354247: osd_sub_op(client.33776.0:743596 3.235
> > efa97e35/rb.0.819b.238e1f29.00000003f52f/head//3 [] v 380'3783
> > snapset=0=[]:[] snapc=0=[]) v11 currently started 2014-12-06
> > 19:20:56.595825 7fceac82f700  0 log [WRN] : slow request 30.240286
> > seconds old, received at 2014-12-06 19:20:26.355358:
> > osd_sub_op(client.33776.0:743597 3.235
> > efa97e35/rb.0.819b.238e1f29.00000003f52f/head//3 [] v 380'3784
> > snapset=0=[]:[] snapc=0=[]) v11 currently started 2014-12-06
> > 19:20:56.595837 7fceac82f700  0 log [WRN] : slow request 30.177186
> > seconds old, received at 2014-12-06 19:20:26.418458:
> > osd_sub_op(client.33776.0:743604 3.ba
> > 5e76b8ba/rb.0.819b.238e1f29.00000003e628/head//3 [] v 380'6439
> > snapset=0=[]:[] snapc=0=[]) v11 currently started ----
> 
> That these are started and nothing else suggests that they're probably
> waiting for one of the throttles to let them in, rather than
> themselves being particularly slow.
>

If this was indeed caused by one of the (rather numerous) throttles,
wouldn't it be a good idea to log that fact? 
A slow disk is one thing, Ceph permanently seizing up because something
exceeded a threshold sounds noteworthy to me.

> >
> > The HDDs and SSDs are new, there's nothing in the pertinent logs or
> > smart that indicates any problem with that HDD or its journal SSD, nor
> > the system in general.
> > This problem persisted (and the VM remained stuck) until OSD 8 was
> > restarted the next day when I discovered this.
> >
> > I suppose this is another "this can't/shouldn't happen" case, but I'd
> > be delighted about any suggestions as to what happened here, potential
> > prevention measures and any insights on how to maybe coax more
> > information out of Ceph if this happens again.
> 
> Nah, there are a million reasons stuff can be slow. 
This wasn't just slow. Those requests never completed even after half a
day had passed with the system and disks being basically idle.

> It might just be a
> transient overload of the disk compared to the others. 

Transient would be fine (though highly unlikely in this scenario), however
it never recovered, see above.

> If you see this
> again while it's happening I'd check the perfcounters; if you're
> keeping historical checks of them go look at the blocked-up times and
> see if any of them are at or near their maximum values.
> -Gre
> 
Ah, I should have thought of that before re-starting that OSD.
dump_historic_ops is at the default values, so that information is long gone.

Regards,

Christian
-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Fusion Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com