On Mon, 8 Dec 2014 20:36:17 -0800 Gregory Farnum wrote: > They never fixed themselves? As I wrote, it took a restart of OSD 8 to resolve this on the next day. > Did the reported times ever increase? Indeed, the last before the reboot was: --- 2014-12-07 13:12:42.933396 7fceac82f700 0 log [WRN] : 14 slow requests, 5 included below; oldest blocked for > 64336.578995 secs --- All IOPS hitting that osd.8 (eventually the other VM did as well during a log write I suppose) were blocked. > If not I think that's just a reporting bug which is fixed in an > unreleased branch, but I'd have to check the tracker to be sure. > > On Mon, Dec 8, 2014 at 8:23 PM, Christian Balzer <chibi@xxxxxxx> wrote: > > > > Hello, > > > > On Mon, 8 Dec 2014 19:51:00 -0800 Gregory Farnum wrote: > > > >> On Mon, Dec 8, 2014 at 6:39 PM, Christian Balzer <chibi@xxxxxxx> > >> wrote: > >> > > >> > Hello, > >> > > >> > Debian Jessie cluster, thus kernel 3.16, ceph 0.80.7. > >> > 3 storage nodes with 8 OSDs (journals on 4 SSDs) each, 3 mons. > >> > 2 compute nodes, everything connected via Infiniband. > >> > > >> > This is pre-production, currently there are only 3 VMs and 2 of them > >> > were idle at the time. The non-idle one was having 600GB of maildirs > >> > copied onto it, which stresses things but not Ceph as those millions > >> > of small files coalesce nicely and result in rather few Ceph ops. > >> > > >> > A couple of hours into that copy marathon (the source FS and machine > >> > are slow and rsync isn't particular speedy with this kind of > >> > operation either) this happened: > >> > --- > >> > 2014-12-06 19:20:57.023974 osd.23 10.0.8.23:6815/3552 77 : [WRN] > >> > slow request 30 .673939 seconds old, received at 2014-12-06 > >> > 19:20:26.346746: osd_op(client.33776 .0:743596 > >> > rb.0.819b.238e1f29.00000003f52f [set-alloc-hint object_size 4194304 > >> > wr ite_size 4194304,write 1748992~4096] 3.efa97e35 ack+ondisk+write > >> > e380) v4 curren tly waiting for subops from 4,8 2014-12-06 > >> > 19:20:57.023991 osd.23 10.0.8.23:6815/3552 78 : [WRN] slow request > >> > 30 .673886 seconds old, received at 2014-12-06 19:20:26.346799: > >> > osd_op(client.33776 .0:743597 rb.0.819b.238e1f29.00000003f52f > >> > [set-alloc-hint object_size 4194304 wr ite_size 4194304,write > >> > 1945600~4096] 3.efa97e35 ack+ondisk+write e380) v4 curren tly > >> > waiting for subops from 4,8 2014-12-06 19:20:57.323976 osd.1 > >> > 10.0.8.21:6815/4868 123 : [WRN] slow request 30 .910821 seconds old, > >> > received at 2014-12-06 19:20:26.413051: > >> > osd_op(client.33776 .0:743604 rb.0.819b.238e1f29.00000003e628 > >> > [set-alloc-hint object_size 4194304 wr ite_size 4194304,write > >> > 1794048~1835008] 3.5e76b8ba ack+ondisk+write e380) v4 cur rently > >> > waiting for subops from 8,17 --- > >> > > >> > There were a few more later, but they all involved OSD 8 as common > >> > factor. > >> > > >> > Alas there's nothing in the osd-8.log indicating why: > >> > --- > >> > 2014-12-06 19:13:13.933636 7fce85552700 0 -- 10.0.8.22:6835/5389 >> > >> > 10.0.8.6:0/ 716350435 pipe(0x7fcec3c25900 sd=23 :6835 s=0 pgs=0 cs=0 > >> > l=0 c=0x7fcebfad03c0).a ccept peer addr is really > >> > 10.0.8.6:0/716350435 (socket is 10.0.8.6:50592/0) 2014-12-06 > >> > 19:20:56.595773 7fceac82f700 0 log [WRN] : 3 slow requests, 3 > >> > included below; oldest blocked for > 30.241397 secs 2014-12-06 > >> > 19:20:56.595796 7fceac82f700 0 log [WRN] : slow request 30.241397 > >> > seconds old, received at 2014-12-06 19:20:26.354247: > >> > osd_sub_op(client.33776.0:743596 3.235 > >> > efa97e35/rb.0.819b.238e1f29.00000003f52f/head//3 [] v 380'3783 > >> > snapset=0=[]:[] snapc=0=[]) v11 currently started 2014-12-06 > >> > 19:20:56.595825 7fceac82f700 0 log [WRN] : slow request 30.240286 > >> > seconds old, received at 2014-12-06 19:20:26.355358: > >> > osd_sub_op(client.33776.0:743597 3.235 > >> > efa97e35/rb.0.819b.238e1f29.00000003f52f/head//3 [] v 380'3784 > >> > snapset=0=[]:[] snapc=0=[]) v11 currently started 2014-12-06 > >> > 19:20:56.595837 7fceac82f700 0 log [WRN] : slow request 30.177186 > >> > seconds old, received at 2014-12-06 19:20:26.418458: > >> > osd_sub_op(client.33776.0:743604 3.ba > >> > 5e76b8ba/rb.0.819b.238e1f29.00000003e628/head//3 [] v 380'6439 > >> > snapset=0=[]:[] snapc=0=[]) v11 currently started ---- > >> > >> That these are started and nothing else suggests that they're probably > >> waiting for one of the throttles to let them in, rather than > >> themselves being particularly slow. > >> > > > > If this was indeed caused by one of the (rather numerous) throttles, > > wouldn't it be a good idea to log that fact? > > A slow disk is one thing, Ceph permanently seizing up because something > > exceeded a threshold sounds noteworthy to me. > > If it permanently seized up then this is not what happened; So am I looking at an unknown bug then? > if the > reporting just didn't go away then I'm not sure it's appropriate to > log every time a throttle gets hit (some of them are supposed to or at > least expected to be, since they keep the journal from running away > from the backing store). > Fair enough, I'd suppose that anything causing a WRN log entry which in turn is caused by a throttle triggering should have that root cause included though. Christian > > > >> > > >> > The HDDs and SSDs are new, there's nothing in the pertinent logs or > >> > smart that indicates any problem with that HDD or its journal SSD, > >> > nor the system in general. > >> > This problem persisted (and the VM remained stuck) until OSD 8 was > >> > restarted the next day when I discovered this. > >> > > >> > I suppose this is another "this can't/shouldn't happen" case, but > >> > I'd be delighted about any suggestions as to what happened here, > >> > potential prevention measures and any insights on how to maybe coax > >> > more information out of Ceph if this happens again. > >> > >> Nah, there are a million reasons stuff can be slow. > > This wasn't just slow. Those requests never completed even after half a > > day had passed with the system and disks being basically idle. > > > >> It might just be a > >> transient overload of the disk compared to the others. > > > > Transient would be fine (though highly unlikely in this scenario), > > however it never recovered, see above. > > > > > >> If you see this > >> again while it's happening I'd check the perfcounters; if you're > >> keeping historical checks of them go look at the blocked-up times and > >> see if any of them are at or near their maximum values. > >> -Gre > >> > > Ah, I should have thought of that before re-starting that OSD. > > dump_historic_ops is at the default values, so that information is > > long gone. > > > > Regards, > > > > Christian > > -- > > Christian Balzer Network/Systems Engineer > > chibi@xxxxxxx Global OnLine Japan/Fusion Communications > > http://www.gol.com/ > -- Christian Balzer Network/Systems Engineer chibi@xxxxxxx Global OnLine Japan/Fusion Communications http://www.gol.com/ _______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com