Hi, On 06 Sep 2014, at 13:53, Christian Balzer <chibi at gol.com> wrote: > > Hello, > > On Sat, 6 Sep 2014 13:37:25 +0200 Josef Johansson wrote: > >> Also putting this on the list. >> >> On 06 Sep 2014, at 13:36, Josef Johansson <josef at oderland.se> wrote: >> >>> Hi, >>> >>> Same issues again, but I think we found the drive that causes the >>> problems. >>> >>> But this is causing problems as it?s trying to do a recover to that >>> osd at the moment. >>> >>> So we?re left with the status message >>> >>> 2014-09-06 13:35:07.580007 mon.0 [INF] pgmap v12678802: 6860 pgs: 6841 >>> active+clean, 19 active+remapped+backfilling; 12299 GB data, 36882 GB >>> used, 142 TB / 178 TB avail; 1921KB/s rd, 192KB/s wr, 74op/s; >>> 41424/15131923 degraded (0.274%); recovering 0 o/s, 2035KB/s >>> >>> >>> It?s improving, but way too slowly. If I restart the recovery (ceph >>> osd set no recovery /unset) it doesn?t change the osd what I can see. >>> >>> Any ideas? >>> > I don't know the state of your cluster, i.e. what caused the recovery to > start (how many OSDs went down?). Performance degradation, databases are the worst impacted. It?s actually a OSD that we put in that?s causing it (removed it again though). So the cluster in itself is healthy. > If you have a replication of 3 and only one OSD was involved, what is > stopping you from taking that wonky drive/OSD out? > There?s data that goes missing if I do that, I guess I have to wait for the recovery process to complete before I can go any further, this is with rep 3. > If you don't know that or want to play it safe, how about setting the > weight of that OSD to 0? > While that will AFAICT still result in all primary PGs to be evacuated > off it, no more writes will happen to it and reads might be faster. > In either case, it shouldn't slow down the rest of your cluster anymore. > That?s actually one idea I haven?t thought off, I wan?t to play it safe right now and hope that it goes up again, I actually found one wonky way of getting the recovery process from not stalling to a grind, and that was restarting OSDs. One at the time. Regards, Josef > Regards, > > Christian >>> Cheers, >>> Josef >>> >>> On 05 Sep 2014, at 11:26, Luis Periquito <luis.periquito at ocado.com> >>> wrote: >>> >>>> Only time I saw such behaviour was when I was deleting a big chunk of >>>> data from the cluster: all the client activity was reduced, the op/s >>>> were almost non-existent and there was unjustified delays all over >>>> the cluster. But all the disks were somewhat busy in atop/iotstat. >>>> >>>> >>>> On 5 September 2014 09:51, David <david at visions.se> wrote: >>>> Hi, >>>> >>>> Indeed strange. >>>> >>>> That output was when we had issues, seems that most operations were >>>> blocked / slow requests. >>>> >>>> A ?baseline? output is more like today: >>>> >>>> 2014-09-05 10:44:29.123681 mon.0 [INF] pgmap v12582759: 6860 pgs: >>>> 6860 active+clean; 12253 GB data, 36574 GB used, 142 TB / 178 TB >>>> avail; 9273KB/s rd, 24650KB/s wr, 2755op/s 2014-09-05 10:44:30.125637 >>>> mon.0 [INF] pgmap v12582760: 6860 pgs: 6860 active+clean; 12253 GB >>>> data, 36574 GB used, 142 TB / 178 TB avail; 9500KB/s rd, 20430KB/s >>>> wr, 2294op/s 2014-09-05 10:44:31.139427 mon.0 [INF] pgmap v12582761: >>>> 6860 pgs: 6860 active+clean; 12253 GB data, 36574 GB used, 142 TB / >>>> 178 TB avail; 9216KB/s rd, 20062KB/s wr, 2488op/s 2014-09-05 >>>> 10:44:32.144945 mon.0 [INF] pgmap v12582762: 6860 pgs: 6860 >>>> active+clean; 12253 GB data, 36574 GB used, 142 TB / 178 TB avail; >>>> 12511KB/s rd, 15739KB/s wr, 2488op/s 2014-09-05 10:44:33.161210 mon.0 >>>> [INF] pgmap v12582763: 6860 pgs: 6860 active+clean; 12253 GB data, >>>> 36574 GB used, 142 TB / 178 TB avail; 18593KB/s rd, 14880KB/s wr, >>>> 2609op/s 2014-09-05 10:44:34.187294 mon.0 [INF] pgmap v12582764: 6860 >>>> pgs: 6860 active+clean; 12253 GB data, 36574 GB used, 142 TB / 178 TB >>>> avail; 17720KB/s rd, 22964KB/s wr, 3257op/s 2014-09-05 >>>> 10:44:35.190785 mon.0 [INF] pgmap v12582765: 6860 pgs: 6860 >>>> active+clean; 12253 GB data, 36574 GB used, 142 TB / 178 TB avail; >>>> 19230KB/s rd, 18901KB/s wr, 3199op/s 2014-09-05 10:44:36.213535 mon.0 >>>> [INF] pgmap v12582766: 6860 pgs: 6860 active+clean; 12253 GB data, >>>> 36574 GB used, 142 TB / 178 TB avail; 17630KB/s rd, 18855KB/s wr, >>>> 3131op/s 2014-09-05 10:44:37.220052 mon.0 [INF] pgmap v12582767: 6860 >>>> pgs: 6860 active+clean; 12253 GB data, 36574 GB used, 142 TB / 178 TB >>>> avail; 12262KB/s rd, 18627KB/s wr, 2595op/s 2014-09-05 >>>> 10:44:38.233357 mon.0 [INF] pgmap v12582768: 6860 pgs: 6860 >>>> active+clean; 12253 GB data, 36574 GB used, 142 TB / 178 TB avail; >>>> 17697KB/s rd, 17572KB/s wr, 2156op/s 2014-09-05 10:44:39.239409 mon.0 >>>> [INF] pgmap v12582769: 6860 pgs: 6860 active+clean; 12253 GB data, >>>> 36574 GB used, 142 TB / 178 TB avail; 20300KB/s rd, 19735KB/s wr, >>>> 2197op/s 2014-09-05 10:44:40.260423 mon.0 [INF] pgmap v12582770: 6860 >>>> pgs: 6860 active+clean; 12253 GB data, 36574 GB used, 142 TB / 178 TB >>>> avail; 14656KB/s rd, 15460KB/s wr, 2199op/s 2014-09-05 >>>> 10:44:41.269736 mon.0 [INF] pgmap v12582771: 6860 pgs: 6860 >>>> active+clean; 12253 GB data, 36574 GB used, 142 TB / 178 TB avail; >>>> 8969KB/s rd, 11918KB/s wr, 1951op/s 2014-09-05 10:44:42.276192 mon.0 >>>> [INF] pgmap v12582772: 6860 pgs: 6860 active+clean; 12253 GB data, >>>> 36574 GB used, 142 TB / 178 TB avail; 7272KB/s rd, 10644KB/s wr, >>>> 1832op/s 2014-09-05 10:44:43.291817 mon.0 [INF] pgmap v12582773: 6860 >>>> pgs: 6860 active+clean; 12253 GB data, 36574 GB used, 142 TB / 178 TB >>>> avail; 9316KB/s rd, 16610KB/s wr, 2412op/s 2014-09-05 10:44:44.295469 >>>> mon.0 [INF] pgmap v12582774: 6860 pgs: 6860 active+clean; 12253 GB >>>> data, 36574 GB used, 142 TB / 178 TB avail; 9257KB/s rd, 19953KB/s >>>> wr, 2633op/s 2014-09-05 10:44:45.315774 mon.0 [INF] pgmap v12582775: >>>> 6860 pgs: 6860 active+clean; 12253 GB data, 36574 GB used, 142 TB / >>>> 178 TB avail; 9718KB/s rd, 14298KB/s wr, 2101op/s 2014-09-05 >>>> 10:44:46.326783 mon.0 [INF] pgmap v12582776: 6860 pgs: 6860 >>>> active+clean; 12253 GB data, 36574 GB used, 142 TB / 178 TB avail; >>>> 20877KB/s rd, 12822KB/s wr, 2447op/s 2014-09-05 10:44:47.327537 mon.0 >>>> [INF] pgmap v12582777: 6860 pgs: 6860 active+clean; 12253 GB data, >>>> 36574 GB used, 142 TB / 178 TB avail; 18447KB/s rd, 12945KB/s wr, >>>> 2226op/s 2014-09-05 10:44:48.348725 mon.0 [INF] pgmap v12582778: 6860 >>>> pgs: 6860 active+clean; 12253 GB data, 36574 GB used, 142 TB / 178 TB >>>> avail; 15514KB/s rd, 19366KB/s wr, 1892op/s 2014-09-05 >>>> 10:44:49.357537 mon.0 [INF] pgmap v12582779: 6860 pgs: 6860 >>>> active+clean; 12253 GB data, 36574 GB used, 142 TB / 178 TB avail; >>>> 16640KB/s rd, 19694KB/s wr, 1887op/s 2014-09-05 10:44:50.365016 mon.0 >>>> [INF] pgmap v12582780: 6860 pgs: 6860 active+clean; 12253 GB data, >>>> 36574 GB used, 142 TB / 178 TB avail; 9139KB/s rd, 10734KB/s wr, >>>> 1566op/s 2014-09-05 10:44:51.385553 mon.0 [INF] pgmap v12582781: 6860 >>>> pgs: 6860 active+clean; 12253 GB data, 36574 GB used, 142 TB / 178 TB >>>> avail; 11493KB/s rd, 13827KB/s wr, 1928op/s 2014-09-05 >>>> 10:44:52.389189 mon.0 [INF] pgmap v12582782: 6860 pgs: 6860 >>>> active+clean; 12253 GB data, 36574 GB used, 142 TB / 178 TB avail; >>>> 12922KB/s rd, 12814KB/s wr, 1670op/s 2014-09-05 10:44:53.403735 mon.0 >>>> [INF] pgmap v12582783: 6860 pgs: 6860 active+clean; 12253 GB data, >>>> 36574 GB used, 142 TB / 178 TB avail; 14880KB/s rd, 17532KB/s wr, >>>> 2145op/s 2014-09-05 10:44:54.410581 mon.0 [INF] pgmap v12582784: 6860 >>>> pgs: 6860 active+clean; 12253 GB data, 36574 GB used, 142 TB / 178 TB >>>> avail; 15419KB/s rd, 19696KB/s wr, 2259op/s 2014-09-05 >>>> 10:44:55.420308 mon.0 [INF] pgmap v12582785: 6860 pgs: 6860 >>>> active+clean; 12253 GB data, 36574 GB used, 142 TB / 178 TB avail; >>>> 10964KB/s rd, 15739KB/s wr, 1701op/s 2014-09-05 10:44:56.445017 mon.0 >>>> [INF] pgmap v12582786: 6860 pgs: 6860 active+clean; 12253 GB data, >>>> 36574 GB used, 142 TB / 178 TB avail; 16195KB/s rd, 15554KB/s wr, >>>> 2444op/s 2014-09-05 10:44:57.452136 mon.0 [INF] pgmap v12582787: 6860 >>>> pgs: 6860 active+clean; 12253 GB data, 36574 GB used, 142 TB / 178 TB >>>> avail; 15016KB/s rd, 16356KB/s wr, 2358op/s 2014-09-05 >>>> 10:44:58.465958 mon.0 [INF] pgmap v12582788: 6860 pgs: 6860 >>>> active+clean; 12253 GB data, 36574 GB used, 142 TB / 178 TB avail; >>>> 11668KB/s rd, 18443KB/s wr, 2029op/s 2014-09-05 10:44:59.483462 mon.0 >>>> [INF] pgmap v12582789: 6860 pgs: 6860 active+clean; 12253 GB data, >>>> 36574 GB used, 142 TB / 178 TB avail; 12253KB/s rd, 10846KB/s wr, >>>> 1529op/s 2014-09-05 10:45:00.492322 mon.0 [INF] pgmap v12582790: 6860 >>>> pgs: 6860 active+clean; 12253 GB data, 36574 GB used, 142 TB / 178 TB >>>> avail; 12247KB/s rd, 7084KB/s wr, 1464op/s 2014-09-05 10:45:01.516581 >>>> mon.0 [INF] pgmap v12582791: 6860 pgs: 6860 active+clean; 12253 GB >>>> data, 36574 GB used, 142 TB / 178 TB avail; 16460KB/s rd, 12089KB/s >>>> wr, 2537op/s 2014-09-05 10:45:02.527110 mon.0 [INF] pgmap v12582792: >>>> 6860 pgs: 6860 active+clean; 12253 GB data, 36574 GB used, 142 TB / >>>> 178 TB avail; 13382KB/s rd, 15080KB/s wr, 2563op/s 2014-09-05 >>>> 10:45:03.538090 mon.0 [INF] pgmap v12582793: 6860 pgs: 6860 >>>> active+clean; 12253 GB data, 36574 GB used, 142 TB / 178 TB avail; >>>> 10902KB/s rd, 18745KB/s wr, 2863op/s 2014-09-05 10:45:04.558261 mon.0 >>>> [INF] pgmap v12582794: 6860 pgs: 6860 active+clean; 12253 GB data, >>>> 36574 GB used, 142 TB / 178 TB avail; 10850KB/s rd, 15995KB/s wr, >>>> 2695op/s 2014-09-05 10:45:05.565750 mon.0 [INF] pgmap v12582795: 6860 >>>> pgs: 6860 active+clean; 12253 GB data, 36574 GB used, 142 TB / 178 TB >>>> avail; 9636KB/s rd, 13262KB/s wr, 2372op/s 2014-09-05 10:45:06.593984 >>>> mon.0 [INF] pgmap v12582796: 6860 pgs: 6860 active+clean; 12253 GB >>>> data, 36574 GB used, 142 TB / 178 TB avail; 18808KB/s rd, 19329KB/s >>>> wr, 3819op/s 2014-09-05 10:45:07.595866 mon.0 [INF] pgmap v12582797: >>>> 6860 pgs: 6860 active+clean; 12253 GB data, 36574 GB used, 142 TB / >>>> 178 TB avail; 21265KB/s rd, 20743KB/s wr, 3861op/s 2014-09-05 >>>> 10:45:08.624949 mon.0 [INF] pgmap v12582798: 6860 pgs: 6860 >>>> active+clean; 12254 GB data, 36574 GB used, 142 TB / 178 TB avail; >>>> 20114KB/s rd, 18543KB/s wr, 3248op/s 2014-09-05 10:45:09.627901 mon.0 >>>> [INF] pgmap v12582799: 6860 pgs: 6860 active+clean; 12254 GB data, >>>> 36574 GB used, 142 TB / 178 TB avail; 14717KB/s rd, 15141KB/s wr, >>>> 2302op/s 2014-09-05 10:45:10.643234 mon.0 [INF] pgmap v12582800: 6860 >>>> pgs: 6860 active+clean; 12254 GB data, 36574 GB used, 142 TB / 178 TB >>>> avail; 8328KB/s rd, 13950KB/s wr, 1919op/s 2014-09-05 10:45:11.651602 >>>> mon.0 [INF] pgmap v12582801: 6860 pgs: 6860 active+clean; 12254 GB >>>> data, 36574 GB used, 142 TB / 178 TB avail; 16978KB/s rd, 15921KB/s >>>> wr, 3377op/s 2014-09-05 10:45:12.674819 mon.0 [INF] pgmap v12582802: >>>> 6860 pgs: 6860 active+clean; 12254 GB data, 36574 GB used, 142 TB / >>>> 178 TB avail; 16471KB/s rd, 14034KB/s wr, 3379op/s 2014-09-05 >>>> 10:45:13.688080 mon.0 [INF] pgmap v12582803: 6860 pgs: 6860 >>>> active+clean; 12254 GB data, 36574 GB used, 142 TB / 178 TB avail; >>>> 16149KB/s rd, 12657KB/s wr, 2734op/s >>>> >>>> Aye, we actually saw latency on the disks go up a bit when we had >>>> 128GB of RAM on the OSDs and decided to beef them up to 256GB which >>>> helped. They?re running different workloads (shared hosting) but >>>> we?ve never encountered the issue we had yesterday even during our >>>> testing/benchmarking. >>>> >>>> Kind Regards, >>>> David >>>> >>>> 5 sep 2014 kl. 09:05 skrev Christian Balzer <chibi at gol.com>: >>>> >>>>> >>>>> Hello, >>>>> >>>>> On Fri, 5 Sep 2014 08:26:47 +0200 David wrote: >>>>> >>>>>> Hi, >>>>>> >>>>>> Sorry for the lack of information yesterday, this was "solved" >>>>>> after some 30 minutes, after having reloaded/restarted all osd >>>>>> daemons. Unfortunately we couldn?t pin point it to a single OSD or >>>>>> drive, all drives seemed ok, some had a bit higher latency and we >>>>>> tried to out / in them to see if it had any effect which it did >>>>>> not. >>>>>> >>>>> This is odd. >>>>> Having it "fixed" by restarting all OSDs would suggest either a >>>>> software problem (bug) with Ceph or some resource other than the >>>>> storage system being starved. But memory seems unlikely, even with >>>>> bloated, leaking OSD daemon. And CPU seems even less likely. >>>>> >>>>>> The cluster consists of 3 mon servrers, 5 OSD servers with 10 >>>>>> enterprise HDDs backed with 2 S3700 SSDs for journals each. OSD >>>>>> servers have 256GB of RAM, 2x E5-2630 v2 @ 2.60GHz CPUs. >>>>>> >>>>>> The log that I posted yesterday was just a small taste of the full >>>>>> one ;) They were all pointing to different osd?s that they were >>>>>> waiting for. We?re also monitoring all of the VMs running on KVM, >>>>>> and we didn?t see any exceptional throughput or iops usage before >>>>>> or during this event. We were checking iostat etc and nothing was >>>>>> out of the ordinary.. >>>>>> >>>>>> Going to double check SMART and also see if we can off load some >>>>>> of the cluster in any way. If you have any other advice that?d be >>>>>> appreciated :) >>>>>> >>>>> If the mon output below is typical for your cluster (634KB/s rd, >>>>> 487KB/s wr, 90op/s) it should be able to handle that w/o breaking a >>>>> sweat. I guestimate it at about 13K write IOPS and your reads >>>>> should (after a while) come mostly from the nice big pagecache you >>>>> have on those 256GB RAM machines. >>>>> >>>>> Christian >>>>> >>>>>> Thanks for your help! >>>>>> >>>>>> Kind Regards, >>>>>> David >>>>>> >>>>>> 5 sep 2014 kl. 07:30 skrev Martin B Nielsen <martin at unity3d.com>: >>>>>> >>>>>>> Just echoing what Christian said. >>>>>>> >>>>>>> Also, iirc the "currently waiting for subobs on [" could also >>>>>>> mean a problem on those as it waits for ack from them (I might >>>>>>> remember wrong). >>>>>>> >>>>>>> If that is the case you might want to check in on osd 13 & 37 as >>>>>>> well. >>>>>>> >>>>>>> With the cluster load and size you should not have this problem; >>>>>>> I'm pretty sure you're dealing with a rogue/faulty osd/node >>>>>>> somewhere. >>>>>>> >>>>>>> Cheers, >>>>>>> Martin >>>>>>> >>>>>>> >>>>>>> On Fri, Sep 5, 2014 at 2:28 AM, Christian Balzer <chibi at gol.com> >>>>>>> wrote: On Thu, 4 Sep 2014 12:02:13 +0200 David wrote: >>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> We?re running a ceph cluster with version: >>>>>>>> >>>>>>>> 0.67.7-1~bpo70+1 >>>>>>>> >>>>>>>> All of a sudden we?re having issues with the cluster (running RBD >>>>>>>> images for kvm) with slow requests on all of the OSD servers. Any >>>>>>>> idea why and how to fix it? >>>>>>>> >>>>>>> You give us a Ceph version at least, but for anybody to make >>>>>>> guesses we need much more information than a log spew. >>>>>>> >>>>>>> How many nodes/OSDs, OS, hardware, OSD details (FS, journals on >>>>>>> SSDs), etc. >>>>>>> >>>>>>> Run atop (in a sufficiently large terminal) on all your nodes, >>>>>>> see if you can spot a bottleneck, like a disk being at 100% all >>>>>>> the time with a much higher avio than the others. >>>>>>> Looking at your logs, I'd pay particular attention to the disk >>>>>>> holding osd.22. >>>>>>> A single slow disk can bring a whole large cluster to a crawl. >>>>>>> If you're using a hardware controller with a battery backed up >>>>>>> cache, check if that is fine, loss of the battery would switch >>>>>>> from writeback to writethrough and massively slow down IOPS. >>>>>>> >>>>>>> Regards, >>>>>>> >>>>>>> Christian >>>>>>>> >>>>>>>> 2014-09-04 11:56:35.868521 mon.0 [INF] pgmap v12504451: 6860 pgs: >>>>>>>> 6860 active+clean; 12163 GB data, 36308 GB used, 142 TB / 178 TB >>>>>>>> avail; 634KB/s rd, 487KB/s wr, 90op/s 2014-09-04 11:56:29.510270 >>>>>>>> osd.22 [WRN] 15 slow requests, 1 included below; oldest blocked >>>>>>>> for >>>>>>>>> 44.745754 secs 2014-09-04 11:56:29.510276 osd.22 [WRN] slow >>>>>>>>> request 30.999821 seconds >>>>>>>> old, received at 2014-09-04 11:55:58.510424: >>>>>>>> osd_op(client.10731617.0:81868956 >>>>>>>> rbd_data.967e022eb141f2.0000000000000e72 [write 0~4194304] >>>>>>>> 3.c585cebe e13901) v4 currently waiting for subops from [37,13] >>>>>>>> 2014-09-04 11:56:30.510528 osd.22 [WRN] 21 slow requests, 6 >>>>>>>> included below; oldest blocked for > 45.745989 secs 2014-09-04 >>>>>>>> 11:56:30.510534 osd.22 [WRN] slow request 30.122555 seconds old, >>>>>>>> received at 2014-09-04 11:56:00.387925: >>>>>>>> osd_op(client.13425082.0:11962345 >>>>>>>> rbd_data.54f24c3d1b58ba.0000000000003753 [stat,write >>>>>>>> 1114112~8192] 3.c9e49140 e13901) v4 currently waiting for subops >>>>>>>> from [13,42] 2014-09-04 11:56:30.510537 osd.22 [WRN] slow >>>>>>>> request 30.122362 seconds old, received at 2014-09-04 >>>>>>>> 11:56:00.388118: osd_op(client.13425082.0:11962352 >>>>>>>> rbd_data.54f24c3d1b58ba.0000000000003753 [stat,write >>>>>>>> 1134592~4096] 3.c9e49140 e13901) v4 currently waiting for subops >>>>>>>> from [13,42] 2014-09-04 11:56:30.510541 osd.22 [WRN] slow >>>>>>>> request 30.122298 seconds old, received at 2014-09-04 >>>>>>>> 11:56:00.388182: osd_op(client.13425082.0:11962353 >>>>>>>> rbd_data.54f24c3d1b58ba.0000000000003753 [stat,write >>>>>>>> 4046848~8192] 3.c9e49140 e13901) v4 currently waiting for subops >>>>>>>> from [13,42] 2014-09-04 11:56:30.510544 osd.22 [WRN] slow >>>>>>>> request 30.121577 seconds old, received at 2014-09-04 >>>>>>>> 11:56:00.388903: osd_op(client.13425082.0:11962374 >>>>>>>> rbd_data.54f24c3d1b58ba.00000000000047f2 [stat,write >>>>>>>> 2527232~4096] 3.cd9a9015 e13901) v4 currently waiting for subops >>>>>>>> from [45,1] 2014-09-04 11:56:30.510548 osd.22 [WRN] slow request >>>>>>>> 30.121518 seconds old, received at 2014-09-04 11:56:00.388962: >>>>>>>> osd_op(client.13425082.0:11962375 >>>>>>>> rbd_data.54f24c3d1b58ba.00000000000047f2 [stat,write >>>>>>>> 3133440~4096] 3.cd9a9015 e13901) v4 currently waiting for subops >>>>>>>> from [45,1] 2014-09-04 11:56:31.510706 osd.22 [WRN] 26 slow >>>>>>>> requests, 6 included below; oldest blocked for > 46.746163 secs >>>>>>>> 2014-09-04 11:56:31.510711 osd.22 [WRN] slow request 31.035418 >>>>>>>> seconds old, received at 2014-09-04 11:56:00.475236: >>>>>>>> osd_op(client.9266625.0:135900595 >>>>>>>> rbd_data.42d6792eb141f2.000000000000bc00 [stat,write >>>>>>>> 2097152~4096] 3.a2894ebe e13901) v4 currently waiting for subops >>>>>>>> from [37,13] 2014-09-04 11:56:31.510715 osd.22 [WRN] slow >>>>>>>> request 31.035335 seconds old, received at 2014-09-04 >>>>>>>> 11:56:00.475319: osd_op(client.9266625.0:135900596 >>>>>>>> rbd_data.42d6792eb141f2.000000000000bc00 [stat,write >>>>>>>> 2162688~4096] 3.a2894ebe e13901) v4 currently waiting for subops >>>>>>>> from [37,13] 2014-09-04 11:56:31.510718 osd.22 [WRN] slow >>>>>>>> request 31.035270 seconds old, received at 2014-09-04 >>>>>>>> 11:56:00.475384: osd_op(client.9266625.0:135900597 >>>>>>>> rbd_data.42d6792eb141f2.000000000000bc00 [stat,write >>>>>>>> 2400256~16384] 3.a2894ebe e13901) v4 currently waiting for >>>>>>>> subops from [37,13] 2014-09-04 11:56:31.510721 osd.22 [WRN] slow >>>>>>>> request 31.035093 seconds old, received at 2014-09-04 >>>>>>>> 11:56:00.475561: osd_op(client.9266625.0:135900598 >>>>>>>> rbd_data.42d6792eb141f2.000000000000bc00 [stat,write >>>>>>>> 2420736~4096] 3.a2894ebe e13901) v4 currently waiting for subops >>>>>>>> from [37,13] 2014-09-04 11:56:31.510724 osd.22 [WRN] slow >>>>>>>> request 31.034990 seconds old, received at 2014-09-04 >>>>>>>> 11:56:00.475664: osd_op(client.9266625.0:135900599 >>>>>>>> rbd_data.42d6792eb141f2.000000000000bc00 [stat,write >>>>>>>> 2428928~12288] 3.a2894ebe e13901) v4 currently waiting for >>>>>>>> subops from [37,13] 2014-09-04 11:56:32.510864 osd.22 [WRN] 28 >>>>>>>> slow requests, 6 included below; oldest blocked for > 47.746324 >>>>>>>> secs 2014-09-04 11:56:32.510871 osd.22 [WRN] slow request >>>>>>>> 32.035026 seconds old, received at 2014-09-04 11:56:00.475789: >>>>>>>> osd_op(client.9266625.0:135900600 >>>>>>>> rbd_data.42d6792eb141f2.000000000000bc00 [stat,write >>>>>>>> 2453504~4096] 3.a2894ebe e13901) v4 currently waiting for subops >>>>>>>> from [37,13] 2014-09-04 11:56:32.510876 osd.22 [WRN] slow >>>>>>>> request 32.034982 seconds old, received at 2014-09-04 >>>>>>>> 11:56:00.475833: osd_op(client.9266625.0:135900601 >>>>>>>> rbd_data.42d6792eb141f2.000000000000bc00 [stat,write >>>>>>>> 2502656~16384] 3.a2894ebe e13901) v4 currently waiting for >>>>>>>> subops from [37,13] 2014-09-04 11:56:32.510881 osd.22 [WRN] slow >>>>>>>> request 32.034917 seconds old, received at 2014-09-04 >>>>>>>> 11:56:00.475898: osd_op(client.9266625.0:135900602 >>>>>>>> rbd_data.42d6792eb141f2.000000000000bc00 [stat,write >>>>>>>> 2600960~4096] 3.a2894ebe e13901) v4 currently waiting for subops >>>>>>>> from [37,13] 2014-09-04 11:56:32.510885 osd.22 [WRN] slow >>>>>>>> request 32.034807 seconds old, received at 2014-09-04 >>>>>>>> 11:56:00.476008: osd_op(client.9266625.0:135900603 >>>>>>>> rbd_data.42d6792eb141f2.000000000000bc00 [stat,write >>>>>>>> 2617344~4096] 3.a2894ebe e13901) v4 currently waiting for subops >>>>>>>> from [37,13] 2014-09-04 11:56:32.510890 osd.22 [WRN] slow >>>>>>>> request 32.034756 seconds old, received at 2014-09-04 >>>>>>>> 11:56:00.476059: osd_op(client.9266625.0:135900604 >>>>>>>> rbd_data.42d6792eb141f2.000000000000bc00 [stat,write >>>>>>>> 2637824~69632] 3.a2894ebe e13901) v4 currently waiting for >>>>>>>> subops from [37,13] 2014-09-04 11:56:32.723329 osd.17 [WRN] 4 >>>>>>>> slow requests, 4 included below; oldest blocked for > 30.605794 >>>>>>>> secs 2014-09-04 11:56:32.723335 osd.17 [WRN] slow request >>>>>>>> 30.605794 seconds old, received at 2014-09-04 11:56:02.117495: >>>>>>>> osd_op(client.12646411.0:31118135 >>>>>>>> rbd_data.81aaf52eb141f2.0000000000000e4b [stat,write 0~4194304] >>>>>>>> 3.716b373b e13901) v4 currently waiting for subops from [31,1] >>>>>>>> 2014-09-04 11:56:32.723338 osd.17 [WRN] slow request 30.362042 >>>>>>>> seconds old, received at 2014-09-04 11:56:02.361247: >>>>>>>> osd_op(client.12485793.0:121786489 >>>>>>>> rbd_data.4ae87f2eb141f2.000000000000e200 [stat,write >>>>>>>> 2834432~4096] 3.6b43de72 e13901) v4 currently waiting for subops >>>>>>>> from [5,31] 2014-09-04 11:56:32.723341 osd.17 [WRN] slow request >>>>>>>> 30.179681 seconds old, received at 2014-09-04 11:56:02.543608: >>>>>>>> osd_op(client.13425079.0:7014393 >>>>>>>> rbd_data.54f2522eb141f2.0000000000014d01 [stat,write >>>>>>>> 2502656~4096] 3.d8ceea89 e13901) v4 currently waiting for subops >>>>>>>> from [31,18] 2014-09-04 11:56:32.723344 osd.17 [WRN] slow >>>>>>>> request 30.000568 seconds old, received at 2014-09-04 >>>>>>>> 11:56:02.722721: osd_op(client.9281279.0:144230325 >>>>>>>> rbd_data.2d99fc2eb141f2.000000000000e80a [stat,write >>>>>>>> 1589248~4096] 3.16dc0e89 e13901) v4 currently waiting for subops >>>>>>>> from [31,18] 2014-09-04 11:56:33.511043 osd.22 [WRN] 33 slow >>>>>>>> requests, 6 included below; oldest blocked for > 48.746486 secs >>>>>>>> 2014-09-04 11:56:33.511049 osd.22 [WRN] slow request 33.034756 >>>>>>>> seconds old, received at 2014-09-04 11:56:00.476221: >>>>>>>> osd_op(client.9266625.0:135900605 >>>>>>>> rbd_data.42d6792eb141f2.000000000000bc00 [stat,write >>>>>>>> 2715648~8192] 3.a2894ebe e13901) v4 currently waiting for subops >>>>>>>> from [37,13] 2014-09-04 11:56:33.511053 osd.22 [WRN] slow >>>>>>>> request 33.034620 seconds old, received at 2014-09-04 >>>>>>>> 11:56:00.476357: osd_op(client.9266625.0:135900606 >>>>>>>> rbd_data.42d6792eb141f2.000000000000bc00 [stat,write >>>>>>>> 2756608~4096] 3.a2894ebe e13901) v4 currently waiting for subops >>>>>>>> from [37,13] 2014-09-04 11:56:33.511056 osd.22 [WRN] slow >>>>>>>> request 33.034446 seconds old, received at 2014-09-04 >>>>>>>> 11:56:00.476531: osd_op(client.9266625.0:135900607 >>>>>>>> rbd_data.42d6792eb141f2.000000000000bc00 [stat,write >>>>>>>> 2809856~4096] 3.a2894ebe e13901) v4 currently waiting for subops >>>>>>>> from [37,13] 2014-09-04 11:56:33.511060 osd.22 [WRN] slow >>>>>>>> request 33.034387 seconds old, received at 2014-09-04 >>>>>>>> 11:56:00.476590: osd_op(client.9266625.0:135900608 >>>>>>>> rbd_data.42d6792eb141f2.000000000000bc00 [stat,write >>>>>>>> 2883584~4096] 3.a2894ebe e13901) v4 currently waiting for subops >>>>>>>> from [37,13] 2014-09-04 11:56:33.511063 osd.22 [WRN] slow >>>>>>>> request 33.034341 seconds old, received at 2014-09-04 >>>>>>>> 11:56:00.476636: osd_op(client.9266625.0:135900609 >>>>>>>> rbd_data.42d6792eb141f2.000000000000bc00 [stat,write >>>>>>>> 2891776~8192] 3.a2894ebe e13901) v4 currently waiting for subops >>>>>>>> from [37,13] 2014-09-04 11:56:34.511206 osd.22 [WRN] 38 slow >>>>>>>> requests, 6 included below; oldest blocked for > 49.746664 secs >>>>>>>> 2014-09-04 11:56:34.511215 osd.22 [WRN] slow request 34.034426 >>>>>>>> seconds old, received at 2014-09-04 11:56:00.476729: >>>>>>>> osd_op(client.9266625.0:135900610 >>>>>>>> rbd_data.42d6792eb141f2.000000000000bc00 [stat,write >>>>>>>> 2908160~4096] 3.a2894ebe e13901) v4 currently waiting for subops >>>>>>>> from [37,13] 2014-09-04 11:56:34.511219 osd.22 [WRN] slow >>>>>>>> request 34.034298 seconds old, received at 2014-09-04 >>>>>>>> 11:56:00.476857: osd_op(client.9266625.0:135900611 >>>>>>>> rbd_data.42d6792eb141f2.000000000000bc00 [stat,write >>>>>>>> 2932736~28672] 3.a2894ebe e13901) v4 currently waiting for >>>>>>>> subops from [37,13] 2014-09-04 11:56:34.511222 osd.22 [WRN] slow >>>>>>>> request 34.034234 seconds old, received at 2014-09-04 >>>>>>>> 11:56:00.476921: osd_op(client.9266625.0:135900612 >>>>>>>> rbd_data.42d6792eb141f2.000000000000bc00 [stat,write >>>>>>>> 2969600~8192] 3.a2894ebe e13901) v4 currently waiting for subops >>>>>>>> from [37,13] 2014-09-04 11:56:34.511225 osd.22 [WRN] slow >>>>>>>> request 34.034147 seconds old, received at 2014-09-04 >>>>>>>> 11:56:00.477008: osd_op(client.9266625.0:135900613 >>>>>>>> rbd_data.42d6792eb141f2.000000000000bc00 [stat,write >>>>>>>> 2981888~4096] 3.a2894ebe e13901) v4 currently waiting for subops >>>>>>>> from [37,13] 2014-09-04 11:56:34.511228 osd.22 [WRN] slow >>>>>>>> request 34.034055 seconds old, received at 2014-09-04 >>>>>>>> 11:56:00.477100: osd_op(client.9266625.0:135900614 >>>>>>>> rbd_data.42d6792eb141f2.000000000000bc00 [stat,write >>>>>>>> 2998272~12288] 3.a2894ebe e13901) v4 currently waiting for >>>>>>>> subops from [37,13] 2014-09-04 11:56:35.153813 osd.37 [WRN] 1 >>>>>>>> slow requests, 1 included below; oldest blocked for > 30.307864 >>>>>>>> secs 2014-09-04 11:56:35.153820 osd.37 [WRN] slow request >>>>>>>> 30.307864 seconds old, received at 2014-09-04 11:56:04.845917: >>>>>>>> osd_op(client.10785413.0:129409910 >>>>>>>> rbd_data.96803f2eb141f2.00000000000042b0 [stat,write >>>>>>>> 1712128~4096] 3.6c5ac950 e13901) v4 currently waiting for subops >>>>>>>> from [22,41] 2014-09-04 11:56:35.511450 osd.22 [WRN] 42 slow >>>>>>>> requests, 6 included below; oldest blocked for > 50.746638 secs >>>>>>>> 2014-09-04 11:56:35.511455 osd.22 [WRN] slow request 35.034219 >>>>>>>> seconds old, received at 2014-09-04 11:56:00.477178: >>>>>>>> osd_op(client.9266625.0:135900615 >>>>>>>> rbd_data.42d6792eb141f2.000000000000bc00 [stat,write >>>>>>>> 3022848~8192] 3.a2894ebe e13901) v4 currently waiting for subops >>>>>>>> from [37,13] 2014-09-04 11:56:35.511459 osd.22 [WRN] slow >>>>>>>> request 35.034143 seconds old, received at 2014-09-04 >>>>>>>> 11:56:00.477254: osd_op(client.9266625.0:135900616 >>>>>>>> rbd_data.42d6792eb141f2.000000000000bc00 [stat,write >>>>>>>> 3035136~12288] 3.a2894ebe e13901) v4 currently waiting for >>>>>>>> subops from [37,13] 2014-09-04 11:56:35.511462 osd.22 [WRN] slow >>>>>>>> request 35.034082 seconds old, received at 2014-09-04 >>>>>>>> 11:56:00.477315: osd_op(client.9266625.0:135900617 >>>>>>>> rbd_data.42d6792eb141f2.000000000000bc00 [stat,write >>>>>>>> 3055616~4096] 3.a2894ebe e13901) v4 currently waiting for subops >>>>>>>> from [37,13] 2014-09-04 11:56:35.511465 osd.22 [WRN] slow >>>>>>>> request 35.033965 seconds old, received at 2014-09-04 >>>>>>>> 11:56:00.477432: osd_op(client.9266625.0:135900618 >>>>>>>> rbd_data.42d6792eb141f2.000000000000bc00 [stat,write >>>>>>>> 3063808~4096] 3.a2894ebe e13901) v4 currently waiting for subops >>>>>>>> from [37,13] 2014-09-04 11:56:35.511469 osd.22 [WRN] slow >>>>>>>> request 34.162275 seconds old, received at 2014-09-04 >>>>>>>> 11:56:01.349122: osd_op(client.12485646.0:110385143 >>>>>>>> rbd_data.4ccab12eb141f2.0000000000015b66 [stat,write >>>>>>>> 909312~4096] 3.9c416abe e13901) v4 currently waiting for subops >>>>>>>> from [37,13] 2014-09-04 11:56:36.511603 osd.22 [WRN] 34 slow >>>>>>>> requests, 6 included below; oldest blocked for > 51.746797 secs >>>>>>>> 2014-09-04 11:56:36.511609 osd.22 [WRN] slow request 34.401476 >>>>>>>> seconds old, received at 2014-09-04 11:56:02.110080: >>>>>>>> osd_op(client.12485643.0:117955489 >>>>>>>> rbd_data.4aad332eb141f2.0000000000002c00 [stat,write >>>>>>>> 2531328~4096] 3.b985ec92 e13901) v4 currently waiting for subops >>>>>>>> from [33,1] 2014-09-04 11:56:36.511612 osd.22 [WRN] slow request >>>>>>>> 34.149511 seconds old, received at 2014-09-04 11:56:02.362045: >>>>>>>> osd_op(client.12485793.0:121786505 >>>>>>>> rbd_data.4ae87f2eb141f2.0000000000018a0a [stat,write >>>>>>>> 3342336~4096] 3.26907492 e13901) v4 currently waiting for subops >>>>>>>> from [33,1] 2014-09-04 11:56:36.511615 osd.22 [WRN] slow request >>>>>>>> 34.329528 seconds old, received at 2014-09-04 11:56:02.182028: >>>>>>>> osd_op(client.12646411.0:31118266 >>>>>>>> rbd_data.81aaf52eb141f2.0000000000000e4f [stat,write 0~4194304] >>>>>>>> 3.168c978d e13901) v4 currently waiting for subops from [49,29] >>>>>>>> 2014-09-04 11:56:36.511618 osd.22 [WRN] slow request 31.871976 >>>>>>>> seconds old, received at 2014-09-04 11:56:04.639580: >>>>>>>> osd_op(client.12485793.0:121786506 >>>>>>>> rbd_data.4ae87f2eb141f2.0000000000018a0a [stat,write 0~4096] >>>>>>>> 3.26907492 e13901) v4 currently waiting for subops from [33,1] >>>>>>>> 2014-09-04 11:56:36.511621 osd.22 [WRN] slow request 31.008867 >>>>>>>> seconds old, received at 2014-09-04 11:56:05.502689: >>>>>>>> osd_op(client.9266625.0:135901516 >>>>>>>> rbd_data.42d6792eb141f2.0000000000017436 [stat,write >>>>>>>> 2682880~4096] 3.32067b97 e13901) v4 currently waiting for subops >>>>>>>> from [12,29] 2014-09-04 11:56:36.723882 osd.17 [WRN] 5 slow >>>>>>>> requests, 1 included below; oldest blocked for > 34.606349 secs >>>>>>>> 2014-09-04 11:56:36.723887 osd.17 [WRN] slow request 30.603057 >>>>>>>> seconds old, received at 2014-09-04 11:56:06.120787: >>>>>>>> osd_op(client.12483520.0:11473870 >>>>>>>> rbd_data.4b8e9b3d1b58ba.0000000000004600 [stat,write >>>>>>>> 2293760~4096] 3.49398689 e13901) v4 currently waiting for subops >>>>>>>> from [31,18] 2014-09-04 11:56:37.511787 osd.22 [WRN] 24 slow >>>>>>>> requests, 6 included below; oldest blocked for > 52.746984 secs >>>>>>>> 2014-09-04 11:56:37.511793 osd.22 [WRN] slow request 32.008986 >>>>>>>> seconds old, received at 2014-09-04 11:56:05.502757: >>>>>>>> osd_op(client.9266625.0:135901517 >>>>>>>> rbd_data.42d6792eb141f2.0000000000017436 [stat,write >>>>>>>> 2695168~4096] 3.32067b97 e13901) v4 currently waiting for subops >>>>>>>> from [12,29] 2014-09-04 11:56:37.511798 osd.22 [WRN] slow >>>>>>>> request 31.749483 seconds old, received at 2014-09-04 >>>>>>>> 11:56:05.762260: osd_op(client.9281282.0:55728036 >>>>>>>> rbd_data.2d99f62eb141f2.0000000000003753 [stat,write >>>>>>>> 1630208~8192] 3.6625df97 e13901) v4 currently waiting for subops >>>>>>>> from [12,29] 2014-09-04 11:56:37.511801 osd.22 [WRN] slow >>>>>>>> request 31.745558 seconds old, received at 2014-09-04 >>>>>>>> 11:56:05.766185: osd_op(client.9281282.0:55728038 >>>>>>>> rbd_data.2d99f62eb141f2.0000000000003753 [stat,write >>>>>>>> 1941504~4096] 3.6625df97 e13901) v4 currently waiting for subops >>>>>>>> from [12,29] 2014-09-04 11:56:37.511803 osd.22 [WRN] slow >>>>>>>> request 31.390945 seconds old, received at 2014-09-04 >>>>>>>> 11:56:06.120798: osd_op(client.12483520.0:11473871 >>>>>>>> rbd_data.4b8e9b3d1b58ba.0000000000005c00 [stat,write >>>>>>>> 2424832~8192] 3.47dbbb97 e13901) v4 currently waiting for subops >>>>>>>> from [12,29] 2014-09-04 11:56:37.511806 osd.22 [WRN] slow >>>>>>>> request 31.390357 seconds old, received at 2014-09-04 >>>>>>>> 11:56:06.121386: osd_op(client.12483520.0:11473884 >>>>>>>> rbd_data.4b8e9b3d1b58ba.0000000000005c00 [stat,write >>>>>>>> 2457600~16384] 3.47dbbb97 e13901) v4 currently waiting for >>>>>>>> subops from [12,29] >>>>>>>> >>>>>>>> Kind Regards, >>>>>>>> >>>>>>>> David >>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> ceph-users mailing list >>>>>>>> ceph-users at lists.ceph.com >>>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Christian Balzer Network/Systems Engineer >>>>>>> chibi at gol.com Global OnLine Japan/Fusion Communications >>>>>>> http://www.gol.com/ >>>>>>> _______________________________________________ >>>>>>> ceph-users mailing list >>>>>>> ceph-users at lists.ceph.com >>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >>>>>>> >>>>>>> _______________________________________________ >>>>>>> ceph-users mailing list >>>>>>> ceph-users at lists.ceph.com >>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >>>>>> >>>>> >>>>> >>>>> -- >>>>> Christian Balzer Network/Systems Engineer >>>>> chibi at gol.com Global OnLine Japan/Fusion Communications >>>>> http://www.gol.com/ >>>>> _______________________________________________ >>>>> ceph-users mailing list >>>>> ceph-users at lists.ceph.com >>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >>>> >>>> _______________________________________________ >>>> ceph-users mailing list >>>> ceph-users at lists.ceph.com >>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >>>> >>>> >>>> >>>> -- >>>> Luis Periquito >>>> Unix Engineer >>>> >>>> Ocado.com >>>> >>>> Head Office, Titan Court, 3 Bishop Square, Hatfield Business Park, >>>> Hatfield, Herts AL10 9NE >>>> >>>> Notice: This email is confidential and may contain copyright >>>> material of members of the Ocado Group. Opinions and views expressed >>>> in this message may not necessarily reflect the opinions and views of >>>> the members of the Ocado Group. >>>> >>>> If you are not the intended recipient, please notify us immediately >>>> and delete all copies of this message. Please note that it is your >>>> responsibility to scan this message for viruses. >>>> >>>> References to the ?Ocado Group? are to Ocado Group plc (registered in >>>> England and Wales with number 7098618) and its subsidiary >>>> undertakings (as that expression is defined in the Companies Act >>>> 2006) from time to time. The registered office of Ocado Group plc is >>>> Titan Court, 3 Bishops Square, Hatfield Business Park, Hatfield, >>>> Herts. AL10 9NE. >>>> >>>> _______________________________________________ >>>> ceph-users mailing list >>>> ceph-users at lists.ceph.com >>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >>> >> > > > -- > Christian Balzer Network/Systems Engineer > chibi at gol.com Global OnLine Japan/Fusion Communications > http://www.gol.com/