.rgw.buckets.index is the pool with rgw's index objects, right? The actual on-disk directory for one of those pgs would contain only empty files -- the actual index data is stored in the osd's leveldb instance. I suspect your index objects are very large (because the buckets contain many objects) and are taking a long time to deep-scrub.

iirc, there is a way to make rgw split up those index objects into smaller ones (bucket index sharding); a rough sketch of what I'd look at is below.
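Something along these lines -- untested, the bucket name is just a placeholder, and iirc the index object is named .dir.<bucket_id>, so adjust to what you actually see in the pool:

    # how many objects the suspect bucket holds (num_objects in the output)
    radosgw-admin bucket stats --bucket=somebucket

    # get the bucket_id/marker, then count the omap keys on its index
    # object -- that's the data a deep-scrub has to walk, and it doesn't
    # show up in the 12KB of files you see on disk
    radosgw-admin metadata get bucket:somebucket
    rados -p .rgw.buckets.index ls | grep <bucket_id>
    rados -p .rgw.buckets.index listomapkeys .dir.<bucket_id> | wc -l

    # index sharding for newly created buckets, in ceph.conf on the rgw
    # nodes (e.g. under the [client.radosgw.*] section):
    rgw_override_bucket_index_max_shards = 16

Note that (again iirc) the shard count only applies to buckets created after the change -- on hammer/jewel an existing bucket keeps its single index object, so the really big ones may need to be recreated to benefit.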
-Sam

On Tue, Jun 21, 2016 at 11:58 AM, Trygve Vea <trygve.vea@xxxxxxxxxxxxxxxxxx> wrote:
> Hi,
>
> I believe I've stumbled on a bug in Ceph, and I'm currently trying to figure out if this is a new bug, some behaviour caused by our cluster being in the midst of a hammer (0.94.6) -> jewel (10.2.2) upgrade, or other factors.
>
> The state of the cluster at the time of the incident:
>
> - All monitor nodes are running 10.2.2.
> - One OSD-server (4 osds) is up with 10.2.2 and with all pgs active+clean.
> - One OSD-server (4 osds) is up with 10.2.2 and undergoing backfills (however, nobackfill was set, as we only let backfills run during the night).
>
> We have 4 OSD-servers with 4 osds each running 0.94.6.
> We have 3 OSD-servers with 2 osds each running 0.94.6.
>
> We experienced something that heavily affected our RGW users. Some requests interfacing with 0.94.6 nodes were slow.
>
> During a 10-minute window, our RGW nodes ran out of available workers and ceased to respond.
>
> Some nodes logged lines like these (only 0.94.6 nodes):
>
> 2016-06-21 09:51:08.053886 7f54610d8700 0 log_channel(cluster) log [WRN] : 2 slow requests, 1 included below; oldest blocked for > 74.368036 secs
> 2016-06-21 09:51:08.053951 7f54610d8700 0 log_channel(cluster) log [WRN] : slow request 30.056333 seconds old, received at 2016-06-21 09:50:37.997327: osd_op(client.9433496.0:1089298249 somergwuser.buckets [call user.set_buckets_info] 12.da8df901 ondisk+write+known_if_redirected e9906) currently waiting for rw locks
>
> Some nodes logged lines like these (there was some, but not 100%, overlap between the osds that logged these and the aforementioned lines -- again, only 0.94.6 nodes):
>
> 2016-06-21 09:51:48.677474 7f8cb6628700 0 log_channel(cluster) log [WRN] : 2 slow requests, 1 included below; oldest blocked for > 42.033650 secs
> 2016-06-21 09:51:48.677565 7f8cb6628700 0 log_channel(cluster) log [WRN] : slow request 30.371173 seconds old, received at 2016-06-21 09:51:18.305770: osd_op(client.9525441.0:764274789 gc.1164 [call lock.lock] 7.7b4f1779 ondisk+write+known_if_redirected e9906) currently waiting for subops from 40,50
>
> All of the osds that logged these lines were waiting for subops from osd.50.
>
> Investigating what was going on on this osd during that window:
>
> 2016-06-21 09:48:22.064630 7f1cbb41d700 0 log_channel(cluster) log [INF] : 5.b5 deep-scrub starts
> 2016-06-21 09:59:56.640012 7f1c90163700 0 -- 10.21.9.22:6800/2003521 >> 10.20.9.21:6805/7755 pipe(0x1e47a000 sd=298 :39448 s=2 pgs=23 cs=1 l=0 c=0x1033ba20).fault with nothing to send, going to standby
> 2016-06-21 09:59:56.997763 7f1c700f8700 0 -- 10.21.9.22:6808/3521 >> 10.21.9.12:0/1028533 pipe(0x1f30f000 sd=87 :6808 s=0 pgs=0 cs=0 l=1 c=0x743c840).accept replacing existing (lossy) channel (new one lossy=1)
> 2016-06-21 10:00:39.938700 7f1cd9828700 0 log_channel(cluster) log [WRN] : 33 slow requests, 33 included below; oldest blocked for > 727.862759 secs
> 2016-06-21 10:00:39.938708 7f1cd9828700 0 log_channel(cluster) log [WRN] : slow request 670.918857 seconds old, received at 2016-06-21 09:49:29.019653: osd_op(client.9403437.0:1209613500 TZ1A91MYDE1LO63AQCM3 [getxattrs,stat] 9.442585e6 ack+read+known_if_redirected e9906) currently no flag points reached
> 2016-06-21 10:00:39.938800 7f1cd9828700 0 log_channel(cluster) log [WRN] : slow request 689.815851 seconds old, received at 2016-06-21 09:49:10.122660: osd_op(client.9403437.0:1209611533 TZ1A91MYDE1LO63AQCM3 [getxattrs,stat] 9.442585e6 ack+read+known_if_redirected e9906) currently no flag points reached
> 2016-06-21 10:00:39.938807 7f1cd9828700 0 log_channel(cluster) log [WRN] : slow request 670.895353 seconds old, received at 2016-06-21 09:49:29.043158: osd_op(client.9403437.0:1209613505 prod.arkham [call version.read,getxattrs,stat] 2.4da23de6 ack+read+known_if_redirected e9906) currently no flag points reached
> 2016-06-21 10:00:39.938810 7f1cd9828700 0 log_channel(cluster) log [WRN] : slow request 688.612303 seconds old, received at 2016-06-21 09:49:11.326207: osd_op(client.20712623.0:137251515 TZ1A91MYDE1LO63AQCM3 [getxattrs,stat] 9.442585e6 ack+read+known_if_redirected e9906) currently no flag points reached
> 2016-06-21 10:00:39.938813 7f1cd9828700 0 log_channel(cluster) log [WRN] : slow request 658.605163 seconds old, received at 2016-06-21 09:49:41.333348: osd_op(client.20712623.0:137254412 TZ1A91MYDE1LO63AQCM3 [getxattrs,stat] 9.442585e6 ack+read+known_if_redirected e9906) currently no flag points reached
> 2016-06-21 10:00:39.960300 7f1cbb41d700 0 log_channel(cluster) log [INF] : 5.b5 deep-scrub ok
>
> Looking at the contents of 5.b5 (which is in our .rgw.buckets.index pool, if relevant), it is almost empty (12KB of files on the disk), so I find it unlikely that a scrub would take that long. Which is why I suspect we've run into a bug.
>
> With the information I've provided here, can anyone shed some light on what this may be? And if it's a bug that is not fixed in HEAD, what information would be useful to include in a bug report?
>
>
> Regards
> --
> Trygve Vea
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
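As for what would be useful to capture for a bug report: if this happens again, I'd grab the op dumps from the blocking osd's admin socket while the requests are piling up, and you can stop scrubs cluster-wide while you look around. Nothing exotic here, just the stock commands (run the daemon ones on the host that carries osd.50), so treat this as a general sketch rather than anything version-specific:

    # what osd.50 is actually stuck on, captured while it's happening
    ceph daemon osd.50 dump_ops_in_flight
    ceph daemon osd.50 dump_historic_ops

    # stop new (deep-)scrubs cluster-wide while you investigate
    ceph osd set noscrub
    ceph osd set nodeep-scrub

    # and remember to re-enable them afterwards
    ceph osd unset noscrub
    ceph osd unset nodeep-scrub

That output, together with the osd log covering the deep-scrub window, should make it much easier to tell whether the scrub of the big index object is really what's blocking the other requests.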