We're seeing a lot of this as well (as I mentioned to Sage at SCALE...). Is
there a rule of thumb at all for how big it is safe to let an RGW bucket get?
Also, is this theoretically resolved by the new bucket-sharding feature in the
latest dev release?

-Ben

On Mon, Mar 2, 2015 at 11:08 AM, Erdem Agaoglu <erdem.agaoglu@xxxxxxxxx> wrote:
> Hi Gregory,
>
> We are not using listomapkeys that way, or in any way to be precise. I used
> it here just to reproduce the behavior/issue.
>
> What I am really interested in is whether deep-scrubbing actually mitigates
> the problem, and/or whether there is something that can be further improved.
>
> Or I guess we should go upgrade now and hope for the best :)
>
> On Mon, Mar 2, 2015 at 8:10 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>
>> On Mon, Mar 2, 2015 at 7:56 AM, Erdem Agaoglu <erdem.agaoglu@xxxxxxxxx> wrote:
>> > Hi all, especially devs,
>> >
>> > We have recently pinpointed one of the causes of slow requests in our
>> > cluster. It seems deep-scrubs on PGs that contain the index object for a
>> > large radosgw bucket lock up the OSDs. Increasing op threads and/or disk
>> > threads helps a little, but we would need to increase them beyond reason
>> > to get rid of the problem completely. A somewhat similar (and more
>> > severe) version of the issue occurs when we call listomapkeys on the
>> > index object, and since the logs for deep-scrubbing were much harder to
>> > read, this inspection was based on listomapkeys.
>> >
>> > In this example osd.121 is the primary of pg 10.c91, which contains the
>> > object .dir.5926.3 in the .rgw.buckets pool. The OSD has 2 op threads.
>> > The bucket contains ~500k objects. A standard listomapkeys call takes
>> > about 3 seconds:
>> >
>> > time rados -p .rgw.buckets listomapkeys .dir.5926.3 > /dev/null
>> > real 0m2.983s
>> > user 0m0.760s
>> > sys 0m0.148s
>> >
>> > In order to lock the OSD we request 2 of them simultaneously with
>> > something like:
>> >
>> > rados -p .rgw.buckets listomapkeys .dir.5926.3 > /dev/null &
>> > sleep 1
>> > rados -p .rgw.buckets listomapkeys .dir.5926.3 > /dev/null &
>> >
>> > 'debug_osd=30' logs show the flow like this:
>> >
>> > At t0 some thread enqueue_op's my omap-get-keys request.
>> > Op-Thread A locks pg 10.c91, dequeue_op's it and starts reading ~500k
>> > keys.
>> > Op-Thread B responds to several other requests during that 1-second
>> > sleep. They're generally extremely fast subops on other PGs.
>> > At t1 (about a second later) my second omap-get-keys request gets
>> > enqueue_op'ed, but it does not start, probably because of the lock held
>> > by Thread A.
>> > After that point other threads enqueue_op other requests on other PGs
>> > too, but none of them start processing, at which point I consider the
>> > OSD locked.
>> > At t2 (about another second later) my first omap-get-keys request is
>> > finished.
>> > Op-Thread B locks pg 10.c91, dequeue_op's my second request and starts
>> > reading ~500k keys again.
>> > Op-Thread A continues to process the requests enqueued between t1 and t2.
>> >
>> > It seems Op-Thread B is waiting on the lock held by Op-Thread A, while it
>> > could process other requests for other PGs just fine.
>> >
>> > My guess is that a somewhat larger version of this scenario happens
>> > during deep-scrubbing, e.g. on the PG containing the index for the bucket
>> > of >20M objects. A disk/op thread starts reading through the omap, which
>> > will take, say, 60 seconds. During the first seconds, other requests for
>> > other PGs pass just fine.
>> > But in 60 seconds there are bound to be other requests for the same PG,
>> > especially since it holds the index object. Each of these requests ties
>> > up another disk/op thread, to the point where there are no free threads
>> > left to process requests for any PG, causing slow requests.
>> >
>> > So first of all, thanks if you made it this far, and sorry for the
>> > involved mail; I'm exploring the problem as I go.
>> > Now, is the deep-scrubbing situation I tried to theorize about even
>> > possible? If not, can you point us to where to look further?
>> > We are currently running 0.72.2 and know about the newer ioprio settings
>> > in Firefly and such. We are planning to upgrade in a few weeks, but I
>> > don't think those options will help us in any way. Am I correct?
>> > Are there any other improvements that we are not aware of?
>>
>> This is all basically correct; it's one of the reasons you don't want
>> to let individual buckets get too large.
>>
>> That said, I'm a little confused about why you're running listomapkeys
>> that way. RGW throttles itself by getting only a certain number of
>> entries at a time (1000?) and any system you're building should do the
>> same. That would reduce the frequency of any issues, and I *think* that
>> scrubbing has some mitigating factors to help (although maybe not; it's
>> been a while since I looked at any of that stuff).
>>
>> Although I just realized that my vague memory of deep scrubbing working
>> better might be based on improvements that only got in for Firefly...
>> not sure.
>> -Greg
>
>
> --
> erdem agaoglu

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
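
For reference, below is a minimal sketch of the chunked listing Greg
describes, using the python-rados bindings. It is not code from the thread:
it assumes a python-rados build recent enough to expose the read_op omap
calls (newer than the 0.72.x discussed above), and the pool name, index
object name and 1000-key chunk size are simply the values mentioned in the
messages.

#!/usr/bin/env python
#
# Sketch: list the omap keys of a large bucket index object in bounded
# chunks instead of one huge listomapkeys call. Assumes a python-rados
# build that exposes the read_op omap API; pool/object names and the
# chunk size are taken from the thread above, not from real code.

import rados

POOL = '.rgw.buckets'
INDEX_OBJ = '.dir.5926.3'   # bucket index object from the example above
CHUNK = 1000                # roughly what RGW itself asks for per call

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
try:
    ioctx = cluster.open_ioctx(POOL)
    try:
        start_after = ''
        total = 0
        while True:
            read_op = ioctx.create_read_op()
            try:
                # Ask for at most CHUNK keys, resuming after the last key
                # returned by the previous round.
                key_iter, _ = ioctx.get_omap_keys(read_op, start_after, CHUNK)
                ioctx.operate_read_op(read_op, INDEX_OBJ)
                keys = [k for k, _ in key_iter]
            finally:
                ioctx.release_read_op(read_op)
            if not keys:
                break
            total += len(keys)
            last = keys[-1]
            # keys come back as bytes on Python 3; start_after must be str
            start_after = last.decode('utf-8') if isinstance(last, bytes) else last
            if len(keys) < CHUNK:
                break
        print('listed %d omap keys in chunks of %d' % (total, CHUNK))
    finally:
        ioctx.close()
finally:
    cluster.shutdown()

The point of bounding each request is that the PG lock on the index's PG is
only held for short intervals per chunk, so other client I/O and subops on
that PG can interleave between rounds instead of queuing behind a single
multi-second omap read.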