Hi,

I started the radosgw-admin bucket check --fix on Friday evening and it's still running. 'ceph -s' shows the cluster health as OK; is there anything else I could check? Is there a way of finding out whether it's actually doing something?

We only have this issue on the one bucket with versioning enabled, and I can't shake the feeling that it has something to do with that. The "underscore bug" is also still present on that bucket (http://tracker.ceph.com/issues/12819). Not sure if that's related in any way.

Are there any alternatives, for example copying all the objects into a new bucket without versioning? The simple way would be to list the objects and copy them to a new bucket, but bucket listing is not working, so...

-Sam

On 31-08-15 10:47, Gregory Farnum wrote:
> This generally shouldn't be a problem at your bucket sizes. Have you
> checked that the cluster is actually in a healthy state? The sleeping
> locks are normal but should be getting woken up; if they aren't it
> means the object access isn't working for some reason. A down PG or
> something would be the simplest explanation.
> -Greg
>
> On Fri, Aug 28, 2015 at 6:52 PM, Sam Wouters <sam@xxxxxxxxx> wrote:
>> Ok, maybe I'm too impatient. It would be great if there were some verbose
>> or progress logging of the radosgw-admin tool.
>> I will start a check and let it run over the weekend.
>>
>> tnx,
>> Sam
>>
>> On 28-08-15 18:16, Sam Wouters wrote:
>>> Hi,
>>>
>>> this bucket only has 13389 objects, so the index size shouldn't be a
>>> problem. Also, on the same cluster we have another bucket with 1200543
>>> objects (but no versioning configured), which has no issues.
>>>
>>> when we run a radosgw-admin bucket check (--fix), nothing seems to be
>>> happening.
>>> Putting an strace on the process shows a lot of lines like these:
>>> [pid 99372] futex(0x2d730d4, FUTEX_WAIT_PRIVATE, 156619, NULL <unfinished ...>
>>> [pid 99385] futex(0x2da9410, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
>>> [pid 99371] futex(0x2da9410, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
>>> [pid 99385] <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
>>> [pid 99371] <... futex resumed> ) = 0
>>>
>>> but no errors in the ceph logs or health warnings.
>>>
>>> r,
>>> Sam
>>>
>>> On 28-08-15 17:49, Ben Hines wrote:
>>>> How many objects in the bucket?
>>>>
>>>> RGW has problems with index size once number of objects gets into the
>>>> 900000+ level. The buckets need to be recreated with 'sharded bucket
>>>> indexes' on:
>>>>
>>>> rgw override bucket index max shards = 23
>>>>
>>>> You could also try repairing the index with:
>>>>
>>>> radosgw-admin bucket check --fix --bucket=<bucketname>
>>>>
>>>> -Ben
>>>>
>>>> On Fri, Aug 28, 2015 at 8:38 AM, Sam Wouters <sam@xxxxxxxxx> wrote:
>>>>> Hi,
>>>>>
>>>>> we have a rgw bucket (with versioning) where PUT and GET operations for
>>>>> specific objects succeed, but retrieving an object list fails.
>>>>> Using python-boto just gives us a 500 internal error after a timeout;
>>>>> radosgw-admin just hangs.
>>>>> Also a radosgw-admin bucket check just seems to hang...
>>>>>
>>>>> ceph version is 0.94.3 but this also was happening with 0.94.2, we
>>>>> quietly hoped upgrading would fix it but it didn't...
>>>>>
>>>>> r,
>>>>> Sam
>>>>> _______________________________________________
>>>>> ceph-users mailing list
>>>>> ceph-users@xxxxxxxxxxxxxx
>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
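
On the alternative raised at the top of this thread (copying the objects into a new bucket without versioning): since bucket listing is the part that fails, a copy would have to be driven by a key list obtained some other way, for example from the application's own records. Below is a minimal sketch of that idea. It assumes boto3 rather than the python-boto mentioned in the thread, and it assumes that per-object server-side copy works on this bucket the way individual GET and PUT do, which the thread does not confirm. The function name, the key file, the endpoint and the bucket names are all hypothetical.

```python
from typing import Iterable, List, Tuple


def copy_known_keys(client, src_bucket: str, dst_bucket: str,
                    keys: Iterable[str]) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Copy each key from src_bucket to dst_bucket, one request per key.

    `client` is any object with an S3-style copy_object(Bucket=, Key=,
    CopySource=) method, e.g. a boto3 S3 client pointed at the RGW endpoint.
    No bucket listing is performed; the caller supplies the keys.
    Returns (copied_keys, failures), where failures holds (key, error text).
    """
    copied: List[str] = []
    failed: List[Tuple[str, str]] = []
    for key in keys:
        try:
            # No VersionId is given, so the current version of the object is copied.
            client.copy_object(
                Bucket=dst_bucket,
                Key=key,
                CopySource={"Bucket": src_bucket, "Key": key},
            )
            copied.append(key)
        except Exception as exc:
            # Keep going so one bad object does not stop the whole run.
            failed.append((key, str(exc)))
    return copied, failed


# Hypothetical usage (endpoint, file name and bucket names are placeholders):
#   import boto3
#   s3 = boto3.client("s3", endpoint_url="http://rgw.example.com")
#   with open("keys.txt") as fh:
#       keys = [line.rstrip("\n") for line in fh if line.strip()]
#   copied, failed = copy_known_keys(s3, "old-bucket", "new-bucket", keys)
#   print(len(copied), "copied;", len(failed), "failed")
```

Collecting failures instead of aborting makes it easier to see whether specific keys (such as those hit by the underscore bug) behave differently from the rest.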