On 6/4/19 8:00 PM, J. Eric Ivancich wrote:
> On 6/4/19 7:37 AM, Wido den Hollander wrote:
>> I've set up a temporary machine next to the 13.2.5 cluster with the
>> 13.2.6 packages from Shaman.
>>
>> On that machine I'm running:
>>
>> $ radosgw-admin gc process
>>
>> That seems to work as intended! So the PR seems to have fixed it.
>>
>> Should be fixed permanently when 13.2.6 is officially released.
>>
>> Wido
>
> Thank you, Wido, for sharing the results of your experiment. I'm happy
> to learn that it was successful. And v13.2.6 was just released about 2
> hours ago.
>

I thought it was resolved, but it isn't.

I counted all the OMAP values for the GC objects and I got back:

gc.0: 0
gc.11: 0
gc.14: 0
gc.15: 0
gc.16: 0
gc.18: 0
gc.19: 0
gc.1: 0
gc.20: 0
gc.21: 0
gc.22: 0
gc.23: 0
gc.24: 0
gc.25: 0
gc.27: 0
gc.29: 0
gc.2: 0
gc.30: 0
gc.3: 0
gc.4: 0
gc.5: 0
gc.6: 0
gc.7: 0
gc.8: 0
gc.9: 0
gc.13: 110996
gc.10: 111104
gc.26: 111142
gc.28: 111292
gc.17: 111314
gc.12: 111534
gc.31: 111956

So as you can see, seven GC objects still hold more than 110,000 OMAP
values each.

I ran:

$ radosgw-admin gc process --debug-rados=10

That finishes within 10 seconds. Then I tried:

$ radosgw-admin gc process --debug-rados=10 --include-all

That also finishes within 10 seconds.

What I noticed in the logs was this:

2019-06-11 09:06:58.711 7f8ffb876240 10 librados: call oid=gc.17 nspace=
2019-06-11 09:06:58.717 7f8ffb876240 10 librados: Objecter returned from call r=-16

The return value is '-16' for gc.17, whereas for gc.18 or any other
object with 0 OMAP values it is:

2019-06-11 09:06:58.717 7f8ffb876240 10 librados: call oid=gc.18 nspace=
2019-06-11 09:06:58.720 7f8ffb876240 10 librados: Objecter returned from call r=0

So I set --debug-rgw=10, which showed:

RGWGC::process failed to acquire lock on gc.17

I haven't tried stopping all the RGWs yet, as that will impact the
services, but might that be the root cause here?
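For anyone reading along: r=-16 is a negated errno, and errno 16 is EBUSY on Linux, which fits the "failed to acquire lock" message. Below is a minimal Python sketch for pulling the per-object return codes out of a --debug-rados=10 log. The helper names (call_results, describe) are made up for illustration, and the pairing logic assumes each "call oid=" line is followed by its own "returned from call" line, as in the single-threaded excerpt above; it would need a thread-id check for interleaved logs.

```python
import errno
import os
import re

# Patterns match the two librados log lines quoted in this mail.
CALL_RE = re.compile(r"librados: call oid=(\S+)")
RET_RE = re.compile(r"librados: Objecter returned from call r=(-?\d+)")


def call_results(log_lines):
    """Pair each 'call oid=...' line with the next 'returned from call' line.

    Assumption: calls and returns are not interleaved across threads.
    """
    results = []
    pending_oid = None
    for line in log_lines:
        call = CALL_RE.search(line)
        if call:
            pending_oid = call.group(1)
            continue
        ret = RET_RE.search(line)
        if ret and pending_oid is not None:
            results.append((pending_oid, int(ret.group(1))))
            pending_oid = None
    return results


def describe(ret):
    """Turn a librados-style return value (0 or -errno) into readable text."""
    if ret >= 0:
        return "ok"
    code = -ret
    name = errno.errorcode.get(code, "unknown")
    return f"{name} ({os.strerror(code)})"


sample = [
    "2019-06-11 09:06:58.711 7f8ffb876240 10 librados: call oid=gc.17 nspace=",
    "2019-06-11 09:06:58.717 7f8ffb876240 10 librados: Objecter returned from call r=-16",
    "2019-06-11 09:06:58.717 7f8ffb876240 10 librados: call oid=gc.18 nspace=",
    "2019-06-11 09:06:58.720 7f8ffb876240 10 librados: Objecter returned from call r=0",
]

for oid, ret in call_results(sample):
    print(f"{oid}: r={ret} -> {describe(ret)}")
```

Running this over the full log should list every gc.N object whose call came back non-zero, which can then be compared against the objects that still hold OMAP values.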
Wido

> Eric