RGW lifecycle bucket stuck processing?

Ben Hines <bhines@xxxxxxxxx> · Thu, 13 Apr 2017 11:10:06 -0700

I initiated a manual lifecycle cleanup with:
radosgw-admin lc process

It worked for over a day on my bucket 'bucket1' (about 2 million objects) and eventually seemed to get stuck with about 1.7 million objects left, logging uninformative errors like these (note the timestamps):

2017-04-12 18:50:15.706952 7f90aa5dcc80  0 ERROR: rgw_remove_object
2017-04-12 18:50:16.841254 7f90aa5dcc80  0 ERROR: rgw_remove_object
2017-04-12 18:50:17.153323 7f90aa5dcc80  0 ERROR: rgw_remove_object
2017-04-12 18:50:20.752924 7f90aa5dcc80  0 ERROR: rgw_remove_object
2017-04-12 18:50:25.400460 7f90aa5dcc80  0 ERROR: rgw_remove_object
2017-04-13 03:19:30.027773 7f9099069700  0 -- 10.29.16.57:0/3392796805 >> 10.29.16.53:6801/20291 conn(0x7f9084002990 :-1 s=STATE_OPEN pgs=167140106 cs=1 l=0).fault initiating reconnect
2017-04-13 03:36:30.721085 7f9099069700  0 -- 10.29.16.57:0/3392796805 >> 10.29.16.53:6801/20291 conn(0x7f90841d6ef0 :-1 s=STATE_OPEN pgs=167791627 cs=1 l=0).fault initiating reconnect
2017-04-13 03:46:46.143055 7f90aa5dcc80  0 ERROR: rgw_remove_object
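
To make the stalls easier to see, here is a minimal Python sketch that reports gaps between consecutive log lines. It assumes only the line format shown above, where each line starts with a "YYYY-MM-DD HH:MM:SS.ffffff" timestamp. The function name find_gaps and the one-hour threshold are illustrative choices, not anything provided by radosgw-admin.

```python
from datetime import datetime, timedelta


def find_gaps(log_lines, threshold=timedelta(hours=1)):
    """Return (previous_ts, next_ts, gap) for consecutive timestamped lines
    that are further apart than `threshold`.

    Assumes each relevant line begins with a date and a time field, e.g.
    "2017-04-12 18:50:15.706952 ..."; other lines are skipped.
    """
    gaps = []
    prev = None
    for line in log_lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # blank or too short to carry a timestamp
        try:
            ts = datetime.strptime(parts[0] + " " + parts[1],
                                   "%Y-%m-%d %H:%M:%S.%f")
        except ValueError:
            continue  # not a timestamped line
        if prev is not None and ts - prev > threshold:
            gaps.append((prev, ts, ts - prev))
        prev = ts
    return gaps
```

Run over the excerpt above with the default threshold, it should flag the roughly 8.5-hour stretch between 18:50:25 and 03:19:30 in which no rgw_remove_object errors were logged at all.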

This morning I aborted it with Ctrl-C. Now 'lc list' still shows the bucket as PROCESSING, and 'lc process' returns quickly, as if the bucket were still locked:

radosgw-admin lc list

...
    {
        "bucket": ":bucket1:default.42048218.4",
        "status": "PROCESSING"
    },

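With many buckets under lifecycle management, a small filter over the 'lc list' output helps spot the ones left in PROCESSING. This is a minimal sketch that assumes 'radosgw-admin lc list' writes a JSON array of objects with "bucket" and "status" keys to stdout, as the (elided) excerpt above suggests. The helper names processing_buckets and fetch_lc_list are hypothetical.

```python
import json
import subprocess


def processing_buckets(lc_list_json):
    """Return the "bucket" values of entries whose status is PROCESSING.

    Assumes the input is a JSON array of {"bucket": ..., "status": ...}
    objects, matching the shape of the 'lc list' excerpt.
    """
    entries = json.loads(lc_list_json)
    return [e["bucket"] for e in entries if e.get("status") == "PROCESSING"]


def fetch_lc_list():
    """Run 'radosgw-admin lc list' and return its stdout (Python 3.7+).

    Assumes the JSON goes to stdout; any log lines are left on stderr.
    """
    result = subprocess.run(["radosgw-admin", "lc", "list"],
                            capture_output=True, text=True, check=True)
    return result.stdout
```

On a node with admin credentials, processing_buckets(fetch_lc_list()) would then list entries such as ":bucket1:default.42048218.4".
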
-bash-4.2$ time radosgw-admin lc process
2017-04-13 11:07:48.482671 7f4fbeb87c80  0 System already converted

real    0m17.785s

Is it possible that the Ctrl-C left a stale lock on the bucket?

-Ben
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com