Hello,

I am experiencing an issue where it seems all lifecycle entries are showing either PROCESSING or UNINITIAL.

# radosgw-admin lc list
[
    {
        "bucket": ":tesra:5e9bc383-f7bd-4fd1-b607-1e563bfe0011.833499554.20",
        "shard": "lc.0",
        "started": "Thu, 17 Oct 2024 00:00:01 GMT",
        "status": "PROCESSING"
    },
    {
        "bucket": ":primevideos:5e9bc383-f7bd-4fd1-b607-1e563bfe0011.311333334.886",
        "shard": "lc.3",
        "started": "Wed, 16 Oct 2024 00:00:01 GMT",
        "status": "PROCESSING"
    },
    {
        "bucket": ":editorimages:5e9bc383-f7bd-4fd1-b607-1e563bfe0011.311333334.3",
        "shard": "lc.4",
        "started": "Wed, 16 Oct 2024 00:00:01 GMT",
        "status": "PROCESSING"
    },
    {
        "bucket": ":osbackup:5e9bc383-f7bd-4fd1-b607-1e563bfe0011.668063260.1",
        "shard": "lc.10",
        "started": "Thu, 17 Oct 2024 00:00:01 GMT",
        "status": "PROCESSING"
    },
    {
        "bucket": ":projects0609:5e9bc383-f7bd-4fd1-b607-1e563bfe0011.856877269.1147",
        "shard": "lc.10",
        "started": "Thu, 01 Jan 1970 00:00:00 GMT",
        "status": "UNINITIAL"
    },
    ...

I turned up the log level for rgw and see that most shards hit some sort of error: "failed to put head" or "returned error ret==-2".

2024-10-17T15:08:31.720+0000 7fc9cf968640 0 lifecycle: RGWLC::process() head.marker !empty() at START for shard==lc.10 head last stored at Mon Sep 23 00:00:00 2024
2024-10-17T15:08:31.720+0000 7fc9cf968640 16 lifecycle: RGWLC::expired_session started: 1729123201 interval: 86400(*2==172800) now: 1729177711
2024-10-17T15:08:31.720+0000 7fc9cf968640 5 lifecycle: RGWLC::process(): ACTIVE entry: :osbackup:5e9bc383-f7bd-4fd1-b607-1e563bfe0011.668063260.1::1729123201:1 index: 10 worker ix: 0
2024-10-17T15:08:31.744+0000 7fc9cf968640 0 lifecycle: RGWLC::process() failed to put head lc.10
2024-10-17T15:08:31.779+0000 7fc9cf968640 5 lifecycle: RGWLC::process(): ENTER: index: 2 worker ix: 0
2024-10-17T15:08:31.781+0000 7fc9cf968640 0 lifecycle: RGWLC::process() head.marker !empty() at START for shard==lc.2 head last stored at Wed Jun 19 00:00:02 2024
2024-10-17T15:08:31.781+0000 7fc9cf968640 0 lifecycle: RGWLC::process() sal_lc->get_entry(lc_shard, head.marker, entry) returned error ret==-2
2024-10-17T15:08:31.782+0000 7fc9cf968640 5 lifecycle: RGWLC::process(): ENTER: index: 23 worker ix: 0
2024-10-17T15:08:31.783+0000 7fc9cf968640 0 lifecycle: RGWLC::process() head.marker !empty() at START for shard==lc.23 head last stored at Mon Jan 8 00:00:00 2024
2024-10-17T15:08:31.784+0000 7fc9cf968640 0 lifecycle: RGWLC::process() sal_lc->get_entry(lc_shard, head.marker, entry) returned error ret==-2
2024-10-17T15:08:31.784+0000 7fc9cf968640 5 lifecycle: RGWLC::process(): ENTER: index: 22 worker ix: 0
2024-10-17T15:08:31.786+0000 7fc9cf968640 0 lifecycle: RGWLC::process() head.marker !empty() at START for shard==lc.22 head last stored at Mon Aug 19 00:00:00 2024
2024-10-17T15:08:31.786+0000 7fc9cf968640 0 lifecycle: RGWLC::process() sal_lc->get_entry(lc_shard, head.marker, entry) returned error ret==-2
2024-10-17T15:08:31.787+0000 7fc9cf968640 5 lifecycle: RGWLC::process(): ENTER: index: 17 worker ix: 0
2024-10-17T15:08:31.788+0000 7fc9cf968640 0 lifecycle: RGWLC::process() head.marker !empty() at START for shard==lc.17 head last stored at Mon Jul 22 00:00:00 2024
2024-10-17T15:08:31.788+0000 7fc9cf968640 0 lifecycle: RGWLC::process() sal_lc->get_entry(lc_shard, head.marker, entry) returned error ret==-2
2024-10-17T15:08:31.789+0000 7fc9cf968640 5 lifecycle: RGWLC::process(): ENTER: index: 7 worker ix: 0
2024-10-17T15:08:31.790+0000 7fc9cf968640 0 lifecycle: RGWLC::process() head.marker !empty() at START for shard==lc.7 head last stored at Sat Jun 8 00:00:02 2024
2024-10-17T15:08:31.791+0000 7fc9cf968640 0 lifecycle: RGWLC::process() sal_lc->get_entry(lc_shard, head.marker, entry) returned error ret==-2
2024-10-17T15:08:31.791+0000 7fc9cf968640 5 lifecycle: RGWLC::process(): ENTER: index: 0 worker ix: 0
2024-10-17T15:08:31.793+0000 7fc9cf968640 0 lifecycle: RGWLC::process() head.marker !empty() at START for shard==lc.0 head last stored at Mon Sep 23 00:00:00 2024
2024-10-17T15:08:31.793+0000 7fc9cf968640 16 lifecycle: RGWLC::expired_session started: 1729123201 interval: 86400(*2==172800) now: 1729177711
2024-10-17T15:08:31.793+0000 7fc9cf968640 5 lifecycle: RGWLC::process(): ACTIVE entry: :tesra:5e9bc383-f7bd-4fd1-b607-1e563bfe0011.833499554.20::1729123201:1 index: 0 worker ix: 0
2024-10-17T15:08:31.794+0000 7fc9cf968640 0 lifecycle: RGWLC::process() failed to put head lc.0
2024-10-17T15:08:31.795+0000 7fc9cf968640 2 lifecycle: life cycle: stop
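From what I can tell, ret==-2 is ENOENT, so the stored head.marker on those shards seems to point at an entry that no longer exists in the shard. My understanding (possibly wrong) is that the per-shard entries live as omap keys on the lc.N objects in the zone's log pool, so I was planning to inspect a couple of the affected shards read-only with something like the following (the pool name is my guess based on our zone name):

# rados -p zonename.rgw.log listomapkeys lc.2
# rados -p zonename.rgw.log listomapvals lc.2
# rados -p zonename.rgw.log listomapkeys lc.10

Is that the right place to look, or is there a safer radosgw-admin view of the per-shard heads/markers?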
I don't see any errors when running the process command but the status doesn't update to COMPLETE:

# radosgw-admin lc process --bucket osbackup --debug_rgw 10/5
2024-10-17T15:13:32.826+0000 7f59f15c0840 4 Realm: realname (5e3b0d1c-78de-499a-9fa3-23e2f9142ef7)
2024-10-17T15:13:32.826+0000 7f59f15c0840 4 ZoneGroup: us (1c370840-4d35-4ada-85a4-14bd6eeb2b0a)
2024-10-17T15:13:32.826+0000 7f59f15c0840 4 Zone: zonename (5e9bc383-f7bd-4fd1-b607-1e563bfe0011)
2024-10-17T15:13:32.826+0000 7f59f15c0840 4 using period configuration: 69f026df-8fa4-42c8-8f62-1b0a902db797:12
2024-10-17T15:13:32.900+0000 7f59f15c0840 2 all 8 watchers are set, enabling cache
2024-10-17T15:13:32.923+0000 7f59d7536640 2 rgw data changes log: RGWDataChangesLog::ChangesRenewThread: start
2024-10-17T15:13:33.609+0000 7f59f15c0840 10 rgw notify: Started notification manager with: 1 workers
2024-10-17T15:13:33.610+0000 7f59f15c0840 10 cache get: name=zonename.rgw.meta+root+osbackup : miss
2024-10-17T15:13:33.611+0000 7f59f15c0840 10 cache put: name=zonename.rgw.meta+root+osbackup info.flags=0x11
2024-10-17T15:13:33.611+0000 7f59f15c0840 10 adding zonename.rgw.meta+root+osbackup to cache LRU end
2024-10-17T15:13:33.611+0000 7f59f15c0840 10 cache get: name=zonename.rgw.meta+root+.bucket.meta.osbackup:5e9bc383-f7bd-4fd1-b607-1e563bfe0011.668063260.1 : miss
2024-10-17T15:13:33.612+0000 7f59f15c0840 10 cache put: name=zonename.rgw.meta+root+.bucket.meta.osbackup:5e9bc383-f7bd-4fd1-b607-1e563bfe0011.668063260.1 info.flags=0x17
2024-10-17T15:13:33.612+0000 7f59f15c0840 10 adding zonename.rgw.meta+root+.bucket.meta.osbackup:5e9bc383-f7bd-4fd1-b607-1e563bfe0011.668063260.1 to cache LRU end
2024-10-17T15:13:33.612+0000 7f59f15c0840 10 updating xattr: name=ceph.objclass.version bl.length()=42
2024-10-17T15:13:33.612+0000 7f59f15c0840 10 updating xattr: name=user.rgw.acl bl.length()=153
2024-10-17T15:13:33.612+0000 7f59f15c0840 10 updating xattr: name=user.rgw.lc bl.length()=208
2024-10-17T15:13:33.612+0000 7f59f15c0840 10 chain_cache_entry: cache_locator=zonename.rgw.meta+root+.bucket.meta.osbackup:5e9bc383-f7bd-4fd1-b607-1e563bfe0011.668063260.1
2024-10-17T15:13:33.612+0000 7f59f15c0840 5 lifecycle: RGWLC::process_bucket(): ENTER: index: 10 worker ix: 0
2024-10-17T15:13:33.641+0000 7f59f15c0840 5 lifecycle: RGWLC::process_bucket(): ACTIVE entry: :osbackup:5e9bc383-f7bd-4fd1-b607-1e563bfe0011.668063260.1::1729123201:1 index: 10 worker ix: 0
2024-10-17T15:13:33.815+0000 7f59d7536640 2 rgw data changes log: RGWDataChangesLog::ChangesRenewThread: start
2024-10-17T15:13:33.867+0000 7f59f15c0840 2 removed watcher, disabling cache

I do see that one of the buckets doesn't exist but is still in the lc list:

# radosgw-admin lc list | grep projects0609
        "bucket": ":projects0609:5e9bc383-f7bd-4fd1-b607-1e563bfe0011.856877269.1147",
# radosgw-admin bucket stats --bucket projects0609
failure: (2002) Unknown error 2002

Is it possible to rm an LC entry from the LC list if the bucket doesn't exist? Is there anything else to check?
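If manual surgery is the answer, my best guess at the mechanics would be removing the stale omap key for that bucket from its lc shard object (lc.10 per the listing above), roughly along these lines, again assuming the entries are omap keys on the lc.N objects in zonename.rgw.log and that the key name matches the bucket string shown in lc list:

# rados -p zonename.rgw.log listomapkeys lc.10 | grep projects0609
# rados -p zonename.rgw.log rmomapkey lc.10 ':projects0609:5e9bc383-f7bd-4fd1-b607-1e563bfe0011.856877269.1147'

I'd rather not guess at that on a production cluster, so please correct me if there's a supported radosgw-admin command for it.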
I saw https://tracker.ceph.com/issues/68160 and the related tickets in the comments; maybe it is related.

Our RGWs are running in 18.2.4 official containers, but our cluster is on 17.2.7 Debian packages. We upgraded the RGWs due to constant crashing.

Thanks,
Reid