Re: Lifecycle Stuck PROCESSING and UNINITIAL

Lukasz Borek <lukasz@xxxxxxxxxxxx> · Fri, 18 Oct 2024 09:54:08 +0200

I don't think the root cause has been found. I disabled versioning because I
have to remove expired objects manually with an S3 client.
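
For reference, a manual cleanup like this boils down to picking the keys that
are older than the rule's expiration and deleting them in batches. Below is a
minimal sketch of the selection step in Python. The function names and the
day count are made up for illustration, and the dict shape ("Key",
"LastModified") is assumed to match what an S3 client listing returns.

```python
from datetime import datetime, timedelta, timezone


def select_expired(objects, days, now):
    """Return keys whose LastModified is more than `days` days before `now`.

    `objects` is assumed to be a list of dicts with "Key" and a
    timezone-aware "LastModified", as an S3 listing would provide.
    """
    cutoff = now - timedelta(days=days)
    return [o["Key"] for o in objects if o["LastModified"] < cutoff]


def batches(keys, size=1000):
    """Yield keys in chunks; S3 multi-object delete takes at most 1000 keys."""
    for i in range(0, len(keys), size):
        yield keys[i:i + size]
```

Each batch would then be passed to the client's multi-object delete call. The
sketch only approximates what a lifecycle rule does and does not handle
object versions.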

On Thu, 17 Oct 2024 at 17:50, Reid Guyett <reid.guyett@xxxxxxxxx> wrote:

> Hello,
>
> I am experiencing an issue where it seems all lifecycles are showing either
> PROCESSING or UNINITIAL.
>
> > # radosgw-admin lc list
> > [
> >     {
> >         "bucket":
> > ":tesra:5e9bc383-f7bd-4fd1-b607-1e563bfe0011.833499554.20",
> >         "shard": "lc.0",
> >         "started": "Thu, 17 Oct 2024 00:00:01 GMT",
> >         "status": "PROCESSING"
> >     },
> >     {
> >         "bucket":
> > ":primevideos:5e9bc383-f7bd-4fd1-b607-1e563bfe0011.311333334.886",
> >         "shard": "lc.3",
> >         "started": "Wed, 16 Oct 2024 00:00:01 GMT",
> >         "status": "PROCESSING"
> >     },
> >     {
> >         "bucket":
> > ":editorimages:5e9bc383-f7bd-4fd1-b607-1e563bfe0011.311333334.3",
> >         "shard": "lc.4",
> >         "started": "Wed, 16 Oct 2024 00:00:01 GMT",
> >         "status": "PROCESSING"
> >     },
> >     {
> >         "bucket":
> > ":osbackup:5e9bc383-f7bd-4fd1-b607-1e563bfe0011.668063260.1",
> >         "shard": "lc.10",
> >         "started": "Thu, 17 Oct 2024 00:00:01 GMT",
> >         "status": "PROCESSING"
> >     },
> >     {
> >         "bucket":
> > ":projects0609:5e9bc383-f7bd-4fd1-b607-1e563bfe0011.856877269.1147",
> >         "shard": "lc.10",
> >         "started": "Thu, 01 Jan 1970 00:00:00 GMT",
> >         "status": "UNINITIAL"
> >     },
> > ...
> >
>
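
To get an overview of a long lc list, the JSON can be summarised per status.
Below is a minimal sketch in Python, assuming the output has the shape shown
above. summarize_lc and the one-day threshold are illustrative choices, not
part of any Ceph tooling.

```python
import json
from collections import Counter
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

DAY = 86400  # assumed threshold: one daily lifecycle interval


def summarize_lc(lc_json, now):
    """Count entries per status and list PROCESSING entries older than a day.

    lc_json: text of `radosgw-admin lc list` (a JSON array of objects with
    "bucket", "shard", "started" and "status" keys is assumed).
    now: timezone-aware datetime used as the reference time.
    """
    entries = json.loads(lc_json)
    counts = Counter(e["status"] for e in entries)
    stale = []
    for e in entries:
        started = parsedate_to_datetime(e["started"])
        if e["status"] == "PROCESSING" and (now - started).total_seconds() > DAY:
            stale.append((e["shard"], e["bucket"]))
    return counts, stale


# Three entries copied from the output above.
SAMPLE = json.dumps([
    {"bucket": ":primevideos:5e9bc383-f7bd-4fd1-b607-1e563bfe0011.311333334.886",
     "shard": "lc.3", "started": "Wed, 16 Oct 2024 00:00:01 GMT",
     "status": "PROCESSING"},
    {"bucket": ":osbackup:5e9bc383-f7bd-4fd1-b607-1e563bfe0011.668063260.1",
     "shard": "lc.10", "started": "Thu, 17 Oct 2024 00:00:01 GMT",
     "status": "PROCESSING"},
    {"bucket": ":projects0609:5e9bc383-f7bd-4fd1-b607-1e563bfe0011.856877269.1147",
     "shard": "lc.10", "started": "Thu, 01 Jan 1970 00:00:00 GMT",
     "status": "UNINITIAL"},
])
NOW = datetime(2024, 10, 17, 15, 8, 31, tzinfo=timezone.utc)
counts, stale = summarize_lc(SAMPLE, NOW)
```

With these three entries, only the primevideos entry on lc.3 ends up in
`stale`, since it has been PROCESSING for more than a day.
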
> I turned up the log level for rgw and see that most have some sort of error:
> "failed to put head" or "returned error ret==-2"
>
> > 2024-10-17T15:08:31.720+0000 7fc9cf968640  0 lifecycle: RGWLC::process()
> > head.marker !empty() at START for shard==lc.10 head last stored at Mon Sep
> > 23 00:00:00 2024
> > 2024-10-17T15:08:31.720+0000 7fc9cf968640 16 lifecycle:
> > RGWLC::expired_session started: 1729123201 interval: 86400(*2==172800) now:
> > 1729177711
> > 2024-10-17T15:08:31.720+0000 7fc9cf968640  5 lifecycle: RGWLC::process():
> > ACTIVE entry:
> > :osbackup:5e9bc383-f7bd-4fd1-b607-1e563bfe0011.668063260.1::1729123201:1
> > index: 10 worker ix: 0
> > 2024-10-17T15:08:31.744+0000 7fc9cf968640  0 lifecycle: RGWLC::process()
> > failed to put head lc.10
> > 2024-10-17T15:08:31.779+0000 7fc9cf968640  5 lifecycle: RGWLC::process():
> > ENTER: index: 2 worker ix: 0
> > 2024-10-17T15:08:31.781+0000 7fc9cf968640  0 lifecycle: RGWLC::process()
> > head.marker !empty() at START for shard==lc.2 head last stored at Wed Jun
> > 19 00:00:02 2024
> > 2024-10-17T15:08:31.781+0000 7fc9cf968640  0 lifecycle: RGWLC::process()
> > sal_lc->get_entry(lc_shard, head.marker, entry) returned error ret==-2
> > 2024-10-17T15:08:31.782+0000 7fc9cf968640  5 lifecycle: RGWLC::process():
> > ENTER: index: 23 worker ix: 0
> > 2024-10-17T15:08:31.783+0000 7fc9cf968640  0 lifecycle: RGWLC::process()
> > head.marker !empty() at START for shard==lc.23 head last stored at Mon Jan
> >  8 00:00:00 2024
> > 2024-10-17T15:08:31.784+0000 7fc9cf968640  0 lifecycle: RGWLC::process()
> > sal_lc->get_entry(lc_shard, head.marker, entry) returned error ret==-2
> > 2024-10-17T15:08:31.784+0000 7fc9cf968640  5 lifecycle: RGWLC::process():
> > ENTER: index: 22 worker ix: 0
> > 2024-10-17T15:08:31.786+0000 7fc9cf968640  0 lifecycle: RGWLC::process()
> > head.marker !empty() at START for shard==lc.22 head last stored at Mon Aug
> > 19 00:00:00 2024
> > 2024-10-17T15:08:31.786+0000 7fc9cf968640  0 lifecycle: RGWLC::process()
> > sal_lc->get_entry(lc_shard, head.marker, entry) returned error ret==-2
> > 2024-10-17T15:08:31.787+0000 7fc9cf968640  5 lifecycle: RGWLC::process():
> > ENTER: index: 17 worker ix: 0
> > 2024-10-17T15:08:31.788+0000 7fc9cf968640  0 lifecycle: RGWLC::process()
> > head.marker !empty() at START for shard==lc.17 head last stored at Mon Jul
> > 22 00:00:00 2024
> > 2024-10-17T15:08:31.788+0000 7fc9cf968640  0 lifecycle: RGWLC::process()
> > sal_lc->get_entry(lc_shard, head.marker, entry) returned error ret==-2
> > 2024-10-17T15:08:31.789+0000 7fc9cf968640  5 lifecycle: RGWLC::process():
> > ENTER: index: 7 worker ix: 0
> > 2024-10-17T15:08:31.790+0000 7fc9cf968640  0 lifecycle: RGWLC::process()
> > head.marker !empty() at START for shard==lc.7 head last stored at Sat Jun
> >  8 00:00:02 2024
> > 2024-10-17T15:08:31.791+0000 7fc9cf968640  0 lifecycle: RGWLC::process()
> > sal_lc->get_entry(lc_shard, head.marker, entry) returned error ret==-2
> > 2024-10-17T15:08:31.791+0000 7fc9cf968640  5 lifecycle: RGWLC::process():
> > ENTER: index: 0 worker ix: 0
> > 2024-10-17T15:08:31.793+0000 7fc9cf968640  0 lifecycle: RGWLC::process()
> > head.marker !empty() at START for shard==lc.0 head last stored at Mon Sep
> > 23 00:00:00 2024
> > 2024-10-17T15:08:31.793+0000 7fc9cf968640 16 lifecycle:
> > RGWLC::expired_session started: 1729123201 interval: 86400(*2==172800) now:
> > 1729177711
> > 2024-10-17T15:08:31.793+0000 7fc9cf968640  5 lifecycle: RGWLC::process():
> > ACTIVE entry:
> > :tesra:5e9bc383-f7bd-4fd1-b607-1e563bfe0011.833499554.20::1729123201:1
> > index: 0 worker ix: 0
> > 2024-10-17T15:08:31.794+0000 7fc9cf968640  0 lifecycle: RGWLC::process()
> > failed to put head lc.0
> > 2024-10-17T15:08:31.795+0000 7fc9cf968640  2 lifecycle: life cycle: stop
> >
>
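
The numbers in the expired_session lines can be checked by hand. Below is a
small sketch, assuming the check is "started + 2*interval < now", which is
what the log format suggests but which I have not verified against the
18.2.4 source. The helper name is hypothetical.

```python
import errno


def session_expired(started, interval, now):
    """Assumed rule: a session is stale once 2*interval has passed since start."""
    return started + 2 * interval < now


# Values copied from the log lines for lc.10 and lc.0.
started, interval, now = 1729123201, 86400, 1729177711

age = now - started                # seconds since the session started
deadline = started + 2 * interval  # earliest time the session counts as stale
expired = session_expired(started, interval, now)

# ret==-2 in the get_entry lines is a negated errno value.
ret_name = errno.errorcode[2]
```

Here age is 54510 seconds (about 15 hours), so under that assumption the
session that started at 00:00:01 is not yet considered expired, which would
fit the "ACTIVE entry" lines. ret==-2 is -ENOENT, so presumably the entry
named by head.marker no longer exists in those shards.
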
> I don't see any errors when running the process command but the status
> doesn't update to COMPLETE:
>
> > # radosgw-admin lc process --bucket osbackup --debug_rgw 10/5
> > 2024-10-17T15:13:32.826+0000 7f59f15c0840  4 Realm:     realname
> >      (5e3b0d1c-78de-499a-9fa3-23e2f9142ef7)
> > 2024-10-17T15:13:32.826+0000 7f59f15c0840  4 ZoneGroup: us
> >   (1c370840-4d35-4ada-85a4-14bd6eeb2b0a)
> > 2024-10-17T15:13:32.826+0000 7f59f15c0840  4 Zone:      zonename
> >       (5e9bc383-f7bd-4fd1-b607-1e563bfe0011)
> > 2024-10-17T15:13:32.826+0000 7f59f15c0840  4 using period configuration:
> > 69f026df-8fa4-42c8-8f62-1b0a902db797:12
> > 2024-10-17T15:13:32.900+0000 7f59f15c0840  2 all 8 watchers are set,
> > enabling cache
> > 2024-10-17T15:13:32.923+0000 7f59d7536640  2 rgw data changes log:
> > RGWDataChangesLog::ChangesRenewThread: start
> > 2024-10-17T15:13:33.609+0000 7f59f15c0840 10 rgw notify: Started
> > notification manager with: 1 workers
> > 2024-10-17T15:13:33.610+0000 7f59f15c0840 10 cache get:
> > name=zonename.rgw.meta+root+osbackup : miss
> > 2024-10-17T15:13:33.611+0000 7f59f15c0840 10 cache put:
> > name=zonename.rgw.meta+root+osbackup info.flags=0x11
> > 2024-10-17T15:13:33.611+0000 7f59f15c0840 10
> > adding zonename.rgw.meta+root+osbackup to cache LRU end
> > 2024-10-17T15:13:33.611+0000 7f59f15c0840 10 cache get:
> > name=zonename.rgw.meta+root+.bucket.meta.osbackup:5e9bc383-f7bd-4fd1-b607-1e563bfe0011.668063260.1
> > : miss
> > 2024-10-17T15:13:33.612+0000 7f59f15c0840 10 cache put:
> > name=zonename.rgw.meta+root+.bucket.meta.osbackup:5e9bc383-f7bd-4fd1-b607-1e563bfe0011.668063260.1
> > info.flags=0x17
> > 2024-10-17T15:13:33.612+0000 7f59f15c0840 10
> > adding zonename.rgw.meta+root+.bucket.meta.osbackup:5e9bc383-f7bd-4fd1-b607-1e563bfe0011.668063260.1
> > to cache LRU end
> > 2024-10-17T15:13:33.612+0000 7f59f15c0840 10 updating xattr:
> > name=ceph.objclass.version bl.length()=42
> > 2024-10-17T15:13:33.612+0000 7f59f15c0840 10 updating xattr:
> > name=user.rgw.acl bl.length()=153
> > 2024-10-17T15:13:33.612+0000 7f59f15c0840 10 updating xattr: name=
> > user.rgw.lc bl.length()=208
> > 2024-10-17T15:13:33.612+0000 7f59f15c0840 10 chain_cache_entry:
> > cache_locator=zonename.rgw.meta+root+.bucket.meta.osbackup:5e9bc383-f7bd-4fd1-b607-1e563bfe0011.668063260.1
> > 2024-10-17T15:13:33.612+0000 7f59f15c0840  5 lifecycle:
> > RGWLC::process_bucket(): ENTER: index: 10 worker ix: 0
> > 2024-10-17T15:13:33.641+0000 7f59f15c0840  5 lifecycle:
> > RGWLC::process_bucket(): ACTIVE entry:
> > :osbackup:5e9bc383-f7bd-4fd1-b607-1e563bfe0011.668063260.1::1729123201:1
> > index: 10 worker ix: 0
> > 2024-10-17T15:13:33.815+0000 7f59d7536640  2 rgw data changes log:
> > RGWDataChangesLog::ChangesRenewThread: start
> > 2024-10-17T15:13:33.867+0000 7f59f15c0840  2 removed watcher, disabling
> > cache
> >
>
> I do see that one of the buckets doesn't exist but still is in the lc list.
>
> > # radosgw-admin lc list | grep projects0609
> >         "bucket":
> > ":projects0609:5e9bc383-f7bd-4fd1-b607-1e563bfe0011.856877269.1147",
> > # radosgw-admin bucket stats --bucket projects0609
> > failure: (2002) Unknown error 2002:
>
>
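
To find other entries like this one, the bucket names in lc list can be
compared with the buckets that actually exist. Below is a sketch in Python,
assuming the "bucket" field has the form tenant:name:bucket_id as in the
output above, and that the set of existing names comes from something like
"radosgw-admin bucket list". Both helpers are hypothetical, and tenanted
buckets are not handled.

```python
def parse_lc_bucket(field):
    """Split an lc list "bucket" value into (tenant, name, bucket_id)."""
    tenant, name, bucket_id = field.split(":", 2)
    return tenant, name, bucket_id


def orphaned_lc_entries(lc_entries, existing_names):
    """Return the "bucket" values whose bucket name is not in existing_names.

    lc_entries: parsed lc list output (list of dicts with a "bucket" key).
    existing_names: set of bucket names known to exist.
    """
    orphans = []
    for entry in lc_entries:
        _tenant, name, _bucket_id = parse_lc_bucket(entry["bucket"])
        if name not in existing_names:
            orphans.append(entry["bucket"])
    return orphans


# Two entries from the lc list above; only osbackup is assumed to exist.
entries = [
    {"bucket": ":osbackup:5e9bc383-f7bd-4fd1-b607-1e563bfe0011.668063260.1"},
    {"bucket": ":projects0609:5e9bc383-f7bd-4fd1-b607-1e563bfe0011.856877269.1147"},
]
orphans = orphaned_lc_entries(entries, {"osbackup"})
```

This only identifies the leftover entries. It does not remove them.
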
> Is it possible to rm an LC entry from the LC list if the bucket doesn't
> exist?
>
> Is there anything else to check? I saw
> https://tracker.ceph.com/issues/68160
> and the related tickets in the comments. Maybe it is related.
>
> Our RGWs are running the official 18.2.4 containers, but our cluster is on
> 17.2.7 Debian packages. We upgraded the RGWs due to constant crashing.
>
> Thanks,
>
> Reid
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>

-- 
Łukasz Borek
lukasz@xxxxxxxxxxxx