I don't think the root cause has been found. I disabled versioning, as I have to remove expired objects manually using an S3 client.

On Thu, 17 Oct 2024 at 17:50, Reid Guyett <reid.guyett@xxxxxxxxx> wrote:

> Hello,
>
> I am experiencing an issue where it seems all lifecycles are showing either
> PROCESSING or UNINITIAL.
>
> > # radosgw-admin lc list
> > [
> >     {
> >         "bucket": ":tesra:5e9bc383-f7bd-4fd1-b607-1e563bfe0011.833499554.20",
> >         "shard": "lc.0",
> >         "started": "Thu, 17 Oct 2024 00:00:01 GMT",
> >         "status": "PROCESSING"
> >     },
> >     {
> >         "bucket": ":primevideos:5e9bc383-f7bd-4fd1-b607-1e563bfe0011.311333334.886",
> >         "shard": "lc.3",
> >         "started": "Wed, 16 Oct 2024 00:00:01 GMT",
> >         "status": "PROCESSING"
> >     },
> >     {
> >         "bucket": ":editorimages:5e9bc383-f7bd-4fd1-b607-1e563bfe0011.311333334.3",
> >         "shard": "lc.4",
> >         "started": "Wed, 16 Oct 2024 00:00:01 GMT",
> >         "status": "PROCESSING"
> >     },
> >     {
> >         "bucket": ":osbackup:5e9bc383-f7bd-4fd1-b607-1e563bfe0011.668063260.1",
> >         "shard": "lc.10",
> >         "started": "Thu, 17 Oct 2024 00:00:01 GMT",
> >         "status": "PROCESSING"
> >     },
> >     {
> >         "bucket": ":projects0609:5e9bc383-f7bd-4fd1-b607-1e563bfe0011.856877269.1147",
> >         "shard": "lc.10",
> >         "started": "Thu, 01 Jan 1970 00:00:00 GMT",
> >         "status": "UNINITIAL"
> >     },
> >     ...
>
> I turned up the log level for RGW and see that most of them have some sort
> of error: "failed to put head" or "returned error ret==-2".
>
> > 2024-10-17T15:08:31.720+0000 7fc9cf968640 0 lifecycle: RGWLC::process() head.marker !empty() at START for shard==lc.10 head last stored at Mon Sep 23 00:00:00 2024
> > 2024-10-17T15:08:31.720+0000 7fc9cf968640 16 lifecycle: RGWLC::expired_session started: 1729123201 interval: 86400(*2==172800) now: 1729177711
> > 2024-10-17T15:08:31.720+0000 7fc9cf968640 5 lifecycle: RGWLC::process(): ACTIVE entry: :osbackup:5e9bc383-f7bd-4fd1-b607-1e563bfe0011.668063260.1::1729123201:1 index: 10 worker ix: 0
> > 2024-10-17T15:08:31.744+0000 7fc9cf968640 0 lifecycle: RGWLC::process() failed to put head lc.10
> > 2024-10-17T15:08:31.779+0000 7fc9cf968640 5 lifecycle: RGWLC::process(): ENTER: index: 2 worker ix: 0
> > 2024-10-17T15:08:31.781+0000 7fc9cf968640 0 lifecycle: RGWLC::process() head.marker !empty() at START for shard==lc.2 head last stored at Wed Jun 19 00:00:02 2024
> > 2024-10-17T15:08:31.781+0000 7fc9cf968640 0 lifecycle: RGWLC::process() sal_lc->get_entry(lc_shard, head.marker, entry) returned error ret==-2
> > 2024-10-17T15:08:31.782+0000 7fc9cf968640 5 lifecycle: RGWLC::process(): ENTER: index: 23 worker ix: 0
> > 2024-10-17T15:08:31.783+0000 7fc9cf968640 0 lifecycle: RGWLC::process() head.marker !empty() at START for shard==lc.23 head last stored at Mon Jan 8 00:00:00 2024
> > 2024-10-17T15:08:31.784+0000 7fc9cf968640 0 lifecycle: RGWLC::process() sal_lc->get_entry(lc_shard, head.marker, entry) returned error ret==-2
> > 2024-10-17T15:08:31.784+0000 7fc9cf968640 5 lifecycle: RGWLC::process(): ENTER: index: 22 worker ix: 0
> > 2024-10-17T15:08:31.786+0000 7fc9cf968640 0 lifecycle: RGWLC::process() head.marker !empty() at START for shard==lc.22 head last stored at Mon Aug 19 00:00:00 2024
> > 2024-10-17T15:08:31.786+0000 7fc9cf968640 0 lifecycle: RGWLC::process() sal_lc->get_entry(lc_shard, head.marker, entry) returned error ret==-2
> > 2024-10-17T15:08:31.787+0000 7fc9cf968640 5 lifecycle: RGWLC::process(): ENTER: index: 17 worker ix: 0
> > 2024-10-17T15:08:31.788+0000 7fc9cf968640 0 lifecycle: RGWLC::process() head.marker !empty() at START for shard==lc.17 head last stored at Mon Jul 22 00:00:00 2024
> > 2024-10-17T15:08:31.788+0000 7fc9cf968640 0 lifecycle: RGWLC::process() sal_lc->get_entry(lc_shard, head.marker, entry) returned error ret==-2
> > 2024-10-17T15:08:31.789+0000 7fc9cf968640 5 lifecycle: RGWLC::process(): ENTER: index: 7 worker ix: 0
> > 2024-10-17T15:08:31.790+0000 7fc9cf968640 0 lifecycle: RGWLC::process() head.marker !empty() at START for shard==lc.7 head last stored at Sat Jun 8 00:00:02 2024
> > 2024-10-17T15:08:31.791+0000 7fc9cf968640 0 lifecycle: RGWLC::process() sal_lc->get_entry(lc_shard, head.marker, entry) returned error ret==-2
> > 2024-10-17T15:08:31.791+0000 7fc9cf968640 5 lifecycle: RGWLC::process(): ENTER: index: 0 worker ix: 0
> > 2024-10-17T15:08:31.793+0000 7fc9cf968640 0 lifecycle: RGWLC::process() head.marker !empty() at START for shard==lc.0 head last stored at Mon Sep 23 00:00:00 2024
> > 2024-10-17T15:08:31.793+0000 7fc9cf968640 16 lifecycle: RGWLC::expired_session started: 1729123201 interval: 86400(*2==172800) now: 1729177711
> > 2024-10-17T15:08:31.793+0000 7fc9cf968640 5 lifecycle: RGWLC::process(): ACTIVE entry: :tesra:5e9bc383-f7bd-4fd1-b607-1e563bfe0011.833499554.20::1729123201:1 index: 0 worker ix: 0
> > 2024-10-17T15:08:31.794+0000 7fc9cf968640 0 lifecycle: RGWLC::process() failed to put head lc.0
> > 2024-10-17T15:08:31.795+0000 7fc9cf968640 2 lifecycle: life cycle: stop
>
> I don't see any errors when running the process command, but the status
> doesn't update to COMPLETE:
>
> > # radosgw-admin lc process --bucket osbackup --debug_rgw 10/5
> > 2024-10-17T15:13:32.826+0000 7f59f15c0840 4 Realm: realname (5e3b0d1c-78de-499a-9fa3-23e2f9142ef7)
> > 2024-10-17T15:13:32.826+0000 7f59f15c0840 4 ZoneGroup: us (1c370840-4d35-4ada-85a4-14bd6eeb2b0a)
> > 2024-10-17T15:13:32.826+0000 7f59f15c0840 4 Zone: zonename (5e9bc383-f7bd-4fd1-b607-1e563bfe0011)
> > 2024-10-17T15:13:32.826+0000 7f59f15c0840 4 using period configuration: 69f026df-8fa4-42c8-8f62-1b0a902db797:12
> > 2024-10-17T15:13:32.900+0000 7f59f15c0840 2 all 8 watchers are set, enabling cache
> > 2024-10-17T15:13:32.923+0000 7f59d7536640 2 rgw data changes log: RGWDataChangesLog::ChangesRenewThread: start
> > 2024-10-17T15:13:33.609+0000 7f59f15c0840 10 rgw notify: Started notification manager with: 1 workers
> > 2024-10-17T15:13:33.610+0000 7f59f15c0840 10 cache get: name=zonename.rgw.meta+root+osbackup : miss
> > 2024-10-17T15:13:33.611+0000 7f59f15c0840 10 cache put: name=zonename.rgw.meta+root+osbackup info.flags=0x11
> > 2024-10-17T15:13:33.611+0000 7f59f15c0840 10 adding zonename.rgw.meta+root+osbackup to cache LRU end
> > 2024-10-17T15:13:33.611+0000 7f59f15c0840 10 cache get: name=zonename.rgw.meta+root+.bucket.meta.osbackup:5e9bc383-f7bd-4fd1-b607-1e563bfe0011.668063260.1 : miss
> > 2024-10-17T15:13:33.612+0000 7f59f15c0840 10 cache put: name=zonename.rgw.meta+root+.bucket.meta.osbackup:5e9bc383-f7bd-4fd1-b607-1e563bfe0011.668063260.1 info.flags=0x17
> > 2024-10-17T15:13:33.612+0000 7f59f15c0840 10 adding zonename.rgw.meta+root+.bucket.meta.osbackup:5e9bc383-f7bd-4fd1-b607-1e563bfe0011.668063260.1 to cache LRU end
> > 2024-10-17T15:13:33.612+0000 7f59f15c0840 10 updating xattr: name=ceph.objclass.version bl.length()=42
> > 2024-10-17T15:13:33.612+0000 7f59f15c0840 10 updating xattr: name=user.rgw.acl bl.length()=153
> > 2024-10-17T15:13:33.612+0000 7f59f15c0840 10 updating xattr: name=user.rgw.lc bl.length()=208
> > 2024-10-17T15:13:33.612+0000 7f59f15c0840 10 chain_cache_entry: cache_locator=zonename.rgw.meta+root+.bucket.meta.osbackup:5e9bc383-f7bd-4fd1-b607-1e563bfe0011.668063260.1
> > 2024-10-17T15:13:33.612+0000 7f59f15c0840 5 lifecycle: RGWLC::process_bucket(): ENTER: index: 10 worker ix: 0
> > 2024-10-17T15:13:33.641+0000 7f59f15c0840 5 lifecycle: RGWLC::process_bucket(): ACTIVE entry: :osbackup:5e9bc383-f7bd-4fd1-b607-1e563bfe0011.668063260.1::1729123201:1 index: 10 worker ix: 0
> > 2024-10-17T15:13:33.815+0000 7f59d7536640 2 rgw data changes log: RGWDataChangesLog::ChangesRenewThread: start
> > 2024-10-17T15:13:33.867+0000 7f59f15c0840 2 removed watcher, disabling cache
>
> I do see that one of the buckets doesn't exist but is still in the lc list.
>
> > # radosgw-admin lc list | grep projects0609
> >         "bucket": ":projects0609:5e9bc383-f7bd-4fd1-b607-1e563bfe0011.856877269.1147",
> > # radosgw-admin bucket stats --bucket projects0609
> > failure: (2002) Unknown error 2002
>
> Is it possible to rm an LC entry from the LC list if the bucket doesn't
> exist? Is there anything else to check? I saw
> https://tracker.ceph.com/issues/68160 and the related tickets in the
> comments. Maybe it is related.
>
> Our RGWs are running in 18.2.4 official containers but our cluster is
> 17.2.7 Debian packages. We upgraded the RGWs due to constant crashing.
>
> Thanks,
> Reid
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx

-- 
Łukasz Borek
lukasz@xxxxxxxxxxxx
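A note on the repeated "returned error ret==-2" in the RGW log excerpt above: RADOS and RGW internals return negated POSIX errno values, so ret==-2 is -ENOENT ("No such file or directory"), meaning the entry named by head.marker can no longer be found in that lc shard. Such return codes can be decoded quickly in Python:

```python
import errno
import os

# RGW/RADOS return values are negated POSIX errno codes.
ret = -2                        # as seen in the lc shard log lines above
print(errno.errorcode[-ret])    # ENOENT
print(os.strerror(-ret))        # No such file or directory
```

This matches the symptom in the thread: the shard heads point at markers whose entries no longer exist.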
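On the question of removing an LC entry for a bucket that no longer exists: lifecycle state is kept on the lc.N objects in the zone's log pool under the "lc" namespace, and on many releases stale entries can be inspected and removed with the rados tool. A sketch, assuming the pool name zonename.rgw.log seen in the debug output and the shard lc.10 that `lc list` shows for projects0609 (take a backup of the shard object first and verify on a test cluster before touching production):

```shell
# Assumption: the zone's log pool is "zonename.rgw.log" and the stale
# projects0609 entry lives on shard lc.10, per the output in the thread.

# List the lc shard objects (lc.0 .. lc.N) in the "lc" namespace:
rados -p zonename.rgw.log --namespace lc ls

# Show the entries recorded on the shard:
rados -p zonename.rgw.log --namespace lc listomapkeys lc.10

# Remove the stale key for the deleted bucket:
rados -p zonename.rgw.log --namespace lc rmomapkey lc.10 \
    ':projects0609:5e9bc383-f7bd-4fd1-b607-1e563bfe0011.856877269.1147'
```

After this, `radosgw-admin lc list` should no longer show the deleted bucket; whether the shard's stored head/marker also needs resetting varies by version, so check the shard state again on the next lifecycle run.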
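For anyone removing expired objects by hand with an S3 client, as mentioned at the top of the thread, the eligibility check that a lifecycle Expiration.Days rule performs can be reproduced client-side. A minimal sketch (the 30-day rule here is a hypothetical example; S3 proper schedules the actual delete for the midnight UTC following the cutoff, while this only tests eligibility):

```python
from datetime import datetime, timedelta, timezone

def is_expired(last_modified, expiration_days, now=None):
    """Mirror the eligibility test of an S3 lifecycle Expiration.Days rule:
    an object qualifies once `expiration_days` have elapsed since its last
    modification time (both datetimes must be timezone-aware UTC)."""
    now = now or datetime.now(timezone.utc)
    return now - last_modified >= timedelta(days=expiration_days)

# Hypothetical 30-day rule against an object last modified 45 days ago:
last = datetime.now(timezone.utc) - timedelta(days=45)
print(is_expired(last, 30))  # True
```

Objects for which this returns True can then be deleted with the S3 client of choice, paginating over the bucket listing.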