Re: octopus (15.2.16) OSDs crash or don't answer heartbeats (and get marked as down)

Sure, I tried that in a screen session before, but it did not reduce the queue.

Eventually I managed to zero out the queue by increasing these parameters:
radosgw-admin gc process --include-all --debug-rgw=20
--rgw-gc-max-concurrent-io=20 --rgw-gc-max-trim-chunk=64
--rgw-gc-processor-max-time=7200

I think it was a matter of the lock on the GC shard being released before
RGWGC::process finished removing all objects included in that shard;
however, I did not notice any errors in the output with debug-rgw=20:
https://github.com/ceph/ceph/blob/octopus/src/rgw/rgw_gc.cc#L514
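
For reference, a ceph.conf sketch to make this tuning persistent; the values
are simply the ones from the one-off run above and the section name is a
placeholder, so treat it as a starting point rather than a recommendation:

# placeholder section name -- use the name(s) of your RGW instance(s)
[client.rgw.myhost]
# allow more parallel RADOS delete ops per GC shard
rgw_gc_max_concurrent_io = 20
# trim up to 64 entries from a GC shard per round
rgw_gc_max_trim_chunk = 64
# let a single GC run hold a shard for up to 2 hours
rgw_gc_processor_max_time = 7200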
Many thanks
Jakub



On Wed, Dec 14, 2022 at 1:24 AM Boris Behrens <bb@xxxxxxxxx> wrote:

> You could try to do this in a screen session for a while.
> while true; do radosgw-admin gc process; done
>
> Maybe your normal RGW daemons are too busy for GC processing.
> We have this in our config and have started extra RGW instances for GC
> only:
> [global]
> ...
> # disable garbage collector default
> rgw_enable_gc_threads = false
> [client.gc-host1]
> rgw_frontends = "beast endpoint=[::1]:7489"
> rgw_enable_gc_threads = true
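>
> A sketch of how such an extra GC-only instance might be started (assuming
> the stock ceph-radosgw systemd template, which maps the instance name to
> client.<instance>; host, port and keyring setup are left out):
>
> # on gc-host1, after adding the [client.gc-host1] section above
> systemctl enable --now ceph-radosgw@gc-host1
> # or run it in the foreground for a quick test
> radosgw -f --name client.gc-host1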
>
> On Wed, 14 Dec 2022 at 01:14, Jakub Jaszewski
> <jaszewski.jakub@xxxxxxxxx> wrote:
> >
> > Hi Boris, many thanks for the link!
> >
> > I see that the GC list keeps growing on my cluster and there are some very
> big multipart objects on the GC list, one with as many as 138660 parts,
> which I calculate to be >500 GB in size.
> > These objects are visible on the GC list but not at the RADOS level when
> calling radosgw-admin --bucket=bucket_name bucket radoslist.
> > I also manually ran the GC process, radosgw-admin gc process
> --bucket=bucket_name --debug-rgw=20, which according to the logs did the job
> (no errors were raised, although the objects do not exist in RADOS?)
> > ...
> > 2022-12-13T20:21:06.635+0100 7fe0eb771080 20 garbage collection:
> RGWGC::process iterating over entry
> tag='2~rBQjiZ4SWUf8u9IS1BUXGEwCnFTHqfD',
> time=2022-12-13T12:35:59.727067+0100, chain.objs.size()=138660
> > 2022-12-13T20:21:06.635+0100 7fe0eb771080  5 garbage collection:
> RGWGC::process removing
> default.rgw.buckets.data:b4a09486-4fb6-474a-a45a-3fc6f7778e27.6781345.2__multipart_2ib3aonh7thn59a394l06un5i9lu2fhf1r2sl2g6rrhqbqhv6pjg.2~rBQjiZ4SWUf8u9IS1BUXGEwCnFTHqfD.1
> > 2022-12-13T20:21:06.703+0100 7fe0eb771080  5 garbage collection:
> RGWGC::process removing
> default.rgw.buckets.data:b4a09486-4fb6-474a-a45a-3fc6f7778e27.6781345.2__shadow_2ib3aonh7thn59a394l06un5i9lu2fhf1r2sl2g6rrhqbqhv6pjg.2~rBQjiZ4SWUf8u9IS1BUXGEwCnFTHqfD.1_1
> > 2022-12-13T20:21:06.859+0100 7fe0eb771080  5 garbage collection:
> RGWGC::process removing
> default.rgw.buckets.data:b4a09486-4fb6-474a-a45a-3fc6f7778e27.6781345.2__shadow_2ib3aonh7thn59a394l06un5i9lu2fhf1r2sl2g6rrhqbqhv6pjg.2~rBQjiZ4SWUf8u9IS1BUXGEwCnFTHqfD.1_2
> > ...
> > but the GC queue did not shrink; the objects are still on the GC list.
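> >
> > A rough way I watch the queue size (assuming jq is available and that
> > gc list prints a JSON array of entries, each with an "objs" chain; the
> > exact JSON layout may differ by release):
> >
> > radosgw-admin gc list --include-all > /tmp/gc.json
> > jq 'length' /tmp/gc.json                      # number of GC entries
> > jq '[.[].objs | length] | add' /tmp/gc.json   # total chained RADOS objects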
> >
> > Do you happen to know how to remove non-existent RADOS objects from the
> RGW GC list?
> >
> > One more thing I have to check is max_secs=3600 for GC when entering a
> particular index_shard. As you can see in the logs, processing of multipart
> objects takes more than 3600 seconds. I will try to increase
> rgw_gc_processor_max_time (see the sketch after the log excerpt below).
> >
> > 2022-12-13T20:20:13.168+0100 7fe0eb771080 20 garbage collection:
> RGWGC::process entered with GC index_shard=25, max_secs=3600, expired_only=1
> > 2022-12-13T20:20:13.168+0100 7fe0eb771080 20 garbage collection:
> RGWGC::process cls_rgw_gc_list returned with returned:0, entries.size=0,
> truncated=0, next_marker=''
> > 2022-12-13T20:20:13.172+0100 7fe0eb771080 20 garbage collection:
> RGWGC::process cls_rgw_gc_list returned NO non expired entries, so setting
> cache entry to TRUE
> > 2022-12-13T20:20:27.748+0100 7fe02ffff700  2
> RGWDataChangesLog::ChangesRenewThread: start
> > 2022-12-13T20:20:49.748+0100 7fe02ffff700  2
> RGWDataChangesLog::ChangesRenewThread: start
> > ...
> > 2022-12-13T20:21:05.339+0100 7fe0eb771080 20 garbage collection:
> RGWGC::process cls_rgw_gc_queue_list_entries returned with return value:0,
> entries.size=100, truncated=1, next_marker='4/20986990'
> > 2022-12-13T20:21:06.635+0100 7fe0eb771080 20 garbage collection:
> RGWGC::process iterating over entry
> tag='2~rBQjiZ4SWUf8u9IS1BUXGEwCnFTHqfD',
> time=2022-12-13T12:35:59.727067+0100, chain.objs.size()=138660
> > 2022-12-13T20:21:06.635+0100 7fe0eb771080  5 garbage collection:
> RGWGC::process removing
> default.rgw.buckets.data:b4a09486-4fb6-474a-a45a-3fc6f7778e27.6781345.2__multipart_2ib3aonh7thn59a394l06un5i9lu2fhf1r2sl2g6rrhqbqhv6pjg.2~rBQjiZ4SWUf8u9IS1BUXGEwCnFTHqfD.1
> > 2022-12-13T20:21:06.703+0100 7fe0eb771080  5 garbage collection:
> RGWGC::process removing
> default.rgw.buckets.data:b4a09486-4fb6-474a-a45a-3fc6f7778e27.6781345.2__shadow_2ib3aonh7thn59a394l06un5i9lu2fhf1r2sl2g6rrhqbqhv6pjg.2~rBQjiZ4SWUf8u9IS1BUXGEwCnFTHqfD.1_1
> > ...
> > 2022-12-13T21:31:23.505+0100 7fe0eb771080  5 garbage collection:
> RGWGC::process removing
> default.rgw.buckets.data:b4a09486-4fb6-474a-a45a-3fc6f7778e27.6781345.2__shadow_2ib3aonh7thn59a394l06un5i9lu2fhf1r2sl2g6rrhqbqhv6pjg.2~rBQjiZ4SWUf8u9IS1BUXGEwCnFTHqfD.4622_29
> > ...
> > 2022-12-13T21:31:23.565+0100 7fe0eb771080 20 garbage collection:
> RGWGC::process entered with GC index_shard=26, max_secs=3600, expired_only=1
> > ...
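> >
> > The sketch mentioned above, i.e. how I plan to raise the per-shard time
> > budget (assuming the RGW daemons are named client.rgw.*; 7200 is just the
> > value I intend to try, not a recommendation):
> >
> > ceph config set client.rgw rgw_gc_processor_max_time 7200
> > # or only for a one-off manual run:
> > radosgw-admin gc process --include-all --rgw-gc-processor-max-time=7200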
> >
> > Best Regards
> > Jakub
> >
> > On Wed, Dec 7, 2022 at 6:10 PM Boris <bb@xxxxxxxxx> wrote:
> >>
> >> Hi Jakub,
> >>
> >> the problem in our case is that we hit this bug
> (https://tracker.ceph.com/issues/53585) and the GC triggers it.
> >>
> >> We worked around this by moving the GC to a separate disk.
> >> It still goes nuts all the time, but at least it does not bring down
> the cluster; however, now it looks like RADOS objects go missing.
> >>
> >> Kind regards
> >>  - Boris Behrens
> >>
> >> On 07.12.2022 at 17:44, Jakub Jaszewski <jaszewski.jakub@xxxxxxxxx>
> wrote:
> >>
> >>
> >> Hi Boris,
> >>
> >> Did you manage to solve the problem?
> >> I have a similar one after upgrading from Nautilus 14.2.22 to Octopus
> 15.2.17. My ssd-class OSDs shut down after being marked down more than 5
> times in 10 minutes, after a series of
> >>
> >> 2022-12-04T06:52:30.705+0000 7fb2a8f6c700  1 heartbeat_map is_healthy
> 'OSD::osd_op_tp thread 0x7fb28aed4700' had timed out after 15
> >> 2022-12-04T06:52:43.585+0000 7fb286ecc700  1 heartbeat_map
> reset_timeout 'OSD::osd_op_tp thread 0x7fb286ecc700' had timed out after 15
> >>
> >> 2022-12-04T06:52:43.597+0000 7fb298cea700  0 log_channel(cluster) log
> [WRN] : Monitor daemon marked osd.62 down, but it is still running
> >> 2022-12-04T06:52:43.597+0000 7fb298cea700  0 log_channel(cluster) log
> [DBG] : map e57013 wrongly marked me down at e57013
> >> 2022-12-04T06:56:20.388+0000 7fb298cea700  0 log_channel(cluster) log
> [WRN] : Monitor daemon marked osd.62 down, but it is still running
> >> 2022-12-04T06:56:20.388+0000 7fb298cea700  0 log_channel(cluster) log
> [DBG] : map e57031 wrongly marked me down at e57030
> >> 2022-12-04T07:16:07.323+0000 7fb298cea700  0 log_channel(cluster) log
> [WRN] : Monitor daemon marked osd.62 down, but it is still running
> >> 2022-12-04T07:16:07.323+0000 7fb298cea700  0 log_channel(cluster) log
> [DBG] : map e57040 wrongly marked me down at e57039
> >> 2022-12-04T07:17:21.955+0000 7fb298cea700  0 log_channel(cluster) log
> [WRN] : Monitor daemon marked osd.62 down, but it is still running
> >> 2022-12-04T07:17:21.955+0000 7fb298cea700  0 log_channel(cluster) log
> [DBG] : map e57048 wrongly marked me down at e57048
> >> 2022-12-04T07:18:40.723+0000 7fb298cea700  0 log_channel(cluster) log
> [WRN] : Monitor daemon marked osd.62 down, but it is still running
> >> 2022-12-04T07:18:40.723+0000 7fb298cea700  0 log_channel(cluster) log
> [DBG] : map e57058 wrongly marked me down at e57057
> >> 2022-12-04T07:19:55.463+0000 7fb298cea700  0 log_channel(cluster) log
> [WRN] : Monitor daemon marked osd.62 down, but it is still running
> >> 2022-12-04T07:19:55.463+0000 7fb298cea700  0 log_channel(cluster) log
> [DBG] : map e57065 wrongly marked me down at e57065
> >> 2022-12-04T07:21:14.030+0000 7fb298cea700  0 log_channel(cluster) log
> [WRN] : Monitor daemon marked osd.62 down, but it is still running
> >> 2022-12-04T07:21:14.030+0000 7fb298cea700  0 log_channel(cluster) log
> [DBG] : map e57074 wrongly marked me down at e57073
> >> 2022-12-04T07:22:34.078+0000 7fb298cea700  0 log_channel(cluster) log
> [WRN] : Monitor daemon marked osd.62 down, but it is still running
> >> 2022-12-04T07:22:34.078+0000 7fb298cea700  0 log_channel(cluster) log
> [DBG] : map e57081 wrongly marked me down at e57080
> >> 2022-12-04T07:22:34.078+0000 7fb298cea700 -1 osd.62 57081
> _committed_osd_maps marked down 6 > osd_max_markdown_count 5 in last
> 600.000000 seconds, shutting down
> >>
> >> At the same time I observe slow ops on ssd-class OSDs; the ops look like:
> >>
> >> osd_op(client.575459444.0:1715813 4.7 4:e622f76a:gc::gc.20:head [call
> rgw_gc.rgw_gc_queue_list_entries in=31b] snapc 0=[]
> ondisk+read+known_if_redirected e54655)
> >> osd_op(client.575459438.0:1403912 4.7
> 4:e09767b9:::obj_delete_at_hint.0000000086:head [call lock.lock in=8b]
> snapc 0=[] ondisk+write+known_if_redirected e54655)
> >> osd_op(client.596382956.0:83988009 4.13 4:cea599e4:usage::usage.10:head
> [call rgw.user_usage_log_read in=94b] snapc 0=[]
> ondisk+read+known_if_redirected e57109)
> >> osd_op(client.596345306.0:14545144 4.13 4:cea599e4:usage::usage.10:head
> [call rgw.user_usage_log_add in=173b] snapc 0=[]
> ondisk+write+known_if_redirected e57109)
> >> osd_op(client.596345312.0:2369917 4.13
> 4:cedbb205:::obj_delete_at_hint.0000000017:head [call lock.lock in=50b]
> snapc 0=[] ondisk+write+known_if_redirected e57109)
> >>
> >> where pool number 4 is the default.rgw.log pool; I am not sure how to
> decode the remaining part of the osd_op.
> >>
> >> The involved SSDs are shared by the RGW pools default.rgw.buckets.index
> and default.rgw.log, and are far from being saturated:
> >> --- POOLS ---
> >> POOL                       ID  PGS  STORED   (DATA)  (OMAP)   OBJECTS  USED     (DATA)  (OMAP)   %USED  MAX AVAIL  QUOTA OBJECTS  QUOTA BYTES  DIRTY  USED COMPR  UNDER COMPR
> >> default.rgw.buckets.index   1   32   52 GiB     0 B  52 GiB     6.21k  157 GiB     0 B  157 GiB   0.67    7.6 TiB  N/A            N/A          N/A    0 B         0 B
> >> default.rgw.log             4   32  514 MiB   11 MiB  502 MiB     238  1.5 GiB  35 MiB  1.5 GiB      0    7.6 TiB  N/A            N/A          N/A    0 B         0 B
> >>
> >> After analyzing RGW traffic, it seems to be related to a wave of object
> deletions; the next wave should occur on Sunday. I will increase
> debug_bluestore this time, maybe something interesting will be logged.
> >> I've also performed a manual kvstore compaction and increased
> osd_op_thread_timeout from 15 to 60 seconds (see the sketch below); I hope
> it will help.
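> >>
> >> The sketch: roughly what I ran, with osd.62 as the example id (the
> >> online compaction form assumes the ceph tell interface is available):
> >>
> >> ceph tell osd.62 compact                      # online RocksDB compaction
> >> ceph config set osd osd_op_thread_timeout 60  # was 15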
> >>
> >>
> >> Best Regards
> >> Jakub
> >>
> >>
> >> On Tue, Apr 26, 2022 at 6:55 PM Boris <bb@xxxxxxxxx> wrote:
> >>>
> >>> I also have this problem with OSDs that have SSDs as block.db.
> >>>
> >>> >
> >>> > On 26.04.2022 at 17:10, Konstantin Shalygin <k0ste@xxxxxxxx> wrote:
> >>> >
> >>> > Hi,
> >>> >
> >>> > After some load the HDDs will not perform well. You should move the
> block.db's to NVMe to avoid database vacuuming problems (see the sketch
> below).
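> >>> >
> >>> > A rough per-OSD sketch of one way to do that offline with
> >>> > ceph-bluestore-tool; the OSD id and the NVMe partition are
> >>> > placeholders, and on ceph-volume-managed OSDs the db symlink/LV tags
> >>> > may also need updating afterwards:
> >>> >
> >>> > ceph osd set noout
> >>> > systemctl stop ceph-osd@62
> >>> > ceph-bluestore-tool bluefs-bdev-new-db \
> >>> >     --path /var/lib/ceph/osd/ceph-62 --dev-target /dev/nvme0n1p1
> >>> > systemctl start ceph-osd@62
> >>> > ceph osd unset noout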
> >>> >
> >>> > k
> >>> > Sent from my iPhone
> >>> >
> >>> >> On 26 Apr 2022, at 13:58, Boris Behrens <bb@xxxxxxxxx> wrote:
> >>> >>
> >>> >> The cluster contains 12x8TB OSDs without any SSDs as cache
> >>> >
> >>> _______________________________________________
> >>> ceph-users mailing list -- ceph-users@xxxxxxx
> >>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
>
>
> --
> The self-help group "UTF-8 problems" will meet this time, as an exception,
> in the large hall.
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



