An issue presenting exactly like this was fixed in the spring of last year, for certain in Nautilus and higher.

Matt

On Sat, Apr 11, 2020, 12:04 PM <346415320@xxxxxx> wrote:
> Ceph version: Mimic 13.2.4
>
> The cluster has been running steadily for more than a year. Recently I
> noticed that cluster usage was growing faster than usual, and we traced
> the problem to garbage collection: 'radosgw-admin gc list' shows millions
> of objects waiting for GC. The earliest tag time is 2019-09, but 99% of
> the entries are from 2020-03 onwards.
>
> `ceph df`
> GLOBAL:
>     SIZE        AVAIL       RAW USED     %RAW USED
>     1.7 PiB     1.1 PiB     602 TiB      35.22
> POOLS:
>     NAME                          ID     USED        %USED     MAX AVAIL     OBJECTS
>     .rgw.root                     10     1.2 KiB     0         421 TiB       4
>     default.rgw.control           11     0 B         0         421 TiB       8
>     default.rgw.data.root         12     0 B         0         421 TiB       0
>     default.rgw.gc                13     0 B         0         421 TiB       0
>     default.rgw.log               14     4.8 GiB     0         421 TiB       6414
>     default.rgw.intent-log        15     0 B         0         421 TiB       0
>     default.rgw.meta              16     110 KiB     0         421 TiB       463
>     default.rgw.usage             17     0 B         0         421 TiB       0
>     default.rgw.users.keys        18     0 B         0         421 TiB       0
>     default.rgw.users.email       19     0 B         0         421 TiB       0
>     default.rgw.users.swift       20     0 B         0         421 TiB       0
>     default.rgw.users.uid         21     0 B         0         421 TiB       0
>     default.rgw.buckets.extra     22     0 B         0         421 TiB       0
>     default.rgw.buckets.index     23     0 B         0         421 TiB       118720
>     default.rgw.buckets.data      24     263 TiB     38.41     421 TiB       138902771
>     default.rgw.buckets.non-ec    25     0 B         0         421 TiB       16678
>
> However, when we add up the per-bucket usage from 'radosgw-admin bucket
> stats', the buckets should only account for about 160 TiB, so roughly
> 80 TiB is sitting in the GC list.
>
> GC settings before we noticed the problem:
>     rgw_gc_max_objs = 32
>     rgw_gc_obj_min_wait = 3600
>     rgw_gc_processor_period = 3600
>     rgw_gc_processor_max_time = 3600
>
> Yesterday we adjusted the settings and restarted the RGWs:
>     rgw_gc_max_objs = 1024
>     rgw_gc_obj_min_wait = 300
>     rgw_gc_processor_period = 600
>     rgw_gc_processor_max_time = 600
>     rgw_gc_max_concurrent_io = 40
>     rgw_gc_max_trim_chunk = 1024
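>
> (For reference, we put the new values into ceph.conf on each RGW host and
> restarted the daemons. The instance name and admin-socket path below are
> only examples of how our gateways happen to be named; adjust them to your
> own setup:)
>
>     [client.rgw.gateway1]
>         rgw_gc_max_objs = 1024
>         rgw_gc_obj_min_wait = 300
>         rgw_gc_processor_period = 600
>         rgw_gc_processor_max_time = 600
>         rgw_gc_max_concurrent_io = 40
>         rgw_gc_max_trim_chunk = 1024
>
>     # check that the running daemon picked the values up
>     ceph daemon /var/run/ceph/ceph-client.rgw.gateway1.asok config show | grep rgw_gc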
>
> Today we counted the entries in each GC shard with
>     rados -p default.rgw.log listomapkeys gc.$i --cluster ceph -N gc | wc -l
> (for i from 0 to 1023). Only gc.0 to gc.511 contain any data.
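>
> Roughly the loop we used to snapshot the per-shard counts, for reference
> (the timestamped directory name is just our own labelling; wc -l also
> produces the "total" line at the end of each listing below):
>
>     #!/bin/bash
>     # dump the pending GC entries (omap keys) of every shard, then count them
>     dir="gc_$(date +%Y%m%d%H%M)"
>     mkdir -p "$dir"
>     for i in $(seq 0 1023); do
>         rados -p default.rgw.log --cluster ceph -N gc \
>             listomapkeys "gc.$i" > "$dir/gc.$i.tag"
>     done
>     wc -l "$dir"/gc.*.tag | sort -n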
>
> Here are some of the sorted results.
>
> - time 14:43 result:
> ……
> 36 gc_202004111443/gc.502.tag
> 38 gc_202004111443/gc.501.tag
> 40 gc_202004111443/gc.136.tag
> 46 gc_202004111443/gc.511.tag
> 212 gc_202004111443/gc.9.tag
> 218 gc_202004111443/gc.24.tag
> 21976 gc_202004111443/gc.13.tag
> 42956 gc_202004111443/gc.26.tag
> 71772 gc_202004111443/gc.25.tag
> 85766 gc_202004111443/gc.6.tag
> 104504 gc_202004111443/gc.7.tag
> 105444 gc_202004111443/gc.10.tag
> 106114 gc_202004111443/gc.3.tag
> 126860 gc_202004111443/gc.31.tag
> 127352 gc_202004111443/gc.23.tag
> 147942 gc_202004111443/gc.27.tag
> 148046 gc_202004111443/gc.15.tag
> 167116 gc_202004111443/gc.28.tag
> 167932 gc_202004111443/gc.21.tag
> 187986 gc_202004111443/gc.5.tag
> 188312 gc_202004111443/gc.22.tag
> 209084 gc_202004111443/gc.30.tag
> 209152 gc_202004111443/gc.18.tag
> 209702 gc_202004111443/gc.19.tag
> 231100 gc_202004111443/gc.8.tag
> 249622 gc_202004111443/gc.14.tag
> 251092 gc_202004111443/gc.2.tag
> 251366 gc_202004111443/gc.12.tag
> 251802 gc_202004111443/gc.0.tag
> 252158 gc_202004111443/gc.11.tag
> 272114 gc_202004111443/gc.1.tag
> 291518 gc_202004111443/gc.20.tag
> 293646 gc_202004111443/gc.16.tag
> 312998 gc_202004111443/gc.17.tag
> 352984 gc_202004111443/gc.29.tag
> 488232 gc_202004111443/gc.4.tag
> 5935806 total
>
> - time 16:53 result:
> ……
> 28 gc_202004111653/gc.324.tag
> 28 gc_202004111653/gc.414.tag
> 30 gc_202004111653/gc.350.tag
> 30 gc_202004111653/gc.456.tag
> 204 gc_202004111653/gc.9.tag
> 208 gc_202004111653/gc.24.tag
> 21986 gc_202004111653/gc.13.tag
> 42964 gc_202004111653/gc.26.tag
> 71780 gc_202004111653/gc.25.tag
> 85778 gc_202004111653/gc.6.tag
> 104512 gc_202004111653/gc.7.tag
> 105452 gc_202004111653/gc.10.tag
> 106122 gc_202004111653/gc.3.tag
> 126866 gc_202004111653/gc.31.tag
> 127372 gc_202004111653/gc.23.tag
> 147944 gc_202004111653/gc.27.tag
> 148058 gc_202004111653/gc.15.tag
> 167124 gc_202004111653/gc.28.tag
> 167936 gc_202004111653/gc.21.tag
> 187992 gc_202004111653/gc.5.tag
> 188320 gc_202004111653/gc.22.tag
> 209090 gc_202004111653/gc.30.tag
> 209170 gc_202004111653/gc.18.tag
> 209704 gc_202004111653/gc.19.tag
> 231108 gc_202004111653/gc.8.tag
> 249632 gc_202004111653/gc.14.tag
> 251096 gc_202004111653/gc.2.tag
> 251376 gc_202004111653/gc.12.tag
> 251806 gc_202004111653/gc.0.tag
> 252170 gc_202004111653/gc.11.tag
> 272118 gc_202004111653/gc.1.tag
> 291526 gc_202004111653/gc.20.tag
> 293656 gc_202004111653/gc.16.tag
> 313000 gc_202004111653/gc.17.tag
> 352988 gc_202004111653/gc.29.tag
> 488238 gc_202004111653/gc.4.tag
> 5932714 total
>
> - time 18:55 result:
> ……
> 22 gc_202004111855/gc.187.tag
> 22 gc_202004111855/gc.331.tag
> 24 gc_202004111855/gc.163.tag
> 24 gc_202004111855/gc.83.tag
> 194 gc_202004111855/gc.9.tag
> 208 gc_202004111855/gc.24.tag
> 21994 gc_202004111855/gc.13.tag
> 42966 gc_202004111855/gc.26.tag
> 71796 gc_202004111855/gc.25.tag
> 85788 gc_202004111855/gc.6.tag
> 104522 gc_202004111855/gc.7.tag
> 105458 gc_202004111855/gc.10.tag
> 106126 gc_202004111855/gc.3.tag
> 126872 gc_202004111855/gc.31.tag
> 127378 gc_202004111855/gc.23.tag
> 147954 gc_202004111855/gc.27.tag
> 148066 gc_202004111855/gc.15.tag
> 167136 gc_202004111855/gc.28.tag
> 167940 gc_202004111855/gc.21.tag
> 187996 gc_202004111855/gc.5.tag
> 188326 gc_202004111855/gc.22.tag
> 209092 gc_202004111855/gc.30.tag
> 209178 gc_202004111855/gc.18.tag
> 209720 gc_202004111855/gc.19.tag
> 231126 gc_202004111855/gc.8.tag
> 249646 gc_202004111855/gc.14.tag
> 251106 gc_202004111855/gc.2.tag
> 251378 gc_202004111855/gc.12.tag
> 251816 gc_202004111855/gc.0.tag
> 252176 gc_202004111855/gc.11.tag
> 272134 gc_202004111855/gc.1.tag
> 291528 gc_202004111855/gc.20.tag
> 293662 gc_202004111855/gc.16.tag
> 313010 gc_202004111855/gc.17.tag
> 352996 gc_202004111855/gc.29.tag
> 488250 gc_202004111855/gc.4.tag
> 5931710 total
>
> - time 22:54 result:
> ……
> 32 gc_202004112254/gc.191.tag
> 32 gc_202004112254/gc.196.tag
> 32 gc_202004112254/gc.486.tag
> 34 gc_202004112254/gc.382.tag
> 208 gc_202004112254/gc.9.tag
> 210 gc_202004112254/gc.24.tag
> 22012 gc_202004112254/gc.13.tag
> 43008 gc_202004112254/gc.26.tag
> 71828 gc_202004112254/gc.25.tag
> 85814 gc_202004112254/gc.6.tag
> 104544 gc_202004112254/gc.7.tag
> 105468 gc_202004112254/gc.10.tag
> 106146 gc_202004112254/gc.3.tag
> 126882 gc_202004112254/gc.31.tag
> 127396 gc_202004112254/gc.23.tag
> 147980 gc_202004112254/gc.27.tag
> 148088 gc_202004112254/gc.15.tag
> 167146 gc_202004112254/gc.28.tag
> 167960 gc_202004112254/gc.21.tag
> 188006 gc_202004112254/gc.5.tag
> 188352 gc_202004112254/gc.22.tag
> 209114 gc_202004112254/gc.30.tag
> 209204 gc_202004112254/gc.18.tag
> 209758 gc_202004112254/gc.19.tag
> 231148 gc_202004112254/gc.8.tag
> 249668 gc_202004112254/gc.14.tag
> 251126 gc_202004112254/gc.2.tag
> 251406 gc_202004112254/gc.12.tag
> 251850 gc_202004112254/gc.0.tag
> 252186 gc_202004112254/gc.11.tag
> 272154 gc_202004112254/gc.1.tag
> 291552 gc_202004112254/gc.20.tag
> 293684 gc_202004112254/gc.16.tag
> 313038 gc_202004112254/gc.17.tag
> 353014 gc_202004112254/gc.29.tag
> 488274 gc_202004112254/gc.4.tag
> 5935738 total
>
> It seems that gc.0-gc.31 keep increasing, while gc.32-gc.511 work fine.
> Maybe some objects in gc.0-gc.31 are blocking the GC process, so the GC
> threads on gc.0-gc.31 cannot proceed.
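>
> To see what is sitting at the head of one of the stuck shards, the entries
> can be dumped directly (gc.0 below is just one example shard; the omap
> values are binary-encoded, but the bucket/object names are readable in the
> hexdump):
>
>     # first few pending entries of one stuck shard
>     rados -p default.rgw.log --cluster ceph -N gc listomapkeys gc.0 | head -20
>     rados -p default.rgw.log --cluster ceph -N gc listomapvals gc.0 | head -100
>
>     # or list everything RGW considers pending, not only expired entries
>     radosgw-admin gc list --include-all | head -100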
>
> After setting debug_rgw=20, we found that only a few process periods
> (every 10 minutes, as configured) log anything like:
>
> '2020-04-11 23:02:18.724 7f7d1e61b700 5 gc::process: removing default.rgw.buckets.data:d9d41307-13ce-4b43-aa24-2495540b67f5.985987.4__multipart_playback/2009550000004491/6b97a8a61424129d996d7a13769e8c66.ts.2~KnoW62bPMfmWBWtH0THWWzV3urFJglW.1'
>
> Most process periods only show:
>
> '2020-04-11 23:32:14.814 7f7d1de1a700 2 object expiration: start
> 2020-04-11 23:32:14.814 7f7d1de1a700 20 proceeding shard = obj_delete_at_hint.0000000000
> 2020-04-11 23:32:14.815 7f7d1de1a700 20 proceeding shard = obj_delete_at_hint.0000000001
> ……
> 2020-04-11 23:22:14.811 7f7d1de1a700 20 proceeding shard = obj_delete_at_hint.0000000126
> 2020-04-11 23:22:14.813 7f7d1de1a700 2 object expiration: stop
> 2020-04-11 23:22:21.364 7f7d1e61b700 10 RGWGC::process() failed to acquire lock on gc.4
> 2020-04-11 23:22:31.074 7f7d1f61d700 2 RGWDataChangesLog::ChangesRenewThread: start
> 2020-04-11 23:22:53.074 7f7d1f61d700 2 RGWDataChangesLog::ChangesRenewThread: start
> 2020-04-11 23:23:15.075 7f7d1f61d700 2 RGWDataChangesLog::ChangesRenewThread: start
> 2020-04-11 23:23:37.075 7f7d1f61d700 2 RGWDataChangesLog::ChangesRenewThread: start
> 2020-04-11 23:23:59.075 7f7d1f61d700 2 RGWDataChangesLog::ChangesRenewThread: start
> 2020-04-11 23:24:21.075 7f7d1f61d700 2 RGWDataChangesLog::ChangesRenewThread: start
> 2020-04-11 23:24:43.076 7f7d1f61d700 2 RGWDataChangesLog::ChangesRenewThread: start
> 2020-04-11 23:25:05.075 7f7d1f61d700 2 RGWDataChangesLog::ChangesRenewThread: start
> 2020-04-11 23:25:27.076 7f7d1f61d700 2 RGWDataChangesLog::ChangesRenewThread: start
> 2020-04-11 23:25:49.075 7f7d1f61d700 2 RGWDataChangesLog::ChangesRenewThread: start
> 2020-04-11 23:26:11.075 7f7d1f61d700 2 RGWDataChangesLog::ChangesRenewThread: start
> 2020-04-11 23:26:33.075 7f7d1f61d700 2 RGWDataChangesLog::ChangesRenewThread: start
> 2020-04-11 23:26:55.076 7f7d1f61d700 2 RGWDataChangesLog::ChangesRenewThread: start
> 2020-04-11 23:27:17.075 7f7d1f61d700 2 RGWDataChangesLog::ChangesRenewThread: start
> 2020-04-11 23:27:39.075 7f7d1f61d700 2 RGWDataChangesLog::ChangesRenewThread: start
> 2020-04-11 23:28:01.075 7f7d1f61d700 2 RGWDataChangesLog::ChangesRenewThread: start
> 2020-04-11 23:28:23.076 7f7d1f61d700 2 RGWDataChangesLog::ChangesRenewThread: start
> 2020-04-11 23:28:45.075 7f7d1f61d700 2 RGWDataChangesLog::ChangesRenewThread: start
> 2020-04-11 23:29:07.076 7f7d1f61d700 2 RGWDataChangesLog::ChangesRenewThread: start
> 2020-04-11 23:29:29.076 7f7d1f61d700 2 RGWDataChangesLog::ChangesRenewThread: start
> 2020-04-11 23:29:51.076 7f7d1f61d700 2 RGWDataChangesLog::ChangesRenewThread: start
> 2020-04-11 23:30:13.076 7f7d1f61d700 2 RGWDataChangesLog::ChangesRenewThread: start
> 2020-04-11 23:30:35.077 7f7d1f61d700 2 RGWDataChangesLog::ChangesRenewThread: start
> 2020-04-11 23:30:57.076 7f7d1f61d700 2 RGWDataChangesLog::ChangesRenewThread: start
> 2020-04-11 23:31:19.076 7f7d1f61d700 2 RGWDataChangesLog::ChangesRenewThread: start
> 2020-04-11 23:31:41.076 7f7d1f61d700 2 RGWDataChangesLog::ChangesRenewThread: start
> 2020-04-11 23:32:03.077 7f7d1f61d700 2 RGWDataChangesLog::ChangesRenewThread: start
> 2020-04-11 23:32:14.814 7f7d1de1a700 2 object expiration: start
> 2020-04-11 23:32:14.814 7f7d1de1a700 20 proceeding shard = obj_delete_at_hint.0000000000
> 2020-04-11 23:32:14.815 7f7d1de1a700 20 proceeding shard = obj_delete_at_hint.0000000001
> ……'
>
> And sometimes GC does remove a few objects, like:
>
> '2020-04-10 12:01:45.141 7f2acdd38700 2 object expiration: stop
> 2020-04-10 12:01:45.886 7f2ace539700 10 RGWGC::process() failed to acquire lock on gc.8
> 2020-04-10 12:01:45.911 7f2ace539700 5 gc::process: removing default.rgw.buckets.data:d9d41307-13ce-4b43-aa24-2495540b67f5.985987.4__multipart_playback/2009510000009744/8d5cb6722d478fd2395af1bc25b73880.ts.2~rDnSwcUmF-MKFwneDyMQRiI4tXuw6K7.1
> 2020-04-10 12:01:45.911 7f2ace539700 5 gc::process: removing default.rgw.buckets.data:d9d41307-13ce-4b43-aa24-2495540b67f5.985987.4__multipart_playback/2009510000009744/8d5cb6722d478fd2395af1bc25b73880.ts.2~astXMDje7aBnOLzHG7qIdofTrpHTsUB.1
> 2020-04-10 12:01:45.911 7f2ace539700 5 gc::process: removing default.rgw.buckets.data:d9d41307-13ce-4b43-aa24-2495540b67f5.985987.4__multipart_playback/2009550000003154/39abc9fe1bf0f289c6e02132cbc2d96f.ts.2~YW4Uv7qhbKqZoru7wIGvJXZAuytD7s6.1
> 2020-04-10 12:01:45.911 7f2ace539700 5 gc::process: removing default.rgw.buckets.data:d9d41307-13ce-4b43-aa24-2495540b67f5.985987.4__multipart_playback/2009550000004519/296b0c23da59ff9eba8a522fc5febd11.ts.2~bL7CZCB-mKP7w2RHFomEsgBzAH1rZt9.1
> 2020-04-10 12:01:45.912 7f2ace539700 5 gc::process: removing default.rgw.buckets.data:d9d41307-13ce-4b43-aa24-2495540b67f5.2190746.7__shadow_.VYxHLF5VGGXMFsYy-CNiMOfDVI-NTfQ_1
> 2020-04-10 12:01:45.912 7f2ace539700 5 gc::process: removing default.rgw.buckets.data:d9d41307-13ce-4b43-aa24-2495540b67f5.2190746.7__shadow_.VYxHLF5VGGXMFsYy-CNiMOfDVI-NTfQ_2
> 2020-04-10 12:01:45.912 7f2ace539700 5 gc::process: removing default.rgw.buckets.data:d9d41307-13ce-4b43-aa24-2495540b67f5.2190746.7__shadow_.VYxHLF5VGGXMFsYy-CNiMOfDVI-NTfQ_3
> 2020-04-10 12:01:45.912 7f2ace539700 5 gc::process: removing default.rgw.buckets.data:d9d41307-13ce-4b43-aa24-2495540b67f5.2190746.7__shadow_.VYxHLF5VGGXMFsYy-CNiMOfDVI-NTfQ_4
> 2020-04-10 12:01:45.912 7f2ace539700 5 gc::process: removing default.rgw.buckets.data:d9d41307-13ce-4b43-aa24-2495540b67f5.2190746.7__shadow_.VYxHLF5VGGXMFsYy-CNiMOfDVI-NTfQ_5
> 2020-04-10 12:01:45.914 7f2ace539700 10 RGWGC::process() failed to acquire lock on gc.10
> 2020-04-10 12:01:45.915 7f2ace539700 10 RGWGC::process() failed to acquire lock on gc.11
> 2020-04-10 12:02:02.638 7f2acf53b700 2 RGWDataChangesLog::ChangesRenewThread: start
> 2020-04-10 12:02:24.638 7f2acf53b700 2 RGWDataChangesLog::ChangesRenewThread: start
> 2020-04-10 12:02:46.638 7f2acf53b700 2 RGWDataChangesLog::ChangesRenewThread: start
> 2020-04-10 12:03:08.638 7f2acf53b700 2 RGWDataChangesLog::ChangesRenewThread: start
> ……'
>
> We scanned these logs on all 5 GC nodes (they carry no other workload such
> as object reads/writes or lifecycle, only GC), and they all look the same:
> the "removing xxx" lines appear only a few times, within a few seconds,
> over a whole day of work. So we suppose some objects in gc.0-gc.31 are
> blocking the GC process. A manual 'radosgw-admin gc process' does not help
> either; its log is the same as above.
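>
> For the next manual attempt we plan to capture one run with verbose
> logging, roughly like this (the log file name is just an example), to see
> exactly where it stops on a stuck shard:
>
>     radosgw-admin gc process --debug-rgw=20 --debug-ms=1 2>&1 | tee gc-process.log
>     grep -E 'RGWGC::process|gc::process' gc-process.log | tail -50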
>
> Has anyone encountered the same problem? What can I do to make the GC
> threads proceed on gc.0-gc.31, or otherwise get everything back to normal?
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx