Re: Tons of "cls_rgw.cc:3284: gc_iterate_entries end_key=" records in OSD logs

In Luminous they consolidated a lot of the RGW metadata pools by using namespaces inside the pools. I would say that the GC pool was consolidated into the log pool, based on the correlation you've found with the primary OSDs. At least that mystery is solved as to why it's those 8 OSDs. I don't know why their logs are being spammed with GC messages, though. Hopefully someone else can shed some light on that. I've cc'd Yehuda, the primary RGW dev, on this.
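
If I remember the Luminous layout right, the GC hint objects should now live in a namespace inside default.rgw.log rather than in a separate pool. Something like the commands below should confirm it; the 'gc' namespace name is my assumption, and listing with --all will show whatever namespaces are actually there:

# rados -p default.rgw.log ls --all | head
# rados -p default.rgw.log --namespace=gc ls | head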

On Mon, Aug 20, 2018, 7:09 AM Jakub Jaszewski <jaszewski.jakub@xxxxxxxxx> wrote:
Hi David,

Right, we use this cluster (v12.2.5, fresh installation) for RGW; however, I don't see a default.rgw.gc pool like we have on the other cluster, which was upgraded to Luminous (10.2.9 -> 10.2.10 -> 12.2.2). I believe that cluster's default.rgw.gc pool dates from when RGW was set up on the Jewel version, and the pool was created automatically.

On the impacted cluster we have the pools below:
pool 1 '.rgw.root' replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins pg_num 8 pgp_num 8 last_change 1499 flags hashpspool stripe_width 0 application rgw
pool 2 'rbd' replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins pg_num 2048 pgp_num 2048 last_change 2230 flags hashpspool stripe_width 0 application rbd
pool 3 'default.rgw.control' replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins pg_num 8 pgp_num 8 last_change 1501 flags hashpspool stripe_width 0 application rgw
pool 4 'default.rgw.meta' replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins pg_num 8 pgp_num 8 last_change 1491 flags hashpspool stripe_width 0 application rgw
pool 5 'default.rgw.log' replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins pg_num 8 pgp_num 8 last_change 1486 flags hashpspool stripe_width 0 application rgw
pool 7 'default.rgw.buckets.index' replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins pg_num 8 pgp_num 8 last_change 1483 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
pool 8 'default.rgw.buckets.data' erasure size 12 min_size 10 crush_rule 3 object_hash rjenkins pg_num 2048 pgp_num 2048 last_change 2228 flags hashpspool max_bytes 879609302220800 stripe_width 36864 application rgw
pool 9 'default.rgw.buckets.non-ec' replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins pg_num 8 pgp_num 8 last_change 1506 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw

The eight noisy OSDs match the primary OSDs of the PGs that make up the default.rgw.log pool.
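
(For anyone who wants to double-check, the PGs of that pool and their acting primaries can be listed with something like the following and compared against the noisy OSD ids:)

# ceph pg ls-by-pool default.rgw.log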

Many thanks
Jakub 
 

On Mon, Aug 20, 2018 at 11:54 AM David Turner <drakonstein@xxxxxxxxx> wrote:
I'm assuming you use RGW and that you have a GC pool for RGW. It might also be assumed that your GC pool only has 8 PGs. Are any of those guesses correct?
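
(You can see the pg_num for all of the rgw pools with something like the command below, if that's easier than digging through the full osd dump.)

# ceph osd pool ls detail | grep rgw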

On Mon, Aug 20, 2018, 5:13 AM Jakub Jaszewski <jaszewski.jakub@xxxxxxxxx> wrote:
I still don't know why only these particular OSDs write this information to their log files.

Jakub

On Wed, Aug 8, 2018 at 12:02 PM Jakub Jaszewski <jaszewski.jakub@xxxxxxxxx> wrote:
Hi All, exactly the same story today: the same 8 OSDs and a lot of garbage collection objects to process.

Below is the number of "cls_rgw.cc:3284: gc_iterate_entries end_key=" entries per OSD log file
hostA:
  /var/log/ceph/ceph-osd.58.log
  1826467
hostB:
  /var/log/ceph/ceph-osd.88.log
  2924241
hostC:
  /var/log/ceph/ceph-osd.153.log
  581002
  /var/log/ceph/ceph-osd.164.log
  3278606
hostD:
  /var/log/ceph/ceph-osd.95.log
  1426963
hostE:
  /var/log/ceph/ceph-osd.4.log
  2716914
  /var/log/ceph/ceph-osd.53.log
  943749
hostF:
  /var/log/ceph/ceph-osd.172.log
  4085334


# radosgw-admin gc list --include-all|grep oid |wc -l
302357
#
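
(Side note: if the GC backlog itself is a concern, a collection pass can also be kicked off manually with the stock subcommand below; this is not a fix for the log spam, just a way to work through the pending objects.)

# radosgw-admin gc process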

Can anyone please explain what is going on?

Thanks!
Jakub

On Tue, Aug 7, 2018 at 3:03 PM Jakub Jaszewski <jaszewski.jakub@xxxxxxxxx> wrote:
Hi,

8 out of 192 OSDs in our cluster (version 12.2.5) write plenty of records like "cls_rgw.cc:3284: gc_iterate_entries end_key=" to the corresponding log files, e.g.

2018-08-07 04:34:06.000585 7fdd8f012700  0 <cls> /build/ceph-12.2.5/src/cls/rgw/cls_rgw.cc:3284: gc_iterate_entries end_key=1_01533616446.000580407
2018-08-07 04:34:06.001888 7fdd8f012700  0 <cls> /build/ceph-12.2.5/src/cls/rgw/cls_rgw.cc:3284: gc_iterate_entries end_key=1_01533616446.001886318
2018-08-07 04:34:06.003395 7fdd8f012700  0 <cls> /build/ceph-12.2.5/src/cls/rgw/cls_rgw.cc:3284: gc_iterate_entries end_key=1_01533616446.003390299
2018-08-07 04:34:06.005205 7fdd8f012700  0 <cls> /build/ceph-12.2.5/src/cls/rgw/cls_rgw.cc:3284: gc_iterate_entries end_key=1_01533616446.005200341

# grep '2018-08-07 04:34:06' /var/log/ceph/ceph-osd.4.log |wc -l
712
#

At the same time there were around 500,000 expired garbage collection objects.

The log level of the OSD subsystem is set to the default of 1/5 across all OSDs.

I wonder why only a few OSDs record this information, and whether this is something that should be logged at log level 1 or rather at a higher level? https://github.com/ceph/ceph/blob/v12.2.5/src/cls/rgw/cls_rgw.cc#L3284
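
For what it's worth, my guess is that these <cls> lines are governed by the objclass debug subsystem rather than the osd one (that part is an assumption on my side). The current values can be checked on one of the noisy OSDs with, e.g.:

# ceph daemon osd.4 config show | egrep 'debug_objclass|debug_osd'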

Thanks
Jakub
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
