Re: Tons of "cls_rgw.cc:3284: gc_iterate_entries end_key=" records in OSD logs

There was an existing bug reported for this one, and it's fixed on master:

http://tracker.ceph.com/issues/23801

It will be backported to luminous and mimic.
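
Until the backport lands, there doesn't seem to be a clean way to silence the
message from config, but keeping the GC backlog small should at least reduce
the amount of iteration being logged. A rough sketch (it only touches entries
that have already expired, but please try it on a non-production cluster
first) is to drain the queue by hand now and then:

  radosgw-admin gc process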

On Mon, Aug 20, 2018 at 9:25 AM, Yehuda Sadeh-Weinraub
<yehuda@xxxxxxxxxx> wrote:
> That message has been there since 2014. We should lower the log level though.
>
> Yehuda
>
> On Mon, Aug 20, 2018 at 6:08 AM, David Turner <drakonstein@xxxxxxxxx> wrote:
>> In Luminous a lot of the RGW metadata pools were consolidated by using
>> namespaces inside the pools. Based on the correlation you've found with the
>> primary OSDs, I would say the GC pool was consolidated into the log pool.
>> At least that solves the mystery of why it's those 8 OSDs. I don't know why
>> their logs are being spammed with GC messages, though. Hopefully someone
>> else can shed some light on that. I've CC'd Yehuda, the primary RGW dev, on this.
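>>
>> If you want to confirm where the GC queue ended up, listing the gc
>> namespace of the log pool should show it, e.g. something like this
>> (namespace and object names are from memory, adjust as needed):
>>
>>   rados -p default.rgw.log --namespace gc ls | head
>>
>> If objects named gc.0, gc.1, ... show up there, that's where GC lives now.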
>>
>>
>> On Mon, Aug 20, 2018, 7:09 AM Jakub Jaszewski <jaszewski.jakub@xxxxxxxxx>
>> wrote:
>>>
>>> Hi David,
>>>
>>> Right, we use this cluster (v12.2.5, fresh installation) for RGW. However,
>>> I don't see a default.rgw.gc pool like the one we have on another cluster
>>> that was upgraded to Luminous (10.2.9 -> 10.2.10 -> 12.2.2); I believe the
>>> default.rgw.gc pool there dates from when RGW was set up on the Jewel
>>> release and was created automatically.
>>>
>>> On the impacted cluster we have the pools below:
>>> pool 1 '.rgw.root' replicated size 3 min_size 2 crush_rule 2 object_hash
>>> rjenkins pg_num 8 pgp_num 8 last_change 1499 flags hashpspool stripe_width 0
>>> application rgw
>>> pool 2 'rbd' replicated size 3 min_size 2 crush_rule 2 object_hash
>>> rjenkins pg_num 2048 pgp_num 2048 last_change 2230 flags hashpspool
>>> stripe_width 0 application rbd
>>> pool 3 'default.rgw.control' replicated size 3 min_size 2 crush_rule 2
>>> object_hash rjenkins pg_num 8 pgp_num 8 last_change 1501 flags hashpspool
>>> stripe_width 0 application rgw
>>> pool 4 'default.rgw.meta' replicated size 3 min_size 2 crush_rule 2
>>> object_hash rjenkins pg_num 8 pgp_num 8 last_change 1491 flags hashpspool
>>> stripe_width 0 application rgw
>>> pool 5 'default.rgw.log' replicated size 3 min_size 2 crush_rule 2
>>> object_hash rjenkins pg_num 8 pgp_num 8 last_change 1486 flags hashpspool
>>> stripe_width 0 application rgw
>>> pool 7 'default.rgw.buckets.index' replicated size 3 min_size 2 crush_rule
>>> 2 object_hash rjenkins pg_num 8 pgp_num 8 last_change 1483 owner
>>> 18446744073709551615 flags hashpspool stripe_width 0 application rgw
>>> pool 8 'default.rgw.buckets.data' erasure size 12 min_size 10 crush_rule 3
>>> object_hash rjenkins pg_num 2048 pgp_num 2048 last_change 2228 flags
>>> hashpspool max_bytes 879609302220800 stripe_width 36864 application rgw
>>> pool 9 'default.rgw.buckets.non-ec' replicated size 3 min_size 2
>>> crush_rule 2 object_hash rjenkins pg_num 8 pgp_num 8 last_change 1506 owner
>>> 18446744073709551615 flags hashpspool stripe_width 0 application rgw
>>>
>>> The eight annoying OSDs match the primary OSDs of the PGs that make up the
>>> default.rgw.log pool.
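>>>
>>> (For reference, I matched them with something along the lines of:
>>>
>>>   ceph pg ls-by-pool default.rgw.log
>>>
>>> comparing the acting-primary column of each PG with the noisy OSD ids.)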
>>>
>>> Many thanks
>>> Jakub
>>>
>>>
>>> On Mon, Aug 20, 2018 at 11:54 AM David Turner <drakonstein@xxxxxxxxx>
>>> wrote:
>>>>
>>>> I'm assuming you use RGW and that you have a GC pool for RGW. It might
>>>> also be that your GC pool only has 8 PGs. Are any of those guesses
>>>> correct?
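>>>>
>>>> If you're not sure, the PG count of the GC pool can be checked with
>>>> something like the following (pool name is just a guess for a default
>>>> setup):
>>>>
>>>>   ceph osd pool get default.rgw.gc pg_num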
>>>>
>>>> On Mon, Aug 20, 2018, 5:13 AM Jakub Jaszewski <jaszewski.jakub@xxxxxxxxx>
>>>> wrote:
>>>>>
>>>>> Issue tracker http://tracker.ceph.com/issues/23801.
>>>>> Still don't know why only particular OSDs write this information to log
>>>>> files.
>>>>>
>>>>> Jakub
>>>>>
>>>>> On Wed, Aug 8, 2018 at 12:02 PM Jakub Jaszewski
>>>>> <jaszewski.jakub@xxxxxxxxx> wrote:
>>>>>>
>>>>>> Hi All, exactly the same story today: the same 8 OSDs and a lot of
>>>>>> garbage collection objects to process.
>>>>>>
>>>>>> Below is the number of "cls_rgw.cc:3284: gc_iterate_entries end_key="
>>>>>> entries per OSD log file:
>>>>>> hostA:
>>>>>>   /var/log/ceph/ceph-osd.58.log: 1826467
>>>>>> hostB:
>>>>>>   /var/log/ceph/ceph-osd.88.log: 2924241
>>>>>> hostC:
>>>>>>   /var/log/ceph/ceph-osd.153.log: 581002
>>>>>>   /var/log/ceph/ceph-osd.164.log: 3278606
>>>>>> hostD:
>>>>>>   /var/log/ceph/ceph-osd.95.log: 1426963
>>>>>> hostE:
>>>>>>   /var/log/ceph/ceph-osd.4.log: 2716914
>>>>>>   /var/log/ceph/ceph-osd.53.log: 943749
>>>>>> hostF:
>>>>>>   /var/log/ceph/ceph-osd.172.log: 4085334
>>>>>>
>>>>>>
>>>>>> # radosgw-admin gc list --include-all|grep oid |wc -l
>>>>>> 302357
>>>>>> #
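>>>>>>
>>>>>> For comparison, dropping --include-all should show only the entries that
>>>>>> have already expired, i.e. what GC would actually process right now:
>>>>>>
>>>>>> # radosgw-admin gc list | grep oid | wc -l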
>>>>>>
>>>>>> Can anyone please explain what is going on?
>>>>>>
>>>>>> Thanks!
>>>>>> Jakub
>>>>>>
>>>>>> On Tue, Aug 7, 2018 at 3:03 PM Jakub Jaszewski
>>>>>> <jaszewski.jakub@xxxxxxxxx> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> 8 out of 192 OSDs in our cluster (version 12.2.5) write plenty of
>>>>>>> records like "cls_rgw.cc:3284: gc_iterate_entries end_key=" to the
>>>>>>> corresponding log files, e.g.
>>>>>>>
>>>>>>> 2018-08-07 04:34:06.000585 7fdd8f012700  0 <cls>
>>>>>>> /build/ceph-12.2.5/src/cls/rgw/cls_rgw.cc:3284: gc_iterate_entries
>>>>>>> end_key=1_01533616446.000580407
>>>>>>> 2018-08-07 04:34:06.001888 7fdd8f012700  0 <cls>
>>>>>>> /build/ceph-12.2.5/src/cls/rgw/cls_rgw.cc:3284: gc_iterate_entries
>>>>>>> end_key=1_01533616446.001886318
>>>>>>> 2018-08-07 04:34:06.003395 7fdd8f012700  0 <cls>
>>>>>>> /build/ceph-12.2.5/src/cls/rgw/cls_rgw.cc:3284: gc_iterate_entries
>>>>>>> end_key=1_01533616446.003390299
>>>>>>> 2018-08-07 04:34:06.005205 7fdd8f012700  0 <cls>
>>>>>>> /build/ceph-12.2.5/src/cls/rgw/cls_rgw.cc:3284: gc_iterate_entries
>>>>>>> end_key=1_01533616446.005200341
>>>>>>>
>>>>>>> # grep '2018-08-07 04:34:06' /var/log/ceph/ceph-osd.4.log |wc -l
>>>>>>> 712
>>>>>>> #
>>>>>>>
>>>>>>> At the same time there were around 500,000 expired garbage collection
>>>>>>> objects.
>>>>>>>
>>>>>>> The log level of the OSD subsystem is set to the default 1/5 across all OSDs.
>>>>>>>
>>>>>>> I wonder why only a few OSDs record this information, and whether it is
>>>>>>> something that should be logged at log level 1 or maybe higher?
>>>>>>> https://github.com/ceph/ceph/blob/v12.2.5/src/cls/rgw/cls_rgw.cc#L3284
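>>>>>>>
>>>>>>> (Side note: the "0" right before "<cls>" in those log lines appears to
>>>>>>> be the message level, and the cls output seems to be governed by the
>>>>>>> objclass subsystem rather than debug_osd; the current setting can be
>>>>>>> checked on an affected OSD with something like:
>>>>>>>
>>>>>>>   # ceph daemon osd.4 config get debug_objclass
>>>>>>>
>>>>>>> Since a level-0 message is written regardless of the configured level,
>>>>>>> I don't think any config change on our side can hide it -- please
>>>>>>> correct me if I'm wrong.)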
>>>>>>>
>>>>>>> Thanks
>>>>>>> Jakub
>>>>>
>>>>> _______________________________________________
>>>>> ceph-users mailing list
>>>>> ceph-users@xxxxxxxxxxxxxx
>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


