Is there any way I can unset pglog_hardlimit in the osdmap? I've seen the
release note saying this flag cannot be unset, but I don't understand why:
as far as I can tell, the only difference when the flag is set is that PG
logs are trimmed in aggressive mode, so I don't see what could break if I
unset it. The way I want to unset it is to decompile the osdmap, remove the
flag, recompile it, and inject it back into Ceph.

On Mon, Jul 19, 2021 at 12:04 AM Seena Fallah <seenafallah@xxxxxxxxx> wrote:

> I don't think it's a pool-based config; in my cluster it's set in the
> osdmap-level flags. The pool I tested in the higher-latency cluster, which
> itself showed much lower latency, had 18 PGs, while the higher-latency
> pool has 8212 PGs. The higher-latency cluster has this flag set; the
> lower-latency one doesn't.
>
> On Sun, Jul 18, 2021 at 11:57 PM Brett Niver <bniver@xxxxxxxxxx> wrote:
>
>> Seena,
>>
>> Which pool has the hardlimit flag set, the lower latency one, or the
>> higher?
>> Brett
>>
>>
>> On Sun, Jul 18, 2021 at 12:17 PM Seena Fallah <seenafallah@xxxxxxxxx>
>> wrote:
>>
>>> I've checked my logs and see there is PG log trimming on each op, and
>>> it's in aggressive mode. I checked the osdmap flags and see the
>>> pglog_hardlimit flag is set there, but the other cluster doesn't have
>>> it. Should I tune any config related to this flag in v12.2.13?
>>> I've also seen this PR (https://github.com/ceph/ceph/pull/20394),
>>> which is not backported to Luminous. Could it help?
>>>
>>> On Sun, Jul 18, 2021 at 12:09 AM Seena Fallah <seenafallah@xxxxxxxxx>
>>> wrote:
>>>
>>> > I've trimmed the PG log on all OSDs and, whoops(!), latency dropped
>>> > from 100ms to 20ms! But based on the other cluster I think it should
>>> > drop to around 7ms. Is there anything related to the PG log, or
>>> > anything else, that could help me continue debugging?
>>> >
>>> > On Thu, Jul 15, 2021 at 3:13 PM Seena Fallah <seenafallah@xxxxxxxxx>
>>> > wrote:
>>> >
>>> >> Hi,
>>> >>
>>> >> I'm facing something strange in Ceph (v12.2.13, FileStore). I have
>>> >> two clusters with the same config (kernel, network, disks, ...).
>>> >> One of them has 3ms write latency, the other has 100ms. Physical
>>> >> disk write latency is under 1ms on both.
>>> >> In the cluster with 100ms write latency, when I create another pool
>>> >> with the same config (crush rule, replica count, ...) and test the
>>> >> latency, it behaves like my other cluster. So it seems the problem
>>> >> is in one of my pools!
>>> >> The pool has 8212 PGs, and each PG is around 12GB with 844 objects.
>>> >> Also, I have many removed_snaps in this pool and I don't know
>>> >> whether that impacts performance.
>>> >>
>>> >> Do you have any idea what is wrong with my pool? Is there any way
>>> >> to debug this problem?
>>> >>
>>> >> Thanks.
>>> >>
>>> >
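
For reference, this is how I'm inspecting the flag, both live and from a
saved copy of the map (/tmp/osdmap.bin is just an example path):

    ceph osd dump | grep flags          # live osdmap flags, e.g.
                                        # "flags sortbitwise,...,pglog_hardlimit"
    ceph osd getmap -o /tmp/osdmap.bin  # save the current osdmap to a file
    osdmaptool --print /tmp/osdmap.bin  # offline dump; the header includes
                                        # the same flags line

One caveat with my decompile/recompile idea: unlike the CRUSH map
(crushtool plus "ceph osd setcrushmap"), I'm not aware of any supported
tool that edits a full osdmap and injects it back into the cluster, so
this may not even be possible with stock tooling.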
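
On the tuning side, these are the PG-log knobs I'd look at on v12.2.13.
This is only a sketch: osd.0 and the value 3000 are examples, and as far
as I know the Luminous defaults are osd_min_pg_log_entries=1500 and
osd_max_pg_log_entries=10000:

    # inspect the current values via the admin socket of one OSD
    ceph daemon osd.0 config get osd_min_pg_log_entries
    ceph daemon osd.0 config get osd_max_pg_log_entries

    # inject a lower cap at runtime; Luminous has no centralized config
    # database, so it's injectargs now plus ceph.conf for persistence
    ceph tell osd.* injectargs '--osd_max_pg_log_entries 3000'

If a lower cap helps, the same value would go under [osd] in ceph.conf so
it survives restarts.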
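
The offline trim I ran was roughly the following, one PG at a time with
the OSD stopped. The OSD id 12, the pgid 7.1a, and the paths are
placeholders, and --op trim-pg-log may not exist in every Luminous build
of ceph-objectstore-tool, so check your version first:

    systemctl stop ceph-osd@12
    # --journal-path is passed here because these are FileStore OSDs
    ceph-objectstore-tool \
        --data-path /var/lib/ceph/osd/ceph-12 \
        --journal-path /var/lib/ceph/osd/ceph-12/journal \
        --pgid 7.1a --op trim-pg-log
    systemctl start ceph-osd@12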
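
As for the removed_snaps suspicion: the interval set is printed on each
pool's line, so its size is easy to eyeball ("mypool" is a placeholder
name). My understanding is that a very long interval set grows the osdmap
itself, which makes map handling more expensive on every OSD:

    ceph osd pool ls detail           # each pool line ends with its
                                      # removed_snaps intervals
    ceph osd dump | grep "'mypool'"   # the same line for a single pool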