Re: snaptrim number of objects

On 21/08/2023 16:47, Manuel Lausch wrote:
> Hello,
>
> On my test cluster I played a bit with Ceph Quincy (17.2.6).
> I also see slow ops while deleting snapshots. With the previous major
> release (Pacific) this wasn't an issue.
> In my case this is related to the new mclock scheduler, which is the
> default in Quincy. With "ceph config set global osd_op_queue wpq" the
> issue is gone (after restarting the OSDs, of course). wpq was the
> previous default scheduler.
>
> Maybe this will help you.
>
> On the other hand, mclock shouldn't break down the cluster like this,
> at least not with "high_client_ops", which is what I used. Maybe
> someone should have a look at this.
>
>
> Manuel

Hey Manuel,

You made me a happy man (for now!)

In short: wpq indeed seems to do waaaaay better in my setup.

We did a lot of tuning with the mclock scheduler: tuned
osd_mclock_max_capacity_iops_hdd, tried a lot of different settings
for osd_snap_trim_sleep_hdd/ssd, etc., but it did not have the desired
effect. The only thing that kept my cluster from going down was
setting osd_max_trimming_pgs to 0 on all disks and raising it to 1 or
2 on a few OSDs at a time. As soon as I enabled too many OSDs,
everything would bog down: slow ops everywhere, hanging CephFS
clients, and so on. I think I could manage a maximum of about 100
objects/sec of snaptrimming that way.
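
For the record, that per-OSD throttling boils down to something like
the following (a rough sketch, not my literal command history; osd.12
and osd.13 are just example daemon ids):

    # default: don't let any OSD trim snapshots
    ceph config set osd osd_max_trimming_pgs 0
    # then allow a handful of OSDs to trim one or two PGs at a time
    ceph config set osd.12 osd_max_trimming_pgs 1
    ceph config set osd.13 osd_max_trimming_pgs 2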

I also played around with the different mclock profiles to speed up
recovery. I think with high_client_ops we got 400-600 MB/s client I/O
and about 50 MB/s recovery I/O (we had a few degraded objects and some
rebalancing). With the high_recovery_ops profile I was able to get
around 400-500 MB/s client writes and 300 MB/s recovery. As soon as I
enabled snaptrimming, everything got quite a bit slower.
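
Switching between those profiles is a single config change (a sketch;
osd.0 below is just an example daemon id, and the built-in profiles
are high_client_ops, balanced and high_recovery_ops):

    # see which profile an OSD is currently running with
    ceph config show osd.0 osd_mclock_profile
    # bias the scheduler toward recovery/backfill work
    ceph config set osd osd_mclock_profile high_recovery_ops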

At your suggestion I just changed osd_op_queue to wpq, removed almost
all other OSD config overrides and restarted all OSDs.
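
In case anyone wants to try the same, it boiled down to something like
this (a sketch; which overrides you need to remove depends on what you
tuned, and how you restart OSDs depends on your deployment; osd.12 is
just an example id):

    ceph config set global osd_op_queue wpq
    # drop the mclock-era tuning we had piled up (examples)
    ceph config rm osd osd_mclock_max_capacity_iops_hdd
    ceph config rm osd osd_snap_trim_sleep_hdd
    # osd_op_queue is only picked up at startup, so restart the OSDs
    # (one at a time), e.g. with cephadm:
    ceph orch daemon restart osd.12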

Now I see 400-600 MB/s client I/O (normal) AND recovery at
1500 MB/s(!) AND it's also snaptrimming at 250 objects/sec. And I
haven't seen a single slow op warning yet!
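
For anyone who wants to watch this themselves: I'm simply graphing the
per-PG snaptrimq_len counter summed over all PGs. Something along
these lines works for a quick look (a sketch; the exact JSON layout of
"ceph pg dump" may differ slightly between releases):

    # PGs currently busy trimming (or waiting to)
    ceph pg ls snaptrim snaptrim_wait
    # total snap trim queue length across all PGs
    ceph pg dump -f json 2>/dev/null | jq '[.pg_map.pg_stats[].snaptrimq_len] | add'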

I'm still cautious, but for now, this looks very positive!

This also leads me to agree with you that there's 'something wrong' with
the mclock scheduler. I was almost starting to suspect hardware issues
or something like that; I was at my wit's end.


Angelo.

> On Fri, 4 Aug 2023 17:40:42 -0400
> Angelo Höngens <angelo@xxxxxxxxxx> wrote:
>
> > Hey guys,
> >
> > I'm trying to figure out what's happening to my backup cluster, which
> > often grinds to a halt when CephFS automatically removes snapshots.
> > Almost all OSDs go to 100% CPU, Ceph complains about slow ops, and
> > CephFS stops doing client I/O.
> >
> > I'm graphing the cumulative value of snaptrimq_len, and that
> > slowly decreases over time. One night it takes an hour, but on other
> > days, like today, my cluster has been down for almost 20 hours, and I
> > think we're halfway. The funny thing is that in both cases the
> > snaptrimq_len value initially goes to roughly the same value, around
> > 3000, and then slowly decreases, but my guess is that the number of
> > objects that need to be trimmed varies hugely from day to day.
> >
> > Is there a way to show the size of CephFS snapshots, or to get the
> > number of objects or bytes that need snaptrimming? Perhaps I can
> > graph that and see where the differences are.
> >
> > That won't explain why my cluster bogs down, but at least it gives
> > some visibility. We're running 17.2.6 everywhere, by the way.
> >
> > Angelo.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



