Re: Ceph recovery network speed

> You wrote 2TB before, are they 2TB or 18TB?  Is that 273 PGs total or per
> OSD?
Sorry, 18 TB of data and 273 PGs total.
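For scale, dividing those totals gives the average amount of data per PG
(same ballpark as the ~60 GB estimate quoted further down):

    18 TB / 273 PGs ≈ 66 GB of data per PG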

> `ceph osd df` will show you toward the right how many PGs are on each
> OSD.  If you have multiple pools, some PGs will have more data than others.
> So take an average # of PGs per OSD and divide the actual HDD capacity
> by that.
About 20 PGs per OSD on average, on 2 TB drives (technically ~1.8 TB, I
guess), which would be about 10 PGs per TB, or roughly 90-100 GB of capacity
per PG.  Shouldn't that be based on used space, though, not capacity?  My
usage is only 23% of capacity.  I thought Ceph's PG autoscaling changed PG
sizes dynamically according to usage?  I'm guessing I'm misunderstanding that
part?
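To show where those numbers come from, here is a rough, purely illustrative
calculation using the figures in this thread (4 hosts x 11 OSDs = 44 OSDs,
~1.8 TB usable each, ~20 PGs per OSD on average, cluster ~23% full):

    # Back-of-the-envelope per-PG sizing from the numbers above
    osd_capacity_gb = 1800      # ~1.8 TB usable per 2 TB HDD
    pgs_per_osd = 20            # average of the PGS column in `ceph osd df`
    used_fraction = 0.23        # current cluster utilization

    # Capacity-based: how much space each PG could grow into on one OSD
    capacity_per_pg = osd_capacity_gb / pgs_per_osd
    print(f"~{capacity_per_pg:.0f} GB of capacity per PG per OSD")    # ~90 GB

    # Usage-based: how much data each PG actually holds today
    data_per_pg = capacity_per_pg * used_fraction
    print(f"~{data_per_pg:.0f} GB of data per PG per OSD right now")  # ~21 GB

That gap between the ~90 GB capacity figure and the ~21 GB of actual data is
what prompted my usage-vs-capacity question.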

Thanks,
Curt

On Fri, Jun 24, 2022 at 9:48 PM Anthony D'Atri <anthony.datri@xxxxxxxxx>
wrote:

>
> > Yes, SATA, I think my benchmark put it around 125, but that was a year
> > ago, so I could be misremembering.
>
> A FIO benchmark, especially a sequential one on an empty drive, can
> mislead as to the real-world performance one sees on a fragmented drive.
>
> > 273 PGs at 18 TB, so each PG would be ~60 GB.
>
> You wrote 2TB before, are they 2TB or 18TB?  Is that 273 PGs total or per
> OSD?
>
> >  Mainly used for RBD, using erasure coding.  cephadm bootstrap with
> docker images.
>
> Ack.  Have to account for replication.
>
> `ceph osd df` will show you toward the right how many PGs are on each
> OSD.  If you have multiple pools, some PGs will have more data than others.
>
> So take an average # of PGs per OSD and divide the actual HDD capacity by
> that.
>
>
>
>
> >
> > On Fri, Jun 24, 2022 at 9:21 PM Anthony D'Atri <anthony.datri@xxxxxxxxx>
> > wrote:
> >
> >
> > >
> > > 2 PGs shouldn't take hours to backfill in my opinion.  Just 2 TB
> > > enterprise HDDs.
> >
> > SATA? Figure they can write at 70 MB/s
> >
> > How big are your PGs?  What is your cluster used for?  RBD? RGW? CephFS?
> >
> > >
> > > Take this log entry below, 72 minutes and still backfilling
> > > undersized?  Should it be that slow?
> > >
> > > pg 12.15 is stuck undersized for 72m, current state
> > > active+undersized+degraded+remapped+backfilling, last acting [34,10,29,NONE]
> > >
> > > Thanks,
> > > Curt
> > >
> > >
> > > On Fri, Jun 24, 2022 at 8:53 PM Anthony D'Atri <
> > > anthony.datri@xxxxxxxxx> wrote:
> > > Your recovery is slow *because* there are only 2 PGs backfilling.
> > >
> > > What kind of OSD media are you using?
> > >
> > > > On Jun 24, 2022, at 09:46, Curt <lightspd@xxxxxxxxx> wrote:
> > > >
> > > > Hello,
> > > >
> > > > I'm trying to understand why my recovery is so slow with only 2 PGs
> > > > backfilling.  I'm only getting speeds of 3-4 MiB/s on a 10G network.  I
> > > > have tested the speed between machines with a few tools and all confirm
> > > > 10G speed.  I've tried changing various settings of priority and
> > > > recovery sleep hdd, but still the same.  Is this a configuration issue
> > > > or something else?
> > > >
> > > > It's just a small cluster right now with 4 hosts, 11 OSDs each.
> > > > Please let me know if you need more information.
> > > >
> > > > Thanks,
> > > > Curt
> > >
> >
>
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


