Re: Upgrade to hammer, crush tuneables issue

This has nothing to do with the number of seconds between backfills. It is actually the number of objects from a PG that are scanned in a single op while the PG is being backfilled. From what I can tell by looking at the source code, the performance impact comes from the fact that the PG is locked for other operations during this scan.
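
For reference, you can confirm the values a running OSD is actually using through its admin socket (osd.0 below is just an example; run this on the host where that OSD lives):

# Show the backfill scan settings currently in effect on a running OSD
ceph daemon osd.0 config show | grep backfill_scan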

From my benchmarks it is clear that this has a big impact on client latency during backfill. The lower the values of osd_backfill_scan_min and osd_backfill_scan_max, the smaller the impact on latency, but the *longer* the recovery time. Changing these values online will probably take effect only for PGs on which backfill has not yet started, which would explain why you did not see an immediate effect when changing them on the fly.
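
If you want to check whether there are still PGs that have not started backfilling yet (those are the ones that should pick up the new values), something along these lines should work:

# PGs currently in a backfill-related state, taken from the health report
ceph health detail | grep -i backfill

# Or the full per-PG listing, filtered the same way
ceph pg dump pgs_brief | grep -i backfill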

--
Tomasz Kuzemko
tomasz@xxxxxxxxxxx

2015-11-26 0:24 GMT+01:00 Robert LeBlanc <robert@xxxxxxxxxxxxx>:
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
>
> I don't think this does what you think it does...
>
> This will almost certainly starve the client of IO. This is the number
> of seconds between backfills, not the number of objects being scanned
> during a backfill. Setting these to higher values will make recovery
> take longer, but hardly affect the client. Setting these to low values
> will increase the rate of recovery so it takes less time, but will
> impact the performance of the clients.
>
> Also, I haven't had much luck changing these on the fly for
> recovery/backfill already in progress or queued.
> - ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>
>
> On Wed, Nov 25, 2015 at 2:42 PM, Tomasz Kuzemko  wrote:
> > To ease on clients you can change osd_backfill_scan_min and
> > osd_backfill_scan_max to 1. It's possible to change this online:
> > ceph tell osd.\* injectargs '--osd_backfill_scan_min 1'
> > ceph tell osd.\* injectargs '--osd_backfill_scan_max 1'
> >
> > 2015-11-24 16:52 GMT+01:00 Joe Ryner :
> >>
> >> Hi,
> >>
> >> Last night I upgraded my cluster from CentOS 6.5 -> CentOS 7.1 and in the
> >> process upgraded from Emperor -> Firefly -> Hammer.
> >>
> >> When I finished I changed the crush tunables from
> >> ceph osd crush tunables legacy -> ceph osd crush tunables optimal
> >>
> >> I knew this would cause data movement, but the IO for my clients is
> >> unacceptable.  Can anyone please tell me what the best settings are for my
> >> configuration?  I have 2 Dell R720 servers and 2 Dell R730 servers, with
> >> 36 1TB SATA SSD drives in my cluster.  The servers have 128 GB of RAM.
> >>
> >> Below is some detail that might help.  According to my calculations the
> >> rebalance will take over a day.
> >>
> >> I would greatly appreciate some help on this.
> >>
> >> Thank you,
> >>
> >> Joe
> >>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
