Based on [1] and my experience with Hammer it is seconds. After adjusting
this back to the defaults and doing recovery in our production cluster, I
saw batches of recovery start every 64 seconds. It initially started out
nicely distributed, but over time it clumped into the same 15 seconds: we
would have 15 seconds of high-speed recovery, then nothing for almost 50
seconds.

It is quite possible that Infernalis has changed the way this works (the
master documentation shows a number of objects instead of seconds); I
haven't looked at the code to know for sure. I do know that setting
osd_backfill_scan_min=2 and osd_backfill_scan_max=16 on our busy cluster,
where osd_max_backfills=1, caused periodic slow I/O during
backfill/recovery. Setting osd_backfill_scan_min=16,
osd_backfill_scan_max=32 and osd_max_backfills=10 on the same cluster
eliminated the slow I/O.

[1] http://docs.ceph.com/docs/v0.80.5/rados/configuration/osd-config-ref/#backfilling

----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1


On Thu, Nov 26, 2015
at 1:52 AM, Tomasz Kuzemko <tomasz@xxxxxxxxxxx> wrote:
> This has nothing to do with the number of seconds between backfills. It is
> actually the number of objects from a PG being scanned during a single op
> when the PG is backfilled. From what I can tell by looking at the source
> code, the impact on performance comes from the fact that during this
> scanning the PG is locked for other operations.
>
> From my benchmarks it is clearly evident that this has a big impact on
> client latency during backfill. The lower the values for
> osd_backfill_scan_min and osd_backfill_scan_max, the less impact on
> latency but the *longer* the recovery time. Changing these values online
> will probably take effect only for PGs on which backfill has not yet
> started, which can explain why you did not see an immediate effect of
> changing them on the fly.
>
> --
> Tomasz Kuzemko
> tomasz@xxxxxxxxxxx
>
>
> 2015-11-26 0:24 GMT+01:00 Robert LeBlanc <robert@xxxxxxxxxxxxx>:
>>
>> I don't think this does what you think it does...
>>
>> This will almost certainly starve the clients of IO. This is the number
>> of seconds between backfills, not the number of objects being scanned
>> during a backfill. Setting these to higher values will make recovery
>> take longer but hardly affect the clients; setting them to low values
>> will increase the rate of recovery so it takes less time, but will
>> impact the performance of the clients.
>>
>> Also, I haven't had much luck changing these on the fly for
>> recovery/backfill already in progress or queued.
>> ----------------
>> Robert LeBlanc
>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
>>
>>
>> On Wed, Nov 25, 2015 at 2:42 PM, Tomasz Kuzemko wrote:
>> > To ease on clients you can change osd_backfill_scan_min and
>> > osd_backfill_scan_max to 1.
It's possible to change this online:
>> > ceph tell osd.\* injectargs '--osd_backfill_scan_min 1'
>> > ceph tell osd.\* injectargs '--osd_backfill_scan_max 1'
>> >
>> > 2015-11-24 16:52 GMT+01:00 Joe Ryner:
>> >>
>> >> Hi,
>> >>
>> >> Last night I upgraded my cluster from CentOS 6.5 -> CentOS 7.1 and
>> >> in the process upgraded from Emperor -> Firefly -> Hammer.
>> >>
>> >> When I finished I changed the crush tunables from
>> >> ceph osd crush tunables legacy -> ceph osd crush tunables optimal
>> >>
>> >> I knew this would cause data movement, but the IO for my clients is
>> >> unacceptable. Can anyone please tell me what the best settings are
>> >> for my configuration? I have 2 Dell R720 servers and 2 Dell R730
>> >> servers, with 36 1TB SATA SSD drives in the cluster. The servers
>> >> have 128 GB of RAM.
>> >>
>> >> Below is some detail that might help. According to my calculations
>> >> the rebalance will take over a day.
>> >>
>> >> I would greatly appreciate some help on this.
>> >>
>> >> Thank you,
>> >>
>> >> Joe
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
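
For reference, the two tuning approaches discussed in this thread can be
expressed as a ceph.conf fragment. The values below are taken directly
from the posts above; which combination is appropriate depends on your
cluster, workload, and Ceph release, so treat this as a starting point
rather than a recommendation:

```ini
[osd]
# Tomasz's low-client-impact settings: scan fewer objects per backfill
# op, so the PG is locked for less time at the cost of longer recovery.
osd_backfill_scan_min = 1
osd_backfill_scan_max = 1

# Alternatively, Robert's settings that eliminated periodic slow I/O on
# his busy Hammer cluster: larger scan batches plus more concurrent
# backfills per OSD.
#osd_backfill_scan_min = 16
#osd_backfill_scan_max = 32
#osd_max_backfills = 10
```

These options can also be injected at runtime with the `ceph tell osd.\*
injectargs` commands shown above, but as noted in the thread, online
changes may only take effect for PGs whose backfill has not yet started.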