I don't think the script will help our situation, as it is just setting osd_max_backfills from 1 to 0. It looks like that change doesn't take effect until after the OSD finishes the PG it is currently backfilling. It would be nice if backfill/recovery could skip the journal, but there would have to be some logic for the case where an object is changed while it is being replicated. Maybe just log in the journal when an object's restore starts and when it finishes, so the journal flush knows whether it needs to commit the write?
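For anyone following along without the pastebin link, my understanding of the script is roughly the following. This is a guess at its shape based on Lionel's description below, not the actual code from http://pastebin.com/sy7h1VEy:

#!/bin/bash
# Guess at the throttler's shape, based on the descriptions in this thread;
# not the actual contents of http://pastebin.com/sy7h1VEy.
# Usage: ./throttler <run_seconds> <pause_seconds>
run=$1
pause=$2
while true; do
    # let backfills proceed (one concurrent backfill per OSD)
    ceph tell 'osd.*' injectargs '--osd_max_backfills 1'
    sleep "$run"
    # freeze backfills; as noted above, this only takes effect once an
    # OSD finishes the PG it is currently backfilling
    ceph tell 'osd.*' injectargs '--osd_max_backfills 0'
    sleep "$pause"
done

Even with a scheme like this, the 0 only matters at PG boundaries, which is why I don't think it helps with PGs as large as ours.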
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1

On Thu, Sep 10, 2015 at 3:33 PM, Lionel Bouton wrote:
> On 10/09/2015 22:56, Robert LeBlanc wrote:
>> We are trying to add some additional OSDs to our cluster, but the
>> impact of the backfilling has been very disruptive to client I/O, and
>> we have been trying to figure out how to reduce it. We have seen some
>> client I/O blocked for more than 60 seconds. There has been CPU and
>> RAM headroom on the OSD nodes, the network has been fine, and the
>> disks have been busy but not terrible.
>
> It seems you've already exhausted most of the ways I know of. When
> confronted with this situation, I used a simple script to throttle
> backfills (freezing them, then re-enabling them). This helped our VMs
> at the time, but you must be prepared for very long migrations and
> some experimentation with different schedules. You simply pass it the
> number of seconds backfills are allowed to proceed, then the number of
> seconds during which they pause.
>
> Here's the script, which should be self-explanatory:
> http://pastebin.com/sy7h1VEy
>
> Something like:
>
> ./throttler 10 120
>
> limited the impact on our VMs (the idea being that during the 10s the
> backfill won't be able to trigger filestore syncs, and the 120s pause
> will let the filestore syncs remove "dirty" data from the journals
> without interfering too much with concurrent writes).
> I believe you need a high filestore sync interval to benefit from
> this (we use 30s).
> At the very least, the long pause will eventually let the VMs move
> data to disk regularly instead of being nearly frozen.
>
> Note that your PGs are more than 10 GB each; if the OSDs can't stop a
> backfill before finishing the transfer of the current PG, this won't
> help (I assume backfills go through the journals, and they probably
> won't be able to act as write-back caches anymore, as a single PG
> will be enough to fill them up).
>
> Best regards,
>
> Lionel