Hi,

checking the actual value for osd_max_backfills on our cluster (0.94.9), I also made a config diff of the OSD configuration (ceph daemon osd.0 config diff) and wondered why it shows a default of 10, which differs from the documented default at http://docs.ceph.com/docs/master/rados/configuration/osd-config-ref/. Did the default value change since hammer?

Regards,
Steffen

>>> David Turner <drakonstein@xxxxxxxxx> wrote on Tuesday, 9 May 2017 at 00:03:
> WOW!!! Those are some awfully high backfilling settings you have there.
> They are 100% the reason that your customers think your system is down.
> You're telling each OSD to be able to have 20 backfill operations running
> at the exact same time. I bet if you watch iostat -x 1 on one of your
> nodes before and after you inject those settings, the disk usage will go
> from a decent 40-70% and jump all the way up to 100% as soon as those
> settings are injected.
>
> When you are backfilling, you are copying data from one drive to another.
> Each backfill allowed by osd-max-backfills is another file the OSD tries
> to copy at the same time. These can be receiving data (writing to the
> disk) or moving data off (reading from the disk followed by a delete). So
> by having 20 backfills happening at a time, you are telling each disk to
> allow 20 files to be written and/or read from it at the same time. What
> happens to a disk when you are copying 20 large files to it at a time?
> All of them move slower (largely due to disk thrashing, with 20 threads
> all reading from and writing to different parts of the disk).
>
> What you want to find is the point where your disks are usually around
> 80-90% utilized while backfilling, but not consistently 100%. The easy
> way to do that is to increase osd-max-backfills by 1 or 2 at a time until
> you see it go too high, and then back off. I don't know many people who
> go above 5 max backfills in a production cluster on spinning disks.
> Usually the ones who do, do it temporarily, while they know their cluster
> isn't being used much by customers.
>
> Personally, I have never used osd-recovery-threads or
> osd-recovery-max-active; I've been able to tune my clusters using only
> osd-max-backfills. The lower you leave these, the longer the backfill
> will take, but the less impact your customers will notice. I've found 3
> to be a generally safe number if customer IO is your priority; 5 works
> well if your customers are OK with it being slow (but still usable)...
> but all of this depends on your hardware and software use cases. Test it
> while watching your disk utilization, and test your application while
> finding the right number for your environment.
>
> Good luck :)
>
> On Mon, May 8, 2017 at 5:43 PM Daniel Davidson <danield@xxxxxxxxxxxxxxxx>
> wrote:
>
>> Our Ceph system performs very poorly, or not at all, while the remapping
>> procedure is underway. We are using replica 2 and the following ceph
>> tweaks while it is in process:
>>
>> ceph tell osd.* injectargs '--osd-recovery-max-active 20'
>> ceph tell osd.* injectargs '--osd-recovery-threads 20'
>> ceph tell osd.* injectargs '--osd-max-backfills 20'
>> ceph -w
>> ceph osd set noscrub
>> ceph osd set nodeep-scrub
>>
>> After the remapping finishes, we set these back to default.
>>
>> Are any of these causing our problems, or is there another way to limit
>> the impact of the remapping so that users do not think the system is
>> down while we add more storage?
>>
>> thanks,
>>
>> Dan
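For reference, a minimal sketch of the commands discussed in this thread, assuming a Hammer (0.94.x) cluster with the default admin socket setup; verify the option names and defaults against your version's documentation:

  # Show the value osd.0 is actually running with (run on the host carrying osd.0)
  ceph daemon osd.0 config show | grep osd_max_backfills

  # Show only the settings that differ from the built-in defaults
  ceph daemon osd.0 config diff

  # Raise the backfill limit cautiously, 1 or 2 at a time, e.g.
  ceph tell osd.* injectargs '--osd-max-backfills 3'

  # ...while watching per-disk utilization (%util) on an OSD node
  iostat -x 1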