Hi Jonathan, Anthony and Steve,

Thanks very much for your valuable advice and suggestions!

MJ

On 03/21/2017 08:53 PM, Jonathan Proulx wrote:
If it took 7hr for one drive you've probably already done this (or the defaults are for low-impact recovery), but before doing anything you want to be sure your OSD settings for max backfills, max recovery active, recovery sleep (perhaps others?) are set such that recovery and backfilling don't overwhelm production use. Look through the recovery section of http://docs.ceph.com/docs/master/rados/configuration/osd-config-ref/

This is important because if you do have a failure, and thus unplanned recovery, you want to have this tuned to your preferred balance of quick performance versus quick return to full redundancy.

That said, my theory is to add things in as balanced a way as possible to minimize moves. What that means depends on your crush map. For me, I have 3 "racks" and all (most) of my pools are 3x replication, so each object should have one copy in each rack.

I've only expanded once, but what I did was to add three servers, one to each 'rack'. I set them all 'in' at the same time, which should have minimized movement between racks and only moved objects from other servers' osds in the same rack onto the osds in the new server. This seemed to work well for me.

In your case this would mean adding drives to all servers at once in a balanced way. That would prevent copying across servers, since the balance among servers wouldn't change. You could do one disk on each server, or load them all up and trust the recovery settings to keep the thundering herd in check.

As I said, I've only gone through one expansion round, and while this theory seemed to work out for me, hopefully someone with deeper knowledge can confirm or deny its general applicability.

-Jon

On Tue, Mar 21, 2017 at 07:56:57PM +0100, mj wrote:
:Hi,
:
:Just a quick question about adding OSDs, since most of the docs I can find
:talk about adding ONE OSD, and I'd like to add four per server on my
:three-node cluster.
:
:This morning I tried the careful approach, and added one OSD to server1. It
:all went fine, everything rebuilt and I have a HEALTH_OK again now. It took
:around 7 hours.
:
:But now I started thinking... (and that's when things go wrong, therefore
:hoping for feedback here....)
:
:The question: was I being stupid to add only ONE osd to server1? Is it
:not smarter to add all four OSDs at the same time?
:
:I mean: things will rebuild anyway... and I have the feeling that rebuilding
:from 4 -> 8 OSDs is not going to be much heavier than rebuilding from 4 -> 5
:OSDs. Right?
:
:So better add all new OSDs together on a specific server?
:
:Or not? :-)
:
:MJ
:_______________________________________________
:ceph-users mailing list
:ceph-users@xxxxxxxxxxxxxx
:http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
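[Editor's note: the recovery-throttling options discussed in the thread map to settings in ceph.conf. The fragment below is an illustrative sketch only; the values shown are examples of a conservative, low-impact configuration, not recommendations, and defaults vary by Ceph release — consult the osd-config-ref page linked above for your version.]

```ini
# Illustrative ceph.conf fragment: throttle backfill/recovery so that
# client I/O is not overwhelmed while OSDs are being added.
# Values are examples only, not tuning advice.
[osd]
osd max backfills = 1          ; concurrent backfill operations allowed per OSD
osd recovery max active = 1    ; concurrent recovery requests active per OSD
osd recovery op priority = 1   ; priority of recovery vs. client ops (lower = less impact)
osd recovery sleep = 0.1       ; seconds to sleep between recovery operations
```

Lower values slow recovery but keep production latency steadier; the trade-off is a longer window at reduced redundancy, which is the balance Jon describes.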