On Fri, 3 Dec 2010, Henry C Chang wrote:
> > The primary changing nodes is more of a concern, though; the rest is
> > small optimizations.  Let's figure out why the mapping is changing
> > like that!
>
> Hi Sage,
>
> Thanks for your explanation.
>
> I used 'ceph pg dump -o -' to dump the pg stats before and after osd2
> went down and out.
> As shown in the attached files, pg 3.1p2 changed from [2,0] to [1,0].

Oh, that explains it: the 'p' placement groups are ones in which the
primary is 'forcefed' into the crush algorithm.  When that forced choice
isn't usable, the result doesn't behave as well as the unconstrained
placement.

Those PGs are not used unless you call the setlayout ioctl on a file and
set the preferred osd.  For normal users these pgs should all be empty.

> What I meant to say (sorry for my bad English and bad example) is:
> since osd0 was already a replica before the change, choosing osd0 as
> the new primary should shorten the client's waiting time (at least
> for reads).

Right.  If you see this on non-'p' placement groups, definitely let us
know!
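For the record, setting the preferred osd from userspace looks roughly
like the sketch below.  The struct layout and ioctl numbers are written
from memory of the kernel client's ioctl.h of this era, so treat them as
assumptions and check them against your tree before relying on this:

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>
#include <linux/types.h>

/* Approximate definitions from the ceph kernel client's ioctl.h. */
#define CEPH_IOCTL_MAGIC 0x97

struct ceph_ioctl_layout {
	__u64 stripe_unit, stripe_count, object_size;
	__u64 data_pool;
	__s64 preferred_osd;	/* -1 means no preference */
};

#define CEPH_IOC_GET_LAYOUT _IOR(CEPH_IOCTL_MAGIC, 1, \
				 struct ceph_ioctl_layout)
#define CEPH_IOC_SET_LAYOUT _IOW(CEPH_IOCTL_MAGIC, 2, \
				 struct ceph_ioctl_layout)

int main(int argc, char **argv)
{
	struct ceph_ioctl_layout l;
	int fd;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <file-on-cephfs>\n", argv[0]);
		return 1;
	}
	fd = open(argv[1], O_RDWR);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* Read the current layout, then rewrite it with a preferred osd.
	 * This is what pushes a file's objects into the 'p' pg variants. */
	if (ioctl(fd, CEPH_IOC_GET_LAYOUT, &l) < 0) {
		perror("CEPH_IOC_GET_LAYOUT");
		return 1;
	}
	l.preferred_osd = 0;	/* e.g. pin the primary to osd0 */
	if (ioctl(fd, CEPH_IOC_SET_LAYOUT, &l) < 0) {
		perror("CEPH_IOC_SET_LAYOUT");
		return 1;
	}
	close(fd);
	return 0;
}

sage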