Thank you Ilya for the detailed explanation.
Regards,
Ridwan Noel
On Mon, Oct 24, 2016 at 6:30 AM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
On Fri, Oct 21, 2016 at 10:35 PM, Ridwan Rashid Noel
<ridwan064@xxxxxxxxx> wrote:
> Thank you for your reply Greg. Is there any detailed resource that describes
> how changing the primary affinity works? All I got from searching was
> one paragraph from the documentation.
No, probably nothing detailed. There isn't much to it though.
Normally, the first up OSD in the raw OSD set returned by CRUSH is the
primary. primary-affinity (a value between 0 and 1) then determines the
probability with which the mapping code will pass over that OSD and
choose some other OSD from the raw OSD set to serve as primary: the
lower the value, the more likely the OSD is skipped. (Your quote is
technically incorrect, as this adjustment happens after CRUSH is run,
not within CRUSH.)
You can think of it as a step in the mapping process, which will
potentially reorder OSDs in the OSD set. Say your CRUSH output is
[osd0, osd1, osd2] and all of them are up. If osd0's primary-affinity
is 1, osd0 will be the primary with p=1 (i.e. always). If osd0's
primary-affinity is 0.75, either osd1 or osd2 will take its place with
p=0.25 and the resulting OSD set will be one of:
[osd0, osd1, osd2], p=0.75
[osd1, osd0, osd2] or [osd2, osd0, osd1], p=0.25
Once you've adjusted the primary-affinity, this choice is fixed until
you either re-adjust the primary-affinity or the mapping is affected in
some other way (e.g. an OSD goes down).
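The reordering step described above can be sketched roughly as follows.
This is only an illustration under stated assumptions, not the actual
Ceph mapping code: the function names, the SHA-256 based hash and the
0x10000 fixed-point scale are all stand-ins chosen for the example. It
walks the OSD set in order, lets each OSD decline the primary role
based on a deterministic per-(PG, OSD) draw, and moves the first OSD
that accepts to the front. Because the draw depends only on the PG and
the OSD, the outcome for a given PG stays the same until the affinity
or the OSD set changes.

```python
import hashlib

MAX_AFFINITY = 0x10000  # assumed fixed-point scale: affinity 1.0 -> 0x10000


def _pg_osd_hash(pg_seed: int, osd: int) -> int:
    """Deterministic 16-bit draw for a (PG, OSD) pair.

    Stand-in hash for illustration only; not the hash Ceph uses.
    """
    digest = hashlib.sha256(f"{pg_seed}:{osd}".encode()).digest()
    return int.from_bytes(digest[:2], "big")  # 0 .. 0xFFFF


def apply_primary_affinity(pg_seed, osds, affinity):
    """Return the OSD set with the chosen primary moved to the front.

    `affinity` maps OSD id -> value in [0, 1]; missing OSDs default to 1.
    """
    chosen = None
    for pos, osd in enumerate(osds):
        a = int(affinity.get(osd, 1.0) * MAX_AFFINITY)
        if a < MAX_AFFINITY and _pg_osd_hash(pg_seed, osd) >= a:
            continue  # this OSD declines the primary role for this PG
        chosen = pos
        break
    if chosen is None:
        chosen = 0  # every OSD declined: fall back to the first one
    # Move the chosen OSD to the front, keeping the others in order.
    return [osds[chosen]] + osds[:chosen] + osds[chosen + 1:]


if __name__ == "__main__":
    # Same PG and same affinity always give the same result.
    print(apply_primary_affinity(7, [0, 1, 2], {0: 0.75}))
```

In this sketch, when osd0 declines and osd1 has full affinity, osd1
always takes over; osd2 only becomes primary if osd1 declines as well.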
Another way to look at it is as follows. If, according to CRUSH +
current osdmap, osd0 gets to be the primary for 100 PGs, then with
a primary-affinity of 0.75 roughly 25 of those PGs will reject osd0 and
get another primary, leaving osd0 with about 75 PGs.
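A quick way to convince yourself of the 100-PG arithmetic is to count
how many PGs keep osd0 as primary under a per-PG pseudo-random draw.
The sketch below is an assumption-laden illustration: the helper names
and the SHA-256 based draw are made up for the example and are not what
Ceph uses. The count is only approximately 75 for 100 PGs, since each
PG is decided independently.

```python
import hashlib


def keeps_primary(pg_seed: int, osd: int, affinity: float) -> bool:
    """True if `osd` keeps the primary role for this PG (illustrative draw)."""
    digest = hashlib.sha256(f"{pg_seed}:{osd}".encode()).digest()
    draw = int.from_bytes(digest[:2], "big") / 65536  # value in [0, 1)
    return draw < affinity


def count_kept(num_pgs: int, osd: int, affinity: float) -> int:
    """Number of PGs (seeds 0..num_pgs-1) for which `osd` stays primary."""
    return sum(keeps_primary(pg, osd, affinity) for pg in range(num_pgs))


if __name__ == "__main__":
    kept = count_kept(100, osd=0, affinity=0.75)
    print(f"osd0 stays primary for {kept} of 100 PGs")
```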
Why would you want to do this? To quickly redistribute the read
workload away from overloaded OSDs (writes go to all OSDs in the PG,
while reads are served by the primary).
Thanks,
Ilya
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com