Re: crush multi-pick anomaly

On 22/11/2016, Sage Weil wrote:
> Hi Adam,
>
> Sam had a suggestion about the CRUSH weight anomaly[1]. Instead of
> adjusting the weight for a given bucket based on an expected num_rep
> value, instead we could store a vector of weight values for every bucket
> in the tree for a range of num_reps (2..15, or whatever range is
> appropriate given the min_size/max_size values for the rules).  In general
> the tools will show the normal weight (which is a sum of the children) but
> we'd also keep the adjusted values for any given num_rep and use those for
> the actual choose.
>
> What do you think?

I thought of something along those lines, though it makes me a bit
uneasy. Right now, if I have a bunch of objects stored on a bunch of
hosts and we increase the replication count, objects are copied to
the NEW hosts but don't migrate between the existing ones. (This is
part of the RUSH family monotonicity guarantee.)

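Storing a separate weight vector per num_rep would break that: the
picks for num_rep=3 are no longer guaranteed to extend the picks for
num_rep=2, so raising the replication count could move existing
replicas between hosts. This seems like it might catch users by
surprise and result in undesired behavior. Having Ceph operate this
way would violate the /expectations/ people have from looking at a
description of our algorithm.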
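
A rough sketch of the concern, not Ceph's actual CRUSH code. It assumes
a single flat bucket, a straw2-style draw (ln(u) / weight, largest
wins), SHA-256 as a stand-in for CRUSH's hash, and a simplified retry
loop. The host names and weights are invented for the example. With one
fixed weight map, the hosts chosen for 2 replicas are always kept when
going to 3; with a different map per num_rep, some objects lose a host
they already lived on.

```python
import hashlib
import math


def unit_hash(x, item, r):
    """Deterministic stand-in for CRUSH's hash: a value in (0, 1]."""
    digest = hashlib.sha256(f"{x}:{item}:{r}".encode()).digest()
    return (int.from_bytes(digest[:8], "big") + 1) / 2**64


def straw2_pick(x, r, weights):
    """straw2-style draw: each item gets ln(u) / weight, largest wins."""
    return max(
        (item for item, w in weights.items() if w > 0),
        key=lambda item: math.log(unit_hash(x, item, r)) / weights[item],
    )


def choose(x, num_rep, weights, max_tries=1000):
    """Pick num_rep distinct items, moving to the next r on a collision."""
    picked = []
    for r in range(max_tries):
        if len(picked) == num_rep:
            break
        item = straw2_pick(x, r, weights)
        if item not in picked:
            picked.append(item)
    return picked


# Hypothetical weights, invented for this example.
FIXED = {"host-a": 1.0, "host-b": 1.0, "host-c": 1.0, "host-d": 3.0}
PER_REP = {
    2: {"host-a": 1.0, "host-b": 1.0, "host-c": 1.0, "host-d": 3.0},
    3: {"host-a": 1.0, "host-b": 1.0, "host-c": 1.0, "host-d": 1.5},
}


def count_displaced(weights_for, objects=range(2000)):
    """Count objects whose 2-replica hosts are not all kept at 3 replicas."""
    displaced = 0
    for x in objects:
        old = choose(x, 2, weights_for(2))
        new = choose(x, 3, weights_for(3))
        if not set(old) <= set(new):
            displaced += 1
    return displaced


fixed_displaced = count_displaced(lambda n: FIXED)
per_rep_displaced = count_displaced(lambda n: PER_REP[n])
print("fixed weights, objects displaced:", fixed_displaced)
print("per-num_rep weights, objects displaced:", per_rep_displaced)
```

The fixed-weight count is zero by construction, since the first two
picks never depend on num_rep. The per-num_rep count depends on how far
the two weight maps diverge.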

Rather than having CRUSH automagically pick the distribution based on
the replication count, could we make it more explicit? I'm not sure
what the best form would be. We might have 'auxiliary weightings' in
straw2 and list buckets and a way for a CRUSH rule to select one of
the alternates. That way we wouldn't have to replicate the entire
hierarchy of devices, and people could 'opt in' explicitly.
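
One possible shape for this, as a sketch only. The names Bucket,
aux_weights, Rule and weight_set are hypothetical and not existing
Ceph structures; the draw is the same simplified straw2-style stand-in
with SHA-256 in place of CRUSH's hash. Falling back to the normal
weights when a bucket has no alternate of the requested name is an
assumption of the sketch.

```python
import hashlib
import math
from dataclasses import dataclass, field
from typing import Dict, Optional


def unit_hash(x, item, r):
    """Deterministic stand-in for CRUSH's hash: a value in (0, 1]."""
    digest = hashlib.sha256(f"{x}:{item}:{r}".encode()).digest()
    return (int.from_bytes(digest[:8], "big") + 1) / 2**64


@dataclass
class Bucket:
    """Hypothetical bucket: normal weights plus named alternates."""
    weights: Dict[str, float]
    aux_weights: Dict[str, Dict[str, float]] = field(default_factory=dict)

    def display_weight(self):
        # Tools would keep reporting the normal weight.
        return sum(self.weights.values())

    def weights_for(self, weight_set: Optional[str]):
        # Assumption: fall back to the normal weights if this bucket
        # has no alternate under the requested name.
        if weight_set is None:
            return self.weights
        return self.aux_weights.get(weight_set, self.weights)


@dataclass
class Rule:
    """Hypothetical rule: opts in to an alternate weighting by name."""
    num_rep: int
    weight_set: Optional[str] = None


def choose(x, rule, bucket, max_tries=1000):
    """Pick rule.num_rep distinct items using the weights the rule selects."""
    weights = bucket.weights_for(rule.weight_set)
    picked = []
    for r in range(max_tries):
        if len(picked) == rule.num_rep:
            break
        item = max(
            (i for i, w in weights.items() if w > 0),
            key=lambda i: math.log(unit_hash(x, i, r)) / weights[i],
        )
        if item not in picked:
            picked.append(item)
    return picked


# Invented example data.
bucket = Bucket(
    weights={"osd.0": 1.0, "osd.1": 1.0, "osd.2": 1.0, "osd.3": 3.0},
    aux_weights={"alt": {"osd.0": 1.0, "osd.1": 1.0, "osd.2": 1.0, "osd.3": 1.5}},
)
default_rule = Rule(num_rep=3)
opt_in_rule = Rule(num_rep=3, weight_set="alt")

print(bucket.display_weight())
print(choose(42, default_rule, bucket))
print(choose(42, opt_in_rule, bucket))
```

A rule that names no alternate behaves exactly as before, so the
existing monotonic behavior stays the default and only rules that ask
for "alt" see the adjusted weights.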

That might be a bit too fiddly, but I think you get the idea. I'm very
uneasy about having 'magic replication count' behavior sneak up on
people.

-- 
Senior Software Engineer           Red Hat Storage, Ann Arbor, MI, US
IRC: Aemerson@{RedHat, OFTC, Freenode}
0x80F7544B90EDBFB9 E707 86BA 0C1B 62CC 152C  7C12 80F7 544B 90ED BFB9