Re: crush: straw is dead, long live straw2

On 12/08/2014 03:48 PM, Sage Weil wrote:
  - Use floating point log function.  This is problematic for the kernel
implementation (no floating point), is slower than the lookup table, and
makes me worry about whether the floating point calculations are
consistent across architectures (the mapping has to be completely
deterministic).

This also won't work for QEMU, which may not restore floating point
modes while doing I/O (leading to crashes like http://tracker.ceph.com/issues/3521).

  - Use some approximation of the logarithm with fixed-point arithmetic.
I spent a bit of time searching and found

  http://www.researchgate.net/publication/230668515_A_fixed-point_implementation_of_the_natural_logarithm_based_on_a_expanded_hyperbolic_CORDIC_algorithm

but it also involves a couple of lookup tables and (judging by figure 1)
is probably slower than the 128 KB table.

We could probably expand out a Taylor series and get something half decent,
but any precision we lose will translate to OSD utilizations that are off
from the input weights--something that costs real disk space and we
probably want to avoid.

  - Stick with the 128KB lookup table.  Performance sensitive clients can
precalculate all PG mappings when they get OSDMap updates if they are
concerned about leaving the CRUSH calculation in the IO path.
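
To illustrate the precalculation idea, here is a minimal sketch of a client-side
cache that recomputes every PG mapping once per map update, so the I/O path is
only a dictionary lookup. The class name, the method names, and the `map_pg`
callable are all hypothetical and stand in for whatever the client uses to run
CRUSH; none of them come from the Ceph code base.

```python
class PGMappingCache:
    """Hypothetical cache: recompute all PG mappings once per map epoch."""

    def __init__(self, map_pg):
        # map_pg is an assumed callable (epoch, pg_id) -> list of OSD ids
        # that performs the expensive CRUSH calculation.
        self._map_pg = map_pg
        self.epoch = None
        self._table = {}

    def on_osdmap_update(self, epoch, pg_ids):
        # Pay the CRUSH cost here, outside the I/O path.
        self._table = {pg: self._map_pg(epoch, pg) for pg in pg_ids}
        self.epoch = epoch

    def lookup(self, pg_id):
        # I/O path: no CRUSH calculation, just a table lookup.
        return self._table[pg_id]
```

The trade-off is memory and update latency proportional to the number of PGs
in exchange for a constant-time lookup per request.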

Any other suggestions?

It could be a lookup table generated (or chosen) at runtime to a
particular size configured by the crushmap, but that's probably more
complex than it's worth.
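
For a rough feel of what a size-configurable table would look like, here is a
sketch that builds a fixed-point ln() table with 2**bits entries and indexes it
with the top bits of a 16-bit hash value. The function names, the `bits`
parameter, and the 32-bit scale are assumptions for illustration, not the
layout used in the commit. It also builds the table with floating point
math.log, which is only acceptable in a sketch: a real table would have to be
bit-for-bit identical on every client.

```python
import math

def build_ln_table(bits, scale_bits=32):
    """Return 2**bits integers: entry i holds ln((i + 1) / 2**bits) * 2**scale_bits.

    Illustration only: math.log is floating point, so this is not a
    substitute for a table that every client must agree on exactly.
    """
    n = 1 << bits
    scale = 1 << scale_bits
    return [round(math.log((i + 1) / n) * scale) for i in range(n)]

def fixed_ln(u16, table, bits):
    """Look up the fixed-point log of a 16-bit value using its top `bits` bits."""
    return table[u16 >> (16 - bits)]
```

A smaller table saves memory but quantizes the input more coarsely, and that
lost precision is what would show up as utilization drifting from the weights.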

Here is my implementation of the lookup-table based approach:

	https://github.com/ceph/ceph/commit/1d462c9f6a262de3a51533193ed2dff34c730727
	https://github.com/ceph/ceph/commits/wip-crush-straw2

You'll notice that included in there is a unit test that verifies that
changing a single item's weight does not affect the distribution of inputs
among other items in the bucket.  :)
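
The property that test checks can be sketched in a few lines. This is not the
code from the commit: it assumes a straw2-style draw of ln(u) / weight with the
largest draw winning, uses hashlib as a stand-in for the CRUSH hash, and uses
floating point math.log where the real implementation uses the lookup table.
All function names here are hypothetical.

```python
import hashlib
import math

def unit_hash(x, item):
    """Map (input, item) to a value in (0, 1]; sha256 stands in for the CRUSH hash."""
    digest = hashlib.sha256(f"{x}:{item}".encode()).digest()
    h = int.from_bytes(digest[:2], "big")  # 16-bit value, 0..65535
    return (h + 1) / 65536.0

def straw2_choose(x, weights):
    """Pick the item with the largest ln(u) / weight draw (assumed straw2 rule)."""
    best_item, best_draw = None, -math.inf
    for item, w in sorted(weights.items()):
        if w <= 0:
            continue  # zero-weight items never win
        draw = math.log(unit_hash(x, item)) / w
        if draw > best_draw:
            best_item, best_draw = item, draw
    return best_item

def moved_between_others(weights, changed, new_weight, n_inputs=2000):
    """Count inputs that moved between two items other than the changed one."""
    after = dict(weights)
    after[changed] = new_weight
    bad = 0
    for x in range(n_inputs):
        a = straw2_choose(x, weights)
        b = straw2_choose(x, after)
        if a != b and changed not in (a, b):
            bad += 1
    return bad
```

Because each item's draw depends only on its own weight and hash, reweighting
one item leaves every other draw untouched, so the count above should be zero:
inputs only move to or from the reweighted item.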
