Preamble: you might want to read the nicely formatted version of this mail at:
https://gist.github.com/daniel-j-h/2daae2237bb21596c97d

Motivation
----------

Recently I was wondering how Ceph (i.e. RADOS in particular) maps objects to Placement Groups (PGs). It basically works like the following:

    pg = hash(object)
    osd = crush(pg)

By hashing the object's name (not its content!) Ceph determines the PG. For assigning PGs to OSDs, the CRUSH algorithm is used.

But what happens if the hash function always returns the same value, i.e. what happens on hash collisions? Does Ceph really map all such objects to the same PG then?

I asked about this on the IRC channel and was a bit surprised by the response -- if Ceph really maps all objects to a single PG, I'm simply out of luck:

> with enough PG and enough objects of reasonable sizes, it should yield a decent data distribution

This confirms that the hash function alone is responsible for the distribution of objects across PGs. So, what could possibly go wrong? Let's try this!

Setting up the environment
--------------------------

I modified the librados usage example (src/examples/librados/hello_world.cc) to write objects with specific names:

    std::string object_name("\x41\x5b\xd2\xc2\xc3"); // bear with me here
    //std::string object_name("\x41\x1e\xf8\x5e\x22");
    //std::string object_name("\x01\x01\x01\x17\x2e");

After setting up a local cluster via the vstart.sh tool and writing those three objects, executing

    ./ceph pg dump | less -S

indeed shows our three **objects getting mapped to the same PG**!

Collisions
----------

How is it possible to come up with such colliding names? It's actually quite simple: Ceph defines two hash functions (src/common/ceph_hash.cc), of which the so-called rjenkins hash function is used by default. (Note: src/crush/hash.c actually implements the rjenkins function again -- why?)

The hash function's implementation is deterministic in the sense that the output depends only on its input. No random or setup-specific seed is used. What implications does this have? Once we compute collisions, we're able to force the object mapping on **every RADOS object store out there**. And because the hash function's output is only 32 bits wide, collisions are not that hard to find at all!

Here are a few if you're following along and want to test your setup:

| hash (u32) | byte sequence (hex) |
|------------|---------------------|
| 381933444  | 41 5b d2 c2 c3      |
| 381933444  | 41 1e f8 5e 22      |
| 381933444  | 01 01 01 17 2e      |
| 381933444  | 01 a4 34 f6 c0      |
| 381933444  | 42 ca 78 4f 47      |
| 381933444  | 82 e8 ea fe 16      |
Implications
------------

I did this as a proof of concept in a few hours, so I'm still not sure about the full implications. I'm able to force object mappings. Can this be used as a DoS attack vector? Maybe.

I looked through the RadosGW source in order to find out how object names are generated there. I only found the second hash function implementation getting used -- but I'm not familiar enough with the source to say more about it at this time.

Open questions
--------------

I'd like to know more about what impact this could have. In particular:

* How do higher-level APIs (e.g. RadosGW) determine an object's name? Can it be influenced or set by the user?
* Is it possible to introduce some randomness in order to mitigate the scenario explained above? Maybe on a per-cluster basis, or deduced from some internal state. (Note: SipHash does something similar.)

Summary
-------

Ceph's object mapping depends on the rjenkins hash function. It's possible to force Ceph into mapping all objects to a single PG.

Please discuss!

Thanks,
Daniel J. Hofmann