Hi Gregory,

On 21/10/2014 19:39, Gregory Farnum wrote:
> On Tue, Oct 21, 2014 at 10:15 AM, Lionel Bouton
> <lionel+ceph@xxxxxxxxxxx> wrote:
>> [...]
>> Any thoughts? Is it based on wrong assumptions? Would it prove to be a
>> can of worms if someone tried to implement it?
> Yeah, there's one big thing you're missing: we strictly order reads
> and writes to an object, and the primary is the serialization point.

Of course... I should have anticipated this. As you explain later (thanks
for the detailed explanation, by the way), implementing redirects would
need a whole new way of coordinating accesses. I'm not yet familiar with
Ceph internals, but I suspect this would turn Ceph into another beast
entirely...

> If we were to proxy reads to another replica it would be easy enough
> for the primary to continue handling the ordering, but if it were just
> a redirect it wouldn't be able to do so (the primary doesn't know when
> the read is completed, allowing it to start a write). Setting up the
> proxy of course requires a lot more code, but more importantly it's
> more resource-intensive on the primary, so I'm not sure if it's worth
> it. :/

Difficult to know without real-life testing. It's a non-trivial
CPU/network/disk capacity trade-off...

> The "primary affinity" value we recently introduced is designed to
> help alleviate persistent balancing problems around this by letting
> you reduce how many PGs an OSD is primary for without changing the
> location of the actual data in the cluster. But dynamic updates to
> that aren't really feasible either (it's a map change and requires
> repeering). [...]

I forgot about this. Thanks for the reminder: this would definitely help
in some of my use cases where the load is predictable over a relatively
long period.

I'll have to dig into the sources one day; I haven't been able to stop
wondering about various aspects of the internals since I began using Ceph
(I've worked on the code of distributed systems on several occasions and
I've always been hooked easily)...

Best regards,

Lionel Bouton
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
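
A minimal sketch of the primary-affinity knob discussed in the thread, for
anyone who wants to try it: the OSD id (osd.3) and the weight (0.5) are
arbitrary examples, and the injectargs step is an assumption about what a
Firefly-era cluster may require before the setting is honored.

    # Firefly-era monitors may need primary affinity explicitly enabled
    # (assumed here; adjust or skip on newer releases)
    ceph tell mon.\* injectargs '--mon_osd_allow_primary_affinity true'

    # make osd.3 roughly half as likely to be chosen as primary,
    # without moving any data in the cluster
    ceph osd primary-affinity osd.3 0.5

    # the new value should show up in the OSD map entry for osd.3
    ceph osd dump | grep 'osd.3'

The appeal of this knob in the scenario above is that it only changes which
replica acts as the serialization point for reads and writes; the placement
of the data itself stays where CRUSH put it.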