Hi Gregory,

On 21/10/2014 19:39, Gregory Farnum wrote:
> On Tue, Oct 21, 2014 at 10:15 AM, Lionel Bouton
> <lionel+ceph@xxxxxxxxxxx> wrote:
>> [...]
>> Any thoughts? Is it based on wrong assumptions? Would it prove to be a
>> can of worms if someone tried to implement it?
> Yeah, there's one big thing you're missing: we strictly order reads
> and writes to an object, and the primary is the serialization point.

Of course... I should have anticipated this. As you explain later (thanks
for the detailed explanation, by the way), implementing redirects would
need a whole new way of coordinating accesses. I'm not yet familiar with
Ceph internals, but I suspect this would turn Ceph into another beast
entirely...

> If we were to proxy reads to another replica it would be easy enough
> for the primary to continue handling the ordering, but if it were just
> a redirect it wouldn't be able to do so (the primary doesn't know when
> the read is completed, allowing it to start a write). Setting up the
> proxy of course requires a lot more code, but more importantly it's
> more resource-intensive on the primary, so I'm not sure if it's worth
> it. :/

Difficult to know without real-life testing. It's a non-trivial
CPU/network/disk capacity trade-off...

> The "primary affinity" value we recently introduced is designed to
> help alleviate persistent balancing problems around this by letting
> you reduce how many PGs an OSD is primary for without changing the
> location of the actual data in the cluster. But dynamic updates to
> that aren't really feasible either (it's a map change and requires
> repeering). [...]

I forgot about this. Thanks for the reminder: this would definitely help
in some of my use cases where the load is predictable over a relatively
long period.

I'll have to dig into the sources one day; I haven't been able to stop
wondering about various aspects of the internals since I began using Ceph
(I've worked on the code of distributed systems on several occasions and
I've always been hooked easily)...

Best regards,

Lionel Bouton
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
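
A minimal sketch of the primary-affinity knob discussed in the thread, for
anyone who wants to try it: the OSD id (osd.3) and the weight (0.5) are
arbitrary examples, and the injectargs step is an assumption about what a
Firefly-era cluster may require before the setting is honored.

    # Firefly-era monitors may need primary affinity explicitly enabled
    # (assumed here; adjust or skip on newer releases)
    ceph tell mon.\* injectargs '--mon_osd_allow_primary_affinity true'

    # make osd.3 roughly half as likely to be chosen as primary,
    # without moving any data in the cluster
    ceph osd primary-affinity osd.3 0.5

    # the new value should show up in the OSD map entry for osd.3
    ceph osd dump | grep 'osd.3'

The appeal of this knob in the scenario above is that it only changes which
replica acts as the serialization point for reads and writes; the placement
of the data itself stays where CRUSH put it.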