Re: read performance, separate client CRUSH maps or limit osd read access from each client

Vlad Kopylov <vladkopy@xxxxxxxxx> · Mon, 26 Nov 2018 11:08:38 -0500

I see. Thank you Greg.

Ultimately leading to some kind of multi-primary OSD/MON setup, which
will most likely add lookup overheads. Though might be a reasonable
trade off for network distributed setups.
Good feature for major version.

With Glusterfs I solved it, funny as it sounds, by writing tiny fuse
fs as overlay, making all reads locally and writes to cluster. Having
that with Glusterfs there are real files on each node for local reads.

Wish there was a way to file-access local OSD so I can use same approach.

-vlad
On Mon, Nov 26, 2018 at 8:47 AM Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>
> On Tue, Nov 20, 2018 at 9:50 PM Vlad Kopylov <vladkopy@xxxxxxxxx> wrote:
>>
>> I see the point, but not for the read case:
>>   no overhead for just choosing or let Mount option choose read replica.
>>
>> This is simple feature that can be implemented, that will save many
>> people bandwidth in really distributed cases.
>
>
> This is actually much more complicated than it sounds. Allowing reads from the replica OSDs while still routing writes through a different primary OSD introduces a great many consistency issues. We've tried adding very limited support for this read-from-replica scenario in special cases, but have had to roll them all back due to edge cases where they don't work.
>
> I understand why you want it, but it's definitely not a simple feature. :(
> -Greg
>
>>
>>
>> Main issue this surfaces is that RADOS maps ignore clients - they just
>> see cluster. There should be the part of RADOS map unique or possibly
>> unique for each client connection.
>>
>> Lets file feature request?
>>
>> p.s. honestly, I don't see why anyone would use ceph for local network
>> RAID setups, there are other simple solutions out there even in your
>> own RedHat shop.
>> On Tue, Nov 20, 2018 at 8:38 PM Patrick Donnelly <pdonnell@xxxxxxxxxx> wrote:
>> >
>> > You either need to accept that reads/writes will land on different data centers, primary OSD for a given pool is always in the desired data center, or some other non-Ceph solution which will have either expensive, eventual, or false consistency.
>> >
>> > On Fri, Nov 16, 2018, 10:07 AM Vlad Kopylov <vladkopy@xxxxxxxxx wrote:
>> >>
>> >> This is what Jean suggested. I understand it and it works with primary.
>> >> But what I need is for all clients to access same files, not separate sets (like red blue green)
>> >>
>> >> Thanks Konstantin.
>> >>
>> >> On Fri, Nov 16, 2018 at 3:43 AM Konstantin Shalygin <k0ste@xxxxxxxx> wrote:
>> >>>
>> >>> On 11/16/18 11:57 AM, Vlad Kopylov wrote:
>> >>> > Exactly. But write operations should go to all nodes.
>> >>>
>> >>> This can be set via primary affinity [1], when a ceph client reads or
>> >>> writes data, it always contacts the primary OSD in the acting set.
>> >>>
>> >>>
>> >>> If u want to totally segregate IO, you can use device classes:
>> >>>
>> >>> Just create osds with different classes:
>> >>>
>> >>> dc1
>> >>>
>> >>>    host1
>> >>>
>> >>>      red osd.0 primary
>> >>>
>> >>>      blue osd.1
>> >>>
>> >>>      green osd.2
>> >>>
>> >>> dc2
>> >>>
>> >>>    host2
>> >>>
>> >>>      red osd.3
>> >>>
>> >>>      blue osd.4 primary
>> >>>
>> >>>      green osd.5
>> >>>
>> >>> dc3
>> >>>
>> >>>    host3
>> >>>
>> >>>      red osd.6
>> >>>
>> >>>      blue osd.7
>> >>>
>> >>>      green osd.8 primary
>> >>>
>> >>>
>> >>> create 3 crush rules:
>> >>>
>> >>> ceph osd crush rule create-replicated red default host red
>> >>>
>> >>> ceph osd crush rule create-replicated blue default host blue
>> >>>
>> >>> ceph osd crush rule create-replicated green default host green
>> >>>
>> >>>
>> >>> and 3 pools:
>> >>>
>> >>> ceph osd pool create red 64 64 replicated red
>> >>>
>> >>> ceph osd pool create blue 64 64 replicated blue
>> >>>
>> >>> ceph osd pool create blue 64 64 replicated green
>> >>>
>> >>>
>> >>> [1]
>> >>> http://docs.ceph.com/docs/master/rados/operations/crush-map/#primary-affinity'
>> >>>
>> >>>
>> >>>
>> >>> k
>> >>>
>> >> _______________________________________________
>> >> ceph-users mailing list
>> >> ceph-users@xxxxxxxxxxxxxx
>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com