Re: Need some clarifications about the disperse feature

Hi Ayelet,

I've attached a spreadsheet with some test data. It contains 3 sets of tests:

* Tests with 0-byte files (touch)
* Tests with 1-byte files
* Tests with 1 MB files

For each set, a creation (write), a recursive ls, a read, and an rm are performed over 11100 files distributed across 111 directories by 10 concurrent processes.

Each individual test is performed twice: one run with all caches (gluster and filesystem) cleared, and another run immediately after the first one (so the caches should be full). The only exception is rm, which cannot be run twice (a second run takes almost no time because there's nothing left to delete).
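
In case it's useful, here is a minimal sketch (in Python) of the kind of test used for the creation phase. The mount point, directory layout and file names are illustrative, not the exact ones used for the spreadsheet:

    import os
    from multiprocessing import Pool

    MOUNT = "/mnt/gluster"   # hypothetical mount point
    NUM_DIRS = 111           # 111 directories
    FILES_PER_DIR = 100      # 111 * 100 = 11100 files
    NUM_WORKERS = 10         # 10 concurrent processes

    def create_files(dir_index):
        path = os.path.join(MOUNT, "dir%03d" % dir_index)
        os.makedirs(path, exist_ok=True)
        for i in range(FILES_PER_DIR):
            # 1-byte files; write b"" for the 0-byte set or 1 MB for the last set
            with open(os.path.join(path, "file%05d" % i), "wb") as f:
                f.write(b"x")

    if __name__ == "__main__":
        with Pool(NUM_WORKERS) as pool:
            pool.map(create_files, range(NUM_DIRS))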

x1 means a single-brick volume.
x3 means a distributed volume with 3 bricks.
x1_r2 means a replica-2 volume.
x1_r3 means a replica-3 volume.
x1_d3_1 means a dispersed 3:1 volume (3 bricks, redundancy 1).
x1_d5_1 means a dispersed 5:1 volume (5 bricks, redundancy 1).
x1_d6_2 means a dispersed 6:2 volume (6 bricks, redundancy 2).

These tests were run in a virtualized environment, so I'm not sure how reliable the comparison is. You should test it in your own environment with your own workload and see how it behaves. If possible, I would also be very interested to know your results :)

I think there's some room to improve read performance, which is probably the weakest part, but that will be done once we're sure it's really stable.

Xavi


On 11/26/2014 09:35 AM, Ayelet Shemesh wrote:
Thank you Xavi, it's very helpful (also to Atin).

Do you have any benchmarks of how much of a performance penalty I
should expect for intensive reads using this feature? Naturally I will
test in my specific environment; I just want to know if there are any
benchmarks I can look at for now.

Ayelet




On Tue, Nov 25, 2014 at 5:19 PM, Xavier Hernandez <xhernandez@xxxxxxxxxx> wrote:

    Hi Ayelet,

    On 11/25/2014 02:41 PM, Ayelet Shemesh wrote:

        Hello Gluster experts,

        I have been using gluster for a small cluster for a few years
        now, and I have a question regarding the new disperse feature,
        which is for me a much-anticipated addition.

        *Suppose* I create a volume with a disperse set of 3, redundancy 1
        (let's call them A1, A2, A3) and then I add 3 more bricks to that
        volume (we'll call them B1, B2, B3).

        *First question* - which of the bricks will be the one carrying the
        redundancy data?


    In the current implementation, there's no difference between data
    and redundancy bricks. All bricks behave exactly the same, and no
    brick is more important than another. In a configuration with 3
    bricks and redundancy 1, you can lose any one brick and everything
    will continue working normally.
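
    Gluster's disperse translator uses a more sophisticated erasure
    code, but a toy XOR-based (3,1) scheme (purely illustrative, not
    gluster's actual algorithm) shows the key property: any 2 of the 3
    fragments are enough to rebuild the data, so no fragment is more
    important than another:

        def encode(data):
            # Split the data in two halves and add an XOR parity fragment.
            half = (len(data) + 1) // 2
            d1, d2 = data[:half], data[half:].ljust(half, b"\0")
            parity = bytes(a ^ b for a, b in zip(d1, d2))
            return d1, d2, parity

        def recover(d1, parity, orig_len):
            # Rebuild the missing second fragment from the other two.
            d2 = bytes(a ^ b for a, b in zip(d1, parity))
            return (d1 + d2)[:orig_len]

        data = b"hello world!"
        d1, d2, parity = encode(data)
        assert recover(d1, parity, len(data)) == data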


        *Second question* - If I have machines with faster disks, should
        I assign them to the data or the redundancy bricks? What load
        should I expect on the redundancy machine in heavy read scenarios
        and in heavy write scenarios?


    As I said, there isn't a dedicated redundancy brick, so there's no
    benefit in assigning the faster disks to a specific brick.

    Read requests only need to be processed by N - R bricks (N = total
    number of bricks, R = redundancy). This means that in your
    configuration, each read will be sent to 2 bricks. If all bricks are
    alive and healthy, the disperse translator balances these reads
    among all bricks, giving 2/3 of the read load to each brick.

    Write requests are processed by all bricks, so the load is the same
    on all of them.
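
    A quick way to see where the 2/3 figure comes from (a small sketch,
    assuming reads are balanced evenly across all healthy bricks):

        # Fraction of the total read load that each brick receives when
        # every read touches N - R bricks out of N.
        def read_load_fraction(n, r):
            return (n - r) / n

        print(read_load_fraction(3, 1))   # 3:1 volume -> 2/3 per brick
        print(read_load_fraction(6, 2))   # 6:2 volume -> 2/3 per brick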


        *Third question* - _does this require reading the entire data_
        of A1, A2
        and A3 by initiating a heal or another operation?


    Healing operations work on a per-file basis. If only some files on
    A3 have been damaged, healing will only read the corresponding data
    from A1 and A2, not their entire contents. To heal a file, though,
    the whole file is read.

        *4th question* (and most important for me) - I saw on the list
        that it is now a Distributed-Dispersed volume. I understand I
        can now lose, for example, bricks A1 and B1 and still have my
        entire data intact.


    Correct.

        Is this also correct for bricks from the same set, for example
        A1 and A2?


    No, each disperse set is independent and has its own redundancy.
    It's equivalent to a distributed-replicated volume: if you lose both
    bricks of the same replica set, you will lose access to the data
    stored in that replica set.

        Or to put it in a more generic way - _does this create the exact
        same dispersed volume as if I created it originally with A1, A2,
        A3, B1, B2, B3 and a redundancy of 2?_


    No. These are two different configurations. Both have the same
    effective capacity, but the probability of failure in the second
    case is several times lower than in the first one (you can lose
    *any* two bricks without losing access to the data). However, it's
    more expensive to grow the volume, because you will need to add 6
    new bricks at a time, while in the first case you only need to
    add 3.
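
    You can count the two-brick failure combinations to see the
    difference (a small sketch; the brick names match the ones from
    your question):

        from itertools import combinations

        BRICKS = ["A1", "A2", "A3", "B1", "B2", "B3"]

        # Two independent (3,1) disperse sets: each set survives at most
        # one lost brick.
        def survives_2x_3_1(lost):
            return (sum(b.startswith("A") for b in lost) <= 1
                    and sum(b.startswith("B") for b in lost) <= 1)

        pairs = list(combinations(BRICKS, 2))
        ok = sum(survives_2x_3_1(p) for p in pairs)
        print("2 x (3,1): survives %d of %d two-brick failures" % (ok, len(pairs)))
        # -> 9 of 15; a single (6,2) set survives all 15.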

    Xavi


Attachment: comparison.ods
Description: application/vnd.oasis.opendocument.spreadsheet

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users
