Re: EC planning

Hi Serkan,

On 13/10/15 15:53, Serkan Çoban wrote:
Hi Xavier and thanks for your answers.

Servers will have 26*8TB disks. I don't want to lose more than 2 disks
to RAID, so my options are HW RAID6 24+2 or 2 * HW RAID5 12+1.

A RAID5 of more than 8-10 disks is normally considered unsafe because the probability of a second drive failing while another failed drive is being rebuilt is considerable. The same happens with a RAID6 of more than 16-20 disks.

In both cases I can create 2 bricks per server using LVM and use one brick
per server to create two distributed-disperse volumes. I will test those
configurations when the servers arrive.

I'm not sure I understand you. Are you saying you will create two separate gluster volumes, or that you will add both bricks to the same distributed-dispersed volume?


I can go with 8+1 or 16+2; I will run tests when the servers arrive. But 8+2
would be too much, as I would lose nearly 25% of the space in that case.

For the client count, this cluster will receive backups from Hadoop nodes,
so there will be at least 750-1000 clients sending data at the same time.
Can 16+2 * 3 = 54 gluster nodes handle this, or should I increase the node count?

In this case I think it would be better to increase the number of bricks, otherwise you may see a performance hit when serving all these clients.

One possibility is to get rid of the server RAID and use each disk as a single brick. This way you can create 26 bricks per server and assign each one to a different disperse set. A big distributed-dispersed volume balances I/O load between bricks better. Note that RAID configurations reduce the number of available IOPS. For sequential writes this is not so bad, but if you have many clients accessing the same bricks, you will see many random accesses even if each client is doing sequential writes. Caching can alleviate this, but if you want to sustain a throughput of 2-3 GB/s, caching effects are not so evident.
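
As a rough sketch only (device names and mount points are hypothetical
examples, not a recommendation), preparing one brick per disk could look
like this on each server, repeated for each of the 26 disks:

    mkfs.xfs -i size=512 /dev/sdb     # one XFS filesystem per disk
    mkdir -p /bricks/disk01
    mount /dev/sdb /bricks/disk01
    mkdir /bricks/disk01/brick        # brick directory used by gluster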

Without RAID you could use a 16+2 or even a 16+3 dispersed volume. This gives you good protection and more usable storage.
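
For example, a 16+3 configuration could be created with something like
this (the volume name and brick paths are only placeholders; ideally
each disperse set takes its 19 bricks from different servers):

    gluster volume create backupvol disperse 19 redundancy 3 \
        server{01..19}:/bricks/disk01/brick

Adding more groups of 19 bricks to the same command (or later with
'gluster volume add-brick') turns it into a distributed-dispersed volume.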

Xavi


I will check the parameters you mentioned.

Serkan

On Tue, Oct 13, 2015 at 1:43 PM, Xavier Hernandez <xhernandez@xxxxxxxxxx> wrote:

    +gluster-users


    On 13/10/15 12:34, Xavier Hernandez wrote:

        Hi Serkan,

        On 12/10/15 16:52, Serkan Çoban wrote:

            Hi,

            I am planning to use GlusterFS for backup purposes. I write
            big files (>100MB) with a throughput of 2-3 GB/s. In order to
            save space we plan to use erasure coding. I have some
            questions about EC and brick planning:
            - I am planning to use a 200TB XFS/ZFS RAID6 volume to hold
            one brick per server. Should I increase the brick count? Does
            increasing the brick count also increase performance?


        Using a distributed-dispersed volume increases performance. You can
        split each RAID6 volume into multiple bricks to create such a
        volume. This is because a single brick process cannot achieve the
        maximum throughput of the disk, so creating multiple bricks
        improves this. However, having too many bricks could be worse
        because in your case all requests would go to the same filesystem
        and compete with each other.

        Another thing to consider is the size of the RAID volume. A 200TB
        RAID will require *a lot* of time to reconstruct in case of
        failure of any disk. Also, a 200TB RAID means you need almost 30
        8TB disks. A RAID6 of 30 disks is quite fragile. Maybe it would
        be better to create multiple RAID6 volumes, each with 18 disks at
        most (16+2 is a good and efficient configuration, especially for
        XFS on non-hardware RAIDs). Even in this configuration, you can
        create multiple bricks in each RAID6 volume.
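
        As an illustration only (the paths are hypothetical), several
        bricks on the same RAID6 filesystem are simply different
        directories, each assigned to a different disperse set of the
        volume:

            mkdir /raid6a/brick1 /raid6a/brick2
            gluster volume create bigvol disperse 18 redundancy 2 \
                srv{01..18}:/raid6a/brick1 srv{01..18}:/raid6a/brick2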

            - I plan to use 16+2 for EC. Is this a problem? Should I
            decrease this to 12+2 or 10+2? Or is it completely safe to
            use whatever we want?


        16+2 is a very big configuration. It requires a lot of computation
        power and forces you to grow (if you need to grow the gluster
        volume at some point) in multiples of 18 bricks.

        Considering that you are already using a RAID6 in your servers,
        what you are really protecting with the disperse redundancy is
        the failure of the servers themselves. Maybe an 8+1 configuration
        could be enough for your needs and requires less computation. If
        you really need redundancy 2, 8+2 should be ok.
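
        As a rough comparison of usable space (data bricks / total
        bricks): 8+1 gives 8/9 (~89%), 8+2 gives 8/10 (80%), and 16+2
        gives 16/18 (~89%).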

        Using a number of data bricks that is not a power of 2 has a
        theoretical impact on the performance of the disperse volume when
        applications write blocks whose size is a multiple of a power of
        2 (which is the most common case). This means that it's possible
        that a 10+2 configuration performs worse than an 8+2 one. However
        this depends on many other factors, some even internal to
        gluster, like caching, meaning that the real impact could be
        almost negligible in some cases. You should test it with your
        workload.
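
        As a concrete example (assuming the usual 512-byte fragment per
        data brick): an 8+2 volume has a stripe of 8 * 512 = 4096 bytes,
        so a 128 KiB write covers exactly 32 stripes, while a 10+2 volume
        has a stripe of 10 * 512 = 5120 bytes, so the same write covers
        25.6 stripes and the last partial stripe needs a read-modify-write.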

            - I understand that the EC calculation is performed on the
            client side. I want to know if there are any benchmarks on
            how EC affects CPU usage. For example, might each 100 MB/s of
            traffic use 1 CPU core?


        I don't have a detailed measurement of CPU usage related to
        bandwidth, however we have made some tests that seem to indicate
        that the CPU overhead caused by disperse is quite small for a 4+2
        configuration. I don't have access to this data right now. When I
        have it, I'll send it to you.

        I will also try to do some tests with an 8+2 and a 16+2
        configuration to see the difference.

            - Does the number of clients affect cluster performance? Is
            there any difference between connecting 100 clients each
            writing at 20-30 MB/s to the cluster and 1000 clients each
            writing at 2-3 MB/s?


        Increasing the number of clients improves performance, however I
        wouldn't go over 100 clients as this could have a negative impact
        on performance caused by the overhead of managing all of them. In
        our tests, the maximum performance is obtained with ~8 parallel
        clients (if my memory doesn't fail me).

        You will also probably want to tweak some volume parameters, like
        server.event-threads, client.event-threads,
        performance.client-io-threads and server.outstanding-rpc-limit to
        increase performance.
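
        For example (the values are only hypothetical starting points to
        test with, not a recommendation):

            gluster volume set <volname> server.event-threads 4
            gluster volume set <volname> client.event-threads 4
            gluster volume set <volname> performance.client-io-threads on
            gluster volume set <volname> server.outstanding-rpc-limit 128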

        Xavi


            Thank you for your time,
            Serkan




_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users



