Hi Xavier,

> I'm not sure if I understand you. Are you saying you will create two
> separate gluster volumes or you will add both bricks to the same
> distributed-dispersed volume?

Is adding more than one brick from the same host to a dispersed gluster
volume recommended? I meant two different gluster volumes. If I add two
bricks from the same server to the same dispersed volume and, let's say,
it is an 8+1 configuration, then losing one host will bring down the
volume, right?

> One possibility is to get rid of the server RAID and use each disk as a
> single brick. This way you can create 26 bricks per server and assign
> each one to a different disperse set. A big distributed-dispersed volume
> balances I/O load between bricks better. Note that RAID configurations
> have a reduction in the available number of IOPS. For sequential writes,
> this is not so bad, but if you have many clients accessing the same
> bricks, you will see many random accesses even if clients are doing
> sequential writes. Caching can alleviate this, but if you want to
> sustain a throughput of 2-3 GB/s, caching effects are not so evident.

I can create 26 JBOD disks and use them as bricks, but is this
recommended? With 50 servers the brick count would be 1300; is this not a
problem? Can you explain the configuration a bit more, for example using
16+2 with 26 bricks per server and 54 servers in total? In the end I want
only one gluster volume and protection against 2 host failures. Also, in
this case disk failures will be handled by gluster; I hope this doesn't
bring more problems. I will also test this configuration when I get the
servers.

Serkan

On Wed, Oct 14, 2015 at 2:03 PM, Xavier Hernandez <xhernandez@xxxxxxxxxx> wrote:
> Hi Serkan,
>
> On 13/10/15 15:53, Serkan Çoban wrote:
>>
>> Hi Xavier and thanks for your answers.
>>
>> Servers will have 26 * 8TB disks. I don't want to lose more than 2
>> disks for RAID, so my options are HW RAID6 24+2 or 2 * HW RAID5 12+1,
>
> A RAID5 of more than 8-10 disks is normally considered unsafe because
> the probability of a second drive failure while reconstructing another
> failed drive is considerably high. The same happens with a RAID6 of more
> than 16-20 disks.
>
>> in both cases I can create 2 bricks per server using LVM and use one
>> brick per server to create two distributed-disperse volumes. I will
>> test those configurations when the servers arrive.
>
> I'm not sure if I understand you. Are you saying you will create two
> separate gluster volumes or you will add both bricks to the same
> distributed-dispersed volume?
>
>> I can go with 8+1 or 16+2 and will make tests when the servers arrive.
>> But 8+2 will be too much; I lose nearly 25% of the space in this case.
>>
>> For the client count, this cluster will get backups from hadoop nodes,
>> so there will be at least 750-1000 clients which send data at the same
>> time. Can 16+2 * 3 = 54 gluster nodes handle this or should I increase
>> the node count?
>
> In this case I think it would be better to increase the number of
> bricks, otherwise you may have some performance hit to serve all these
> clients.
>
> One possibility is to get rid of the server RAID and use each disk as a
> single brick. This way you can create 26 bricks per server and assign
> each one to a different disperse set. A big distributed-dispersed volume
> balances I/O load between bricks better. Note that RAID configurations
> have a reduction in the available number of IOPS.
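To make the brick layout above concrete, here is a rough sketch of how a
single distributed-dispersed volume could be created so that every 16+2
set takes one brick from 18 different servers, which keeps every set
usable after any 2 host failures. The hostnames (gl01..gl54), the brick
mount points (/bricks/d01 .. /bricks/d26) and the volume name are invented
for illustration, and the exact CLI options depend on the gluster version:

  # Assumed layout: 54 servers (gl01..gl54), 26 disks per server, one
  # brick per disk mounted at /bricks/dNN. Bricks are listed so that each
  # group of 18 consecutive bricks (one 16+2 set) comes from 18 different
  # servers, so losing any 2 servers removes at most 2 bricks per set.
  bricks=""
  for disk in $(seq -w 1 26); do          # one "row" of sets per disk slot
    for group in 0 1 2; do                # 54 servers / 18 = 3 sets per row
      for i in $(seq 1 18); do
        srv=$(printf 'gl%02d' $(( group * 18 + i )))
        bricks="$bricks $srv:/bricks/d$disk/brick"
      done
    done
  done

  # 54 * 26 = 1404 bricks = 78 disperse subvolumes of 16+2
  gluster volume create backupvol disperse-data 16 redundancy 2 transport tcp $bricks
  gluster volume start backupvol

With this ordering no disperse set has two bricks on the same host, and
growing the volume later means adding bricks in multiples of 18, again
spread across 18 different servers.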
> For sequential writes, this is not so bad, but if you have many clients
> accessing the same bricks, you will see many random accesses even if
> clients are doing sequential writes. Caching can alleviate this, but if
> you want to sustain a throughput of 2-3 GB/s, caching effects are not so
> evident.
>
> Without RAID you could use a 16+2 or even a 16+3 dispersed volume. This
> gives you good protection and increased storage.
>
> Xavi
>
>> I will check the parameters you mentioned.
>>
>> Serkan
>>
>> On Tue, Oct 13, 2015 at 1:43 PM, Xavier Hernandez <xhernandez@xxxxxxxxxx> wrote:
>>
>> +gluster-users
>>
>> On 13/10/15 12:34, Xavier Hernandez wrote:
>>
>> Hi Serkan,
>>
>> On 12/10/15 16:52, Serkan Çoban wrote:
>>
>> Hi,
>>
>> I am planning to use GlusterFS for backup purposes. I write big files
>> (>100MB) with a throughput of 2-3 GB/s. In order to gain space we plan
>> to use erasure coding. I have some questions about EC and brick
>> planning:
>>
>> - I am planning to use a 200TB XFS/ZFS RAID6 volume to hold one brick
>> per server. Should I increase the brick count? Does increasing the
>> brick count also increase performance?
>>
>> Using a distributed-dispersed volume increases performance. You can
>> split each RAID6 volume into multiple bricks to create such a volume.
>> This is because a single brick process cannot achieve the maximum
>> throughput of the disk, so creating multiple bricks improves this.
>> However, having too many bricks could be worse because all requests
>> will go to the same filesystem and will compete between them in your
>> case.
>>
>> Another thing to consider is the size of the RAID volume. A 200TB RAID
>> will require *a lot* of time to reconstruct in case of failure of any
>> disk. Also, a 200TB RAID means you need almost 30 8TB disks. A RAID6
>> of 30 disks is quite fragile. Maybe it would be better to create
>> multiple RAID6 volumes, each with 18 disks at most (16+2 is a good and
>> efficient configuration, especially for XFS on non-hardware RAIDs).
>> Even in this configuration, you can create multiple bricks in each
>> RAID6 volume.
>>
>> - I plan to use 16+2 for EC. Is this a problem? Should I decrease this
>> to 12+2 or 10+2? Or is it completely safe to use whatever we want?
>>
>> 16+2 is a very big configuration. It requires much computation power
>> and forces you to grow (if you need to grow the gluster volume at some
>> point) in multiples of 18 bricks.
>>
>> Considering that you are already using a RAID6 in your servers, what
>> you are really protecting with the disperse redundancy is the failure
>> of the servers themselves. Maybe an 8+1 configuration could be enough
>> for your needs and requires less computation. If you really need
>> redundancy 2, 8+2 should be ok.
>>
>> Using values that are not a power of 2 has a theoretical impact on the
>> performance of the disperse volume when applications write blocks
>> whose size is a multiple of a power of 2 (which is the most normal
>> case). This means that it's possible that a 10+2 performs worse than
>> an 8+2. However, this depends on many other factors, some even
>> internal to gluster, like caching, meaning that the real impact could
>> be almost negligible in some cases. You should test it with your
>> workload.
>>
>> - I understand that EC calculation is performed on the client side. I
>> want to know if there are any benchmarks of how EC affects CPU usage?
>> For example, each 100 MB/s of traffic may use 1 CPU core?
>>
>> I don't have a detailed measurement of CPU usage related to bandwidth;
>> however, we have made some tests that seem to indicate that the CPU
>> overhead caused by disperse is quite small for a 4+2 configuration. I
>> don't have access to this data right now. When I have it, I'll send it
>> to you.
>>
>> I will also try to do some tests with 8+2 and 16+2 configurations to
>> see the difference.
>>
>> - Does the client number affect cluster performance? Is there any
>> difference if I connect 100 clients each writing at 20-30 MB/s to the
>> cluster vs 1000 clients each writing at 2-3 MB/s?
>>
>> Increasing the number of clients improves performance; however, I
>> wouldn't go over 100 clients, as this could have a negative impact on
>> performance caused by the overhead of managing all of them. In our
>> tests, the maximum performance is obtained with ~8 parallel clients
>> (if my memory doesn't fail me).
>>
>> You will also probably want to tweak some volume parameters, like
>> server.event-threads, client.event-threads,
>> performance.client-io-threads and server.outstanding-rpc-limit, to
>> increase performance.
>>
>> Xavi
>>
>> Thank you for your time,
>> Serkan
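For completeness, the tuning options mentioned above
(server.event-threads, client.event-threads, performance.client-io-threads
and server.outstanding-rpc-limit) are ordinary volume options that can be
changed at runtime with "gluster volume set". A minimal sketch, reusing
the hypothetical volume name from the earlier example; the values are only
placeholders to start testing from, since defaults and sensible ranges
differ between gluster releases:

  # Placeholder values; tune by testing with the real backup workload.
  gluster volume set backupvol server.event-threads 4
  gluster volume set backupvol client.event-threads 4
  gluster volume set backupvol performance.client-io-threads on
  gluster volume set backupvol server.outstanding-rpc-limit 128

  # "gluster volume info" lists the options that have been reconfigured.
  gluster volume info backupvol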