Hi Alastair,

the numbers I'm giving correspond to an Intel Xeon E5-2630L 2 GHz CPU.

On 08/05/17 22:44, Alastair Neil wrote:
> so the bottleneck is that computations with 16x20 matrix require ~4 times the cycles?
This is only part of the problem. A 16x16 matrix can be processed at a rate of 400 MB/s, so in theory a single fragment on a brick would be processed at 400/16 = 25 MB/s, which is not what is seen in practice.
Note that the fragment on a brick is only part of a whole file, so 25 MB/s on a brick means that the real file is being processed at 400 MB/s.
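As a rough illustration of the relation between the per-brick rate and the whole-file rate, here is a small Python sketch. The function names are only illustrative and are not part of Gluster; the 400 MB/s figure is the 16x16 rate quoted above.

```python
def per_brick_rate(file_rate_mb_s, data_bricks):
    """Rate at which the fragment on one brick advances when the
    whole file is processed at file_rate_mb_s (MB/s)."""
    return file_rate_mb_s / data_bricks


def whole_file_rate(brick_rate_mb_s, data_bricks):
    """Inverse relation: whole-file rate implied by a per-brick rate."""
    return brick_rate_mb_s * data_bricks


print(per_brick_rate(400, 16))   # 25.0
print(whole_file_rate(25, 16))   # 400
```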
> It seems then that there is ample room for improvement, as there are many linear algebra packages out there that scale better than O(nxm).
That's true for much bigger matrices where synchronization time between threads is negligible compared to the computation time. In this case the algorithm is highly optimized and any attempt to distribute the computation would be worse.
Note that the current algorithm can rebuild the original data at a rate of ~5 CPU cycles per byte with a 16x16 configuration without any SIMD extension. With SSE or AVX this goes down to near 1 cycle per byte.
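For reference, the cycles-per-byte figures can be converted into throughput on the 2 GHz CPU mentioned at the top. This is only a back-of-the-envelope sketch: it assumes a single core running at its nominal clock and 1 MB = 10^6 bytes, and the function name is made up for the example.

```python
def throughput_mb_s(clock_hz, cycles_per_byte):
    """Bytes processed per second on one core, expressed in MB/s
    (assumes 1 MB = 10**6 bytes and a constant clock)."""
    return clock_hz / cycles_per_byte / 1e6


CLOCK_HZ = 2.0e9  # nominal 2 GHz clock

print(throughput_mb_s(CLOCK_HZ, 5))  # 400.0  (no SIMD, ~5 cycles/byte)
print(throughput_mb_s(CLOCK_HZ, 1))  # 2000.0 (SSE/AVX, near 1 cycle/byte)
```

The first value matches the 400 MB/s quoted for the 16x16 configuration. The second is only an approximate upper figure, since the SIMD cost is "near" 1 cycle per byte.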
In this case the best we can do is to do more than one heal in parallel. This will use more than one core to compute the matrices, getting an overall better performance.
> Is the healing time dominated by the EC compute time? If Serkan saw a hard 2x scaling then it seems likely.
Partially. The computation speed is doubled on an 8+2 configuration, but the number of IOPS is also halved, and each operation is twice the size of a 16+4 operation. This means that 8+2 incurs only half as many latencies and makes better use of the bandwidth.
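The IOPS part can be sketched with a very simplified model. The assumption here (mine, for illustration only, not a statement about how the self-heal daemon is implemented) is that each heal operation handles a fixed window of file data, 128 KiB in this example, which is split evenly across the data bricks. The function name and the window size are hypothetical.

```python
def heal_ops(data_bricks, brick_bytes, window_bytes=128 * 1024):
    """Simplified model of rebuilding brick_bytes of fragment data.

    window_bytes is a hypothetical fixed amount of file data handled
    per heal operation (an assumption for this sketch).
    Returns (number of operations, bytes read from each brick per operation).
    """
    file_bytes = brick_bytes * data_bricks      # file data covering that fragment data
    ops = file_bytes // window_bytes            # operations needed to walk through it
    bytes_per_brick_op = window_bytes // data_bricks
    return ops, bytes_per_brick_op


ONE_GIB = 2 ** 30
print(heal_ops(8, ONE_GIB))    # (65536, 16384)
print(heal_ops(16, ONE_GIB))   # (131072, 8192)
```

Under that assumption, rebuilding the same amount of brick data takes half as many operations with 8 data bricks as with 16, and each operation moves twice as much data per brick.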
The theoretical speed of matrix processing is 25 MB/s per brick, but the real speed seen is considerably smaller, so network latencies and other factors also contribute to the heal time.
Xavi
-Alastair

On 8 May 2017 at 03:02, Xavier Hernandez <xhernandez@xxxxxxxxxx> wrote:

On 05/05/17 13:49, Pranith Kumar Karampuri wrote:

On Fri, May 5, 2017 at 2:38 PM, Serkan Çoban <cobanserkan@xxxxxxxxx> wrote:

It is the over all time, 8TB data disk healed 2x faster in 8+2 configuration.

Wow, that is counter intuitive for me. I will need to explore about this to find out why that could be. Thanks a lot for this feedback!

Matrix multiplication for encoding/decoding of 8+2 is 4 times faster than 16+4 (one matrix of 16x16 is composed by 4 submatrices of 8x8), however each matrix operation on a 16+4 configuration takes twice the amount of data of a 8+2, so net effect is that 8+2 is twice as fast as 16+4.

An 8+2 also uses bigger blocks on each brick, processing the same amount of data in less I/O operations and bigger network packets.

Probably these are the reasons why 16+4 is slower than 8+2.

See my other email for more detailed description.

Xavi

On Fri, May 5, 2017 at 10:00 AM, Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> wrote:
>
> On Fri, May 5, 2017 at 11:42 AM, Serkan Çoban <cobanserkan@xxxxxxxxx> wrote:
>>
>> Healing gets slower as you increase m in m+n configuration.
>> We are using 16+4 configuration without any problems other then heal
>> speed.
>> I tested heal speed with 8+2 and 16+4 on 3.9.0 and see that heals on
>> 8+2 is faster by 2x.
>
> As you increase number of nodes that are participating in an EC set number
> of parallel heals increase. Is the heal speed you saw improved per file or
> the over all time it took to heal the data?
>
>> On Fri, May 5, 2017 at 9:04 AM, Ashish Pandey <aspandey@xxxxxxxxxx> wrote:
>> >
>> > 8+2 and 8+3 configurations are not the limitation but just suggestions.
>> > You can create 16+3 volume without any issue.
>> >
>> > Ashish
>> >
>> > ________________________________
>> > From: "Alastair Neil" <ajneil.tech@xxxxxxxxx>
>> > To: "gluster-users" <gluster-users@xxxxxxxxxxx>
>> > Sent: Friday, May 5, 2017 2:23:32 AM
>> > Subject: disperse volume brick counts limits in RHES
>> >
>> > Hi
>> >
>> > we are deploying a large (24node/45brick) cluster and noted that the
>> > RHES guidelines limit the number of data bricks in a disperse set to 8.
>> > Is there any reason for this. I am aware that you want this to be a
>> > power of 2, but as we have a large number of nodes we were planning on
>> > going with 16+3. Dropping to 8+2 or 8+3 will be a real waste for us.
>> >
>> > Thanks,
>> >
>> > Alastair
>> >
>> > _______________________________________________
>> > Gluster-users mailing list
>> > Gluster-users@xxxxxxxxxxx
>> > http://lists.gluster.org/mailman/listinfo/gluster-users
>
> --
> Pranith

--
Pranith