Re: Benchmark results on mdds::multi_type_vector

Kohei Yoshida <kohei@xxxxxxxxxxxxxxx> · Fri, 13 Dec 2019 17:38:01 -0500

On 13.12.2019 05:43, Luboš Luňák wrote:
On Friday 13 of December 2019, Kohei Yoshida wrote:
I just finished my benchmark testing on mdds::multi_type_vector, and
summarized my results in this blog post:

http://kohei.us/2019/12/12/benchmark-results-on-mdds-multi_type_vector/

Hopefully my findings and interpretations make sense.  In short, the
numbers look great.  The overhead of block shifting is a concern, but
I'm optimistic that this is going to be a non-issue for the most part.

I'd really like to see benchmarks of Calc with this new mdds,
especially to see how many regressions there will be, as I'm concerned
whether it really would be worth it in reality.

Sure, I do share your concern, which is why I spent time designing and 
implementing the benchmark I did so that I can get some answers for my 
concern.

You say that the vast majority of Calc performance problems are with
updating cell values without shifting, but that makes sense because
that's where the current bottleneck is. Once the bottleneck moves to
shifting of cells, we may get a whole new slew of bug reports about
that.

Sure, but that's just as much speculation as my own interpretation.
To be fair, it is possible that you are right and I am wrong.  But I
did provide my own interpretations of those numbers based on my own
experience and educated guesses.  I'm not claiming that I'm right, but
I am claiming that what I concluded in my post is my honest and, I
hope, reasonably researched opinion.

E.g. copy&paste of a column is very likely to hit a problem there,
IIRC it internally results in a lot of shifting of cells.

Yes, which is why I ran the benchmarks: to get some numbers and more
clarity.

One interpretation of the graphs may be that the change helps a lot at
the cost of a regression in one place, but another possible
interpretation is that the change brings an improvement that can
already be mostly achieved using hints, at the expense of a cost that
cannot be alleviated. Moreover we did go over all the reported
performance problems related to mdds some months back and fixed all of
them (at least I'm not aware of any pending ones). So the real question
for me is how many real-world cases will be improved and worsened by
this, which is why I'd like to see non-artificial benchmarks.

So, I'm a bit concerned about your use of the word "artificial" to
describe my benchmark, because that word implies that I somehow made
those numbers up.  Those are real numbers.  Now, the numbers will of
course be quite different if you measure entire Calc operations, which
include a whole bunch of other work, and I believe this is what you are
alluding to.  I do share your concern there.  But I thought it was
reasonable to draw the conclusions that I did, given that I/O with
mdds::multi_type_vector does constitute a large part of Calc's cell
I/O.  Also, keep in mind that the rest of the Calc operations are
constant, and the only variable is the mdds portion.  On this point, I
believe it's not unreasonable to draw *some* conclusions based on the
numbers on mdds alone.

Having said that, you are of course free to draw your own, different 
conclusions.

BTW, have you considered using vector operations like SSE for the
updates (either checking whether the compiler can employ them
automatically or hand-writing them)?

Yes.  For one, I did look into e.g. OpenMP's auto SIMD support.  But its
support appeared to be very limited, and MSVC did not seem to support
it.  I also thought about hand-writing SIMD directly, and I am still
considering that as a future possibility (note that I'm not entirely
done with this work).  But I couldn't think of a good set of intrinsics
to use, especially when multi_type_vector uses an array of structures
(AoS).  The SIMD intrinsics I know of are mostly not suitable for AoS.
If you know of good SIMD intrinsics that may work for
multi_type_vector, I would be interested.
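
To illustrate the AoS point in general terms, here is a minimal sketch in
plain C++.  It does not use mdds and makes no claim about how
multi_type_vector lays out its data internally; the names Cell,
scale_aos and scale_soa are hypothetical and exist only for this
example.  It only shows why the same element-wise update is harder to
vectorize when the values are interleaved with other fields than when
they sit in one contiguous array.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical AoS element: the numeric value is interleaved with an
// unrelated field, so consecutive values are sizeof(Cell) bytes apart.
struct Cell {
    double value;
    int    tag;   // stands in for any other per-element data
};

// AoS update: a plain SIMD load would pull in the tag fields as well,
// so the values would need gather/scatter or shuffles to be packed.
void scale_aos(std::vector<Cell>& cells, double factor) {
    for (std::size_t i = 0; i < cells.size(); ++i)
        cells[i].value *= factor;
}

// SoA update: the values are contiguous, which is the unit-stride
// pattern that compilers and hand-written intrinsics handle well.
void scale_soa(std::vector<double>& values, double factor) {
    for (std::size_t i = 0; i < values.size(); ++i)
        values[i] *= factor;
}

// Both layouts must produce the same numbers; only the memory access
// pattern differs.
bool same_values(const std::vector<Cell>& cells,
                 const std::vector<double>& values) {
    if (cells.size() != values.size()) return false;
    for (std::size_t i = 0; i < cells.size(); ++i)
        if (cells[i].value != values[i]) return false;
    return true;
}
```

Whether a given compiler actually vectorizes either loop depends on the
compiler and its flags, so this is something to check in the generated
code rather than assume.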

I've done some SIMD coding in orcus to speed up XML and JSON parsing,
but I can't say I'm an expert at it, and I did not always manage to get
the code to run faster with SIMD.

Alright, since one person is now objecting to hastily integrating this
piece, I will hold off on integrating it for now and let the discussion
continue.

And, while I would love to craft another benchmark test involving the
entire Calc piece, I'm afraid I won't have enough bandwidth to do that.
Even running this benchmark on mdds alone took me one month end-to-end.
It would be nice to have someone else chip in and conduct another, more
thorough and satisfactory benchmark test, if anybody is interested.

Thanks,

Kohei

--
Kohei Yoshida, LibreOffice Calc volunteer hacker