Hi there,
So - opening this to a wider audience with minor edits - and
more comments in-line:
On 03/09/18 16:21, Alexander Bock wrote:
Sorry for the delay in getting back to you.
Michael wrote:
First - great to meet you; and good to find another team interested in
spreadsheets. The LibreOffice Calc team has been working hard to really
improve the state of the art wrt. spreadsheet representations - at least
in the form of things that we can feasibly re-architect towards while
continuing to ship a working product =) - not trivial.
Likewise :) we are still a relatively small crowd. I was very pleased to
read about all the improvements from the slide materials to get a better
idea of all the improvements that have been made.
Isn't that implementation decision such 'fun' =) we work quite hard to
kill implicit intersection in our internal compiled form of formulae -
that helps have precise dependencies: so =SIN(A:A) doesn't end up with
some massive heap there.
Ugh…I’m very glad Funcalc does not support it. I would be very
interested to know more about the internal compiled formulae if you have
any resources on that. Funcalc is interpreted except for the
sheet-defined functions (SDFs) which are compiled on-the-fly to .NET
bytecode and executed by the JIT compiler.
Here’s a picture of an SDF for rolling an n-sided die to get a better
idea of how it works. Cell A1 defines the function. The green cells are
zero or more input cells, the light blue cells (not shown) are
intermediate calculations and the blue cell is the single output cell of
the function. The pink border stems from the fact that SDFs can only be
defined on special “function sheets”.
Having said that - reading funcalc - it seems your dependency
structures are a cell <-> cell(s) mapping - not special casing range
references - which is unusual. I personally think that one of the big
errors of spreadsheet design is separating dependencies from formulae
but ... ;-) These days we do group dependency tracking which makes life
hugely more efficient.
To be more precise, it is indeed a cell-to-cell mapping although the
graph is implicitly defined, i.e. there is no omniscient, global data
structure. We do special case areas/ranges for supported cells (i.e. the
inverse direction of the dependency graph) which does a lot of caching
and cleverness to remain efficient. For each cell, we can still query
its dependencies and supported cells. For the full details, it is
probably better to refer you to my supervisor’s book “Spreadsheet
Implementation Technology”
<https://www.amazon.com/Spreadsheet-Implementation-Technology-Basics-Extensions/dp/0262526646>.
There may be an older version available somewhere on his site
<https://www.itu.dk/people/sestoft/>.
Sounds like an interesting book =) It's a shame we had a
LibreOffice conference in Denmark a couple of years back and would have
been interesting to meet up and compare spreadsheet design notes. Perhaps
you could join one of our Hamburg hackfests at some stage.
I was actually considering that we should track cell arrays/groups
explicitly especially for dependency management, instead of finding them
when necessary which is what I do in my tool currently. However, I am
unsure how well it will fit with the current infrastructure.
Hmm - so do you handle =RAND() for example - it's trivial using
=INDIRECT(ADDRESS(RAND()...)) to build dependency graph mutating horrors
- or just some vlookup on a semi-random cell range.
Good point. Simply by not having similar “problematic” functions
available. I guess that is one benefit of working with a research
prototype…I believe my supervisor did not want to deal with those
issues, as important as they may be, since the primary proof-of-concept
for Funcalc was sheet-defined functions.
This is all really interesting. I'd prefer we move this mail to the
public LibreOffice developer list if that's ok (?) may I re-post this
mail there ? No doubt that would get more input - and we can CC more
people outside Collabora that would be curious.
That is perfectly fine. Do I need to respond there then? It’s been a
quite a while since I’ve dealt with mailing lists :)
Easy - just reply to the mail - and be aware it's permanently public.
In addition - you should really check out Kohie's MDDS library - the
Multi-Dimensional-Data-Structures, which is the C++ template magic of
our core, and Kohei has his own new-gen spreadsheet prototype called
'Orcus' that we re-use pieces of in LibreOffice.
Ah yes, I saw it in one of the slides. Very interesting stuff and cool
use case! I’ll make sure to look into it.
...
Oh - and, another thing - in many of the cases that we can parallelize
LibreOffice 6.1 has some great performance characteristics vs. Excel by
wall-clock time measurements.
I'm wondering if you've measured any of that ?
Is this parallelisation via OpenCL or threads or both?
Both - but I think the representation wins and CPU parallelism
should be particularly interesting in 6.1.
We haven’t measured our tools against Excel. In part because felt the
comparison would be hard to attribute to specific optimisations without
taking into account all the internals of both pieces of software
(Funcalc is written in C#, not C/C++). Additionally, we are not trying
to beat Excel’s speed but use Funcalc as a research platform for
experimentation (we might have used LibreOffice Calc instead).
My initial guess would be that Excel will easily beat at least the
sequential execution since it has enjoyed ~25 years of optimisations
(barring issues related to a 25 year old codebase). It would definitely
be interesting to compare wall-clock time measurements anyway,
especially if Excel already has a feature for getting those numbers.
You might think so - but I couldn't possibly comment =) It would
be interesting too do some measurements and benchmarking here I think - we
have our own numbers, but some 3rd party looking at this might find some
good things.
Ultimately I'd be really interested in how your compiled & jitted
C# stuff works too, you can emit IL rather than C-strings (as we do for CL)
so perhaps it is also rather fast.
So far, we have modified the LibreOffice Calc benchmark spreadsheets
<https://gerrit.libreoffice.org/gitweb?p=benchmark.git;a=tree> to run in
Funcalc (by converting functions like PEARSON and SLOPE to sheet-defined
functions) and have only compared speed-ups between sequential and
parallel versions of Funcalc.
Interesting. Anyhow I'd suggest that LibreOffice is a great code-base
to read through to update Peter's book - we're wrestling mightily with the
data structures to make it cleaner, faster and more sensible while staying
fully compatible with Excel.
I'm also curious if there are good spreadsheet conferences out there,
I attended EUSPRIG in the past which was interesting - any recommendations ?
All the best,
Michael.
*Alexander Asp Bock*, PhD student
Computer Science Department <https://computerscience.wikit.itu.dk/>
IT University of Copenhagen <http://en.itu.dk/>
--
michael.meeks@xxxxxxxxxxxxx <><, GM Collabora Productivity
Hangout:
mejmeeks@xxxxxxxxx, Skype: mmeeks
(M) +44 7795 666 147 - timezone usually UK / Europe
<Attached Message Part.png>