On 10/12/2019 17:30, Aditya Parameswaran wrote:
Wols,
Thanks for sharing your concerns, replete with quotes from Einstein :-) !
I believe I share those concerns... but there are of course some
liberties we can take as an academic group that you folks managing a
popular spreadsheet tool cannot take (e.g., doing a proof of concept as
opposed to a robust implementation.) From a research/academic
standpoint it is valuable to note that something is possible, even if
the solution is not ideal given other pragmatic considerations. BTW, I
don't believe that anything we're doing *requires* a relational database
-- a NoSQL setup would work just fine.
Bear in mind I'm not a regular developer. My name is on the list of
credits, and I follow the mailing list, but I'm more one of those
annoying people who moans about it. However, as they say, "you want
customers who moan, because they're the people who want your product to
improve. It's the people who leave without moaning that you should be
concerned about".
I'd be happy to discuss more. Our goal is to understand the stumbling
blocks in translating our work into spreadsheet tools such as Calc and
see how we can best help with what we've learned.
I'm more in to proving that relational databases are basically rubbish
(as is the mathematical justification behind them) :-) That said, the
maths itself is good, provided you don't use it to build a real-world
database engine!
Okay, to demolish RDBMSs :-) I don't know which rule it is (no 1?) that
says "Data comes in rows and columns". So your first job in building
your relational application is to call in a data analyst, to whom you
give the job of bashing square pegs in to round holes...
To me, data is what the user gives you. metadata is what the analyst
deduces about the data. AND YOU DON'T MIX THE TWO!!!
Given that an RDBMS stores data as sets, and a lot of data comes as
lists (which contain a lot of metadata) the only way to store a list in
an RDBMS is to muddle data and metadata. Put mathematically, if you
store a list when you want a set, you can just throw away the
information you don't want (or need). However, once you've thrown that
information away (ie the order of the items) so you can store your list
in a set-based RDBMS, *you can't get it back*!
My database of choice is Pick, where the basic unit of storage is the
object, not the row. And because an object is an n-dimensional array,
rather than the 1-dimensional row, I can store an entire relational view
in it. (Which is what you should do, relational maths is okay, RDBMSs
aren't :-) And because I have an n-dimensional array, I can store lists
of lists :-)
The problem I have at this point is that I'm now discussing the merit of
databases. I'm not actually helping you "translate your work into
spreadsheet tools". It's easy for me to extract a two-dimensional table
from Pick and, truth be told, I can probably extract it faster
(including massaging it into shape) from Pick than selecting it from an
RDBMS! But unless we can find out some way of displaying a 3- or
4-dimensional array (a relational view) in a spreadsheet that isn't
horribly confusing, I don't really know what to suggest other than to
use a SQL query as the layer between the database and the spreadsheet.
The database may be n-dimensional, but if your spreadsheet only has two,
you might as well use a 2-dimensional query and let the database handle
the complexity. It's not that hard.
(I have - personally - managed a ram-starved 32-user Pick system that
provided perfectly acceptable user response times despite thrashing like
mad...)
(Oh, and as for pushing calculation into the database, Pick has had
virtual fields for ages. I can define a field as being a calculation eg
VAT = PRICE * 20%, so I can "store price, retrieve vat" and it's just
automagical :-) (I get the impression most users are unaware of similar
capabilities in RDBMSs)
Cheers,
Wol
Cheers,
Aditya
On Mon, Dec 9, 2019 at 3:37 PM Wols Lists <antlists@xxxxxxxxxxxxxxx
<mailto:antlists@xxxxxxxxxxxxxxx>> wrote:
On 09/12/19 19:14, Aditya Parameswaran wrote:
> The idea of converting to SQL queries is an
interesting one
> but I find
> it very hard to believe it would provide any performance
advantage at
> the same memory footprint. Furthermore - I'd be interested to
know how
> you do other spreadsheet operations: row & column insertion,
addressing,
> and dependency work on top of a SQL database with any efficiency.
>
>
> We started by having the relational database be a simple persistent
> storage layer, when coupled with an index to retrieve data by
position,
> can allow us to scroll through large datasets of billions of rows at
> ease. We developed a new positional index to handle insertions and
> deletions in O(log(n)) -- https://arxiv.org/pdf/1708.06712.pdf. I
agree
> that pushing the computation to the relational database does have
> overheads; but at the same time, it allows for scaling to arbitrarily
> large datasets.
"the quickest way to optimise database access is to ditch first normal
form".
A provocative statement I know, but I'm very much in the NoSQL camp. I
can hunt up the details of a face-off between Oracle and Cache, where
Oracle had to "cheat" to achieve 100K tpm (things like deferring index
updates) whereas Cache blasted through 250K barely breaking a sweat ...
(or it might well have been tps)
The maths supports this ...
That said, a spreadsheet is inherently first normal formal, so tying a
spreadsheet and a relational database together MAY make sense.
In general though, Einstein said "make things as simple as possible BUT
NO SIMPLER". Relational oversimplifies the database side, which means
the application side is over-complex in order to compensate. (Which is
why Cache blew Oracle out of the water.)
I'm quite happy to wax lyrical, but I'd rather not preach to an audience
who aren't at least interested. Feel free to ask me to continue on list,
or contact me privately, and I'll try to prove everything as
mathematically as I can :-)
> but at the same time, it allows for scaling to arbitrarily
> large datasets.
At the price of massive unnecessary complexity.
Cheers,
Wol
_______________________________________________
LibreOffice mailing list
LibreOffice@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/libreoffice
_______________________________________________
LibreOffice mailing list
LibreOffice@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/libreoffice