Re: Fw: benchmark of Excel, Calc, Google Docs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 10/12/2019 17:30, Aditya Parameswaran wrote:
Wols,

Thanks for sharing your concerns, replete with quotes from Einstein :-) !

I believe I share those concerns... but there are of course some liberties we can take as an academic group that you folks managing a popular spreadsheet tool cannot take (e.g., doing a proof of concept as opposed to a robust implementation.)   From a research/academic standpoint it is valuable to note that something is possible, even if the solution is not ideal given other pragmatic considerations.  BTW, I don't believe that anything we're doing *requires* a relational database -- a NoSQL setup would work just fine.

Bear in mind I'm not a regular developer. My name is on the list of credits, and I follow the mailing list, but I'm more one of those annoying people who moans about it. However, as they say, "you want customers who moan, because they're the people who want your product to improve. It's the people who leave without moaning that you should be concerned about".

I'd be happy to discuss more.  Our goal is to understand the stumbling blocks in translating our work into spreadsheet tools such as Calc and see how we can best help with what we've learned.

I'm more in to proving that relational databases are basically rubbish (as is the mathematical justification behind them) :-) That said, the maths itself is good, provided you don't use it to build a real-world database engine!

Okay, to demolish RDBMSs :-) I don't know which rule it is (no 1?) that says "Data comes in rows and columns". So your first job in building your relational application is to call in a data analyst, to whom you give the job of bashing square pegs in to round holes...

To me, data is what the user gives you. metadata is what the analyst deduces about the data. AND YOU DON'T MIX THE TWO!!!

Given that an RDBMS stores data as sets, and a lot of data comes as lists (which contain a lot of metadata) the only way to store a list in an RDBMS is to muddle data and metadata. Put mathematically, if you store a list when you want a set, you can just throw away the information you don't want (or need). However, once you've thrown that information away (ie the order of the items) so you can store your list in a set-based RDBMS, *you can't get it back*!

My database of choice is Pick, where the basic unit of storage is the object, not the row. And because an object is an n-dimensional array, rather than the 1-dimensional row, I can store an entire relational view in it. (Which is what you should do, relational maths is okay, RDBMSs aren't :-) And because I have an n-dimensional array, I can store lists of lists :-)

The problem I have at this point is that I'm now discussing the merit of databases. I'm not actually helping you "translate your work into spreadsheet tools". It's easy for me to extract a two-dimensional table from Pick and, truth be told, I can probably extract it faster (including massaging it into shape) from Pick than selecting it from an RDBMS! But unless we can find out some way of displaying a 3- or 4-dimensional array (a relational view) in a spreadsheet that isn't horribly confusing, I don't really know what to suggest other than to use a SQL query as the layer between the database and the spreadsheet. The database may be n-dimensional, but if your spreadsheet only has two, you might as well use a 2-dimensional query and let the database handle the complexity. It's not that hard.

(I have - personally - managed a ram-starved 32-user Pick system that provided perfectly acceptable user response times despite thrashing like mad...)

(Oh, and as for pushing calculation into the database, Pick has had virtual fields for ages. I can define a field as being a calculation eg VAT = PRICE * 20%, so I can "store price, retrieve vat" and it's just automagical :-) (I get the impression most users are unaware of similar capabilities in RDBMSs)

Cheers,
Wol

Cheers,
Aditya

On Mon, Dec 9, 2019 at 3:37 PM Wols Lists <antlists@xxxxxxxxxxxxxxx <mailto:antlists@xxxxxxxxxxxxxxx>> wrote:

    On 09/12/19 19:14, Aditya Parameswaran wrote:
     >            The idea of converting to SQL queries is an
    interesting one
     >     but I find
     >     it very hard to believe it would provide any performance
    advantage at
     >     the same memory footprint. Furthermore - I'd be interested to
    know how
     >     you do other spreadsheet operations: row & column insertion,
    addressing,
     >     and dependency work on top of a SQL database with any efficiency.
     >
     >
     > We started by having the relational database be a simple persistent
     > storage layer, when coupled with an index to retrieve data by
    position,
     > can allow us to scroll through large datasets of billions of rows at
     > ease. We developed a new positional index to handle insertions and
     > deletions in O(log(n)) -- https://arxiv.org/pdf/1708.06712.pdf. I
    agree
     > that pushing the computation to the relational database does have
     > overheads; but at the same time, it allows for scaling to arbitrarily
     > large datasets.

    "the quickest way to optimise database access is to ditch first normal
    form".

    A provocative statement I know, but I'm very much in the NoSQL camp. I
    can hunt up the details of a face-off between Oracle and Cache, where
    Oracle had to "cheat" to achieve 100K tpm (things like deferring index
    updates) whereas Cache blasted through 250K barely breaking a sweat ...
    (or it might well have been tps)

    The maths supports this ...

    That said, a spreadsheet is inherently first normal formal, so tying a
    spreadsheet and a relational database together MAY make sense.

    In general though, Einstein said "make things as simple as possible BUT
    NO SIMPLER". Relational oversimplifies the database side, which means
    the application side is over-complex in order to compensate. (Which is
    why Cache blew Oracle out of the water.)

    I'm quite happy to wax lyrical, but I'd rather not preach to an audience
    who aren't at least interested. Feel free to ask me to continue on list,
    or contact me privately, and I'll try to prove everything as
    mathematically as I can :-)

     > but at the same time, it allows for scaling to arbitrarily
     > large datasets.

    At the price of massive unnecessary complexity.

    Cheers,
    Wol


_______________________________________________
LibreOffice mailing list
LibreOffice@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/libreoffice


_______________________________________________
LibreOffice mailing list
LibreOffice@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/libreoffice




[Index of Archives]     [LARTC]     [Bugtraq]     [Yosemite Forum]     [Photo]

  Powered by Linux