Re: Using ndb in RPM

Michael Schroeder <mls@xxxxxxx> · Tue, 11 Jul 2017 13:52:32 +0200

On Tue, Jul 11, 2017 at 06:41:05AM -0400, Neal Gompa wrote:
> And we do use SQLite today in DNF with the yumdb, as well as the new
> SWDB coming soon(TM). I'm not sure why the SQLite backend was removed
> in rpm 4.9.0, but maybe it should be revisited for rpm 4.14.

AFAIR it was removed because it was unbearably slow, nobody used it,
and nobody wanted to maintain it.

Here's a bit of technical input for discussions about rpm's database:

1) Background: how rpm uses the database

An rpm package contains a header, which is basically a key->value
store. This header is stored in the database as a binary blob (the
blob form is needed for verification purposes). Rpm's database
layer maintains a couple of indices over parts of the header,
like dependencies or the file list, so queries are fast.
rpm has just two operations that modify the database: "add a
new header" and "remove an old header".
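To make this model concrete, here is a minimal Python sketch (not rpm's code): header blobs are kept verbatim, a few hypothetical indices (called "provides" and "files" here) map keys to package numbers, and the only modifying operations are "add a header" and "remove a header". The class and method names are made up for illustration, and the index keys are passed in by the caller rather than parsed out of a real header.

```python
from collections import defaultdict


class ToyPackageDb:
    """Toy model of rpm's database layer (hypothetical, not rpm's code)."""

    def __init__(self):
        self._blobs = {}   # package number -> header blob, stored unchanged
        self._keys = {}    # package number -> index keys recorded at add time
        # index name -> key -> set of package numbers containing that key
        self._indices = defaultdict(lambda: defaultdict(set))
        self._next_num = 1

    def add_header(self, blob, index_keys):
        """Modifying operation 1: store a new header and update every index."""
        num = self._next_num
        self._next_num += 1
        self._blobs[num] = bytes(blob)
        self._keys[num] = {name: list(keys) for name, keys in index_keys.items()}
        for name, keys in self._keys[num].items():
            for key in keys:
                self._indices[name][key].add(num)
        return num

    def remove_header(self, num):
        """Modifying operation 2: drop a header and all of its index entries."""
        del self._blobs[num]
        for name, keys in self._keys.pop(num).items():
            for key in keys:
                entries = self._indices[name][key]
                entries.discard(num)
                if not entries:
                    del self._indices[name][key]

    def lookup(self, name, key):
        """Answer a query from an index instead of scanning every header."""
        return sorted(self._indices[name].get(key, ()))

    def header(self, num):
        return self._blobs[num]


if __name__ == "__main__":
    db = ToyPackageDb()
    db.add_header(b"<header A>", {"provides": ["libfoo.so.1"], "files": ["/usr/lib/libfoo.so.1"]})
    b = db.add_header(b"<header B>", {"provides": ["foo-tools"], "files": ["/usr/bin/foo"]})
    print(db.lookup("files", "/usr/bin/foo"))  # [2]
    db.remove_header(b)
    print(db.lookup("files", "/usr/bin/foo"))  # []
```

Because every index is derived from the stored headers, an index can in principle be thrown away and regenerated from the blobs at any time.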

2) Problems with Berkeley DB

  - license issues
  - way too complex and buggy
  - bad locking scheme (stale locks)

3) What about ndb?

  - written as an experiment in what an ideal rpm database would look like
  - currently optional, so we can gather experience with it

  - main "packages" database written with data safety in mind, i.e.
    checksumming, extra tests to make sure data is not overwritten,
    and a simple design (1300 lines of code)
  - index databases are basically mmap()ed hash tables
  - the database detects if an index is out of sync and rebuilds it
    automatically
  - reader/writer locking scheme (multiple readers, one writer)
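The data-safety measures for the "packages" database can be illustrated with a small record format. The layout below is a hypothetical one chosen for this example, not ndb's actual on-disk format: each blob is prefixed with a magic value, the package number it belongs to, its length, and a CRC-32 checksum (zlib's, used here as a stand-in for whatever checksum ndb really uses). The reader refuses any record that fails one of those checks.

```python
import struct
import zlib

# Hypothetical record layout (not ndb's real on-disk format), little-endian:
#   4-byte magic | u32 package number | u32 blob length | u32 CRC-32 of blob | blob
MAGIC = b"BLOB"
PREFIX = struct.Struct("<4sIII")


def encode_record(pkgnum, blob):
    """Serialize a header blob together with the metadata used to validate it."""
    return PREFIX.pack(MAGIC, pkgnum, len(blob), zlib.crc32(blob)) + blob


def decode_record(data, expected_pkgnum):
    """Return the blob stored in `data`, or raise ValueError if any check fails."""
    if len(data) < PREFIX.size:
        raise ValueError("truncated record prefix")
    magic, pkgnum, length, crc = PREFIX.unpack_from(data)
    if magic != MAGIC:
        raise ValueError("bad magic: slot was overwritten or holds no record")
    if pkgnum != expected_pkgnum:
        raise ValueError("record belongs to a different package")
    blob = data[PREFIX.size:PREFIX.size + length]
    if len(blob) != length:
        raise ValueError("truncated blob")
    if zlib.crc32(blob) != crc:
        raise ValueError("checksum mismatch: blob is corrupt")
    return blob
```

The package-number check is one way to catch the "data was overwritten" case: a slot that now holds some other package's record still has a valid checksum, but the wrong owner.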
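Automatic rebuilding works because the indices are pure derived data: everything in them can be recomputed from the packages database. Below is a minimal sketch of one way to detect staleness. It assumes a generation counter that the packages database bumps on every change and that each index records whenever it is updated. The counter scheme and all names are assumptions for illustration, and the sketch uses plain dicts rather than mmap()ed hash tables.

```python
class PackagesDb:
    """Stand-in for the main database: the authoritative copy of each header."""

    def __init__(self):
        self.blobs = {}       # package number -> header blob
        self.generation = 0   # bumped on every modification

    def put(self, num, blob):
        self.blobs[num] = blob
        self.generation += 1

    def delete(self, num):
        del self.blobs[num]
        self.generation += 1


class DerivedIndex:
    """Secondary index that can always be regenerated from a PackagesDb."""

    def __init__(self, extract_keys):
        self.extract_keys = extract_keys  # hypothetical: blob -> iterable of keys
        self.entries = {}                 # key -> set of package numbers
        self.synced_generation = 0        # assumes creation alongside an empty db
        self.rebuilds = 0

    def add(self, pkgs, num):
        """Normal path: called right after pkgs.put(num, ...)."""
        if self.synced_generation != pkgs.generation - 1:
            self.rebuild(pkgs)  # index had already missed an update
            return
        for key in self.extract_keys(pkgs.blobs[num]):
            self.entries.setdefault(key, set()).add(num)
        self.synced_generation = pkgs.generation

    def lookup(self, pkgs, key):
        # A generation mismatch means some change never reached this index
        # (e.g. a crash between the two writes), so rebuild before answering.
        if self.synced_generation != pkgs.generation:
            self.rebuild(pkgs)
        return sorted(self.entries.get(key, ()))

    def rebuild(self, pkgs):
        self.entries = {}
        for num, blob in pkgs.blobs.items():
            for key in self.extract_keys(blob):
                self.entries.setdefault(key, set()).add(num)
        self.synced_generation = pkgs.generation
        self.rebuilds += 1
```

Calling put() without the matching add() simulates a crash between the two writes. The next lookup() notices the generation mismatch and rebuilds instead of returning a wrong answer. A real index would also have its own remove path; here a delete() simply forces a rebuild on the next lookup.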
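A common way to get "many readers, one writer" on Linux is flock() on a lock file: readers take a shared lock, and the single writer takes an exclusive one. The email does not say whether ndb uses flock() or another mechanism, so treat this Python sketch (the db_lock helper and the lock-file path are hypothetical) as an illustration of the idea only. One property is relevant to the Berkeley DB stale-lock complaint above: the kernel drops an flock() lock when the holding process exits, even after a crash, so it cannot go stale the way lock state left behind in a shared region can.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def db_lock(path, exclusive):
    """Hold a shared (reader) or exclusive (writer) flock() on `path`.

    Blocks until the lock is available. The lock belongs to the open file
    description, so closing it, or the process dying, releases the lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH)
        yield
    finally:
        os.close(fd)  # closing the only descriptor also drops the lock


# Usage (hypothetical lock-file path):
#   with db_lock("/tmp/toy-rpmdb.lock", exclusive=False):
#       ...query...              # any number of readers at once
#   with db_lock("/tmp/toy-rpmdb.lock", exclusive=True):
#       ...add/remove header...  # waits until all readers are gone
```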

4) Does it make sense to use a self-made database?

There are pros and cons. On the pro side, the code is pretty small
and tailored to rpm's needs (speed & locking). With Berkeley DB, the
database code was a "black box": you couldn't do much with bugzilla
reports about a corrupt rpm database (and there were lots of those).
With the "database" being part of rpm, the rpm maintainers can
actively fix issues.
On the con side, of course, there's a bit more code to maintain.

So, suggesting different databases is fine and all, but they have
to be integrated and well tested. We re-added support for multiple
database backends just for that, so that we can test things and
decide what to do.
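As a rough illustration of why a common backend interface helps with "integrated and well tested", here is a Python sketch: two toy backends implement the same modifying operations, and one conformance check runs unchanged against both. The Backend protocol, both classes and check_backend() are hypothetical; rpm's real backend layer is C code with a different interface.

```python
from typing import Dict, Protocol


class Backend(Protocol):
    """Hypothetical minimal backend interface: rpm's two modifying operations."""

    def add_header(self, blob: bytes) -> int: ...
    def remove_header(self, num: int) -> None: ...
    def headers(self) -> Dict[int, bytes]: ...


class DictBackend:
    """Keeps the current state directly in a dict."""

    def __init__(self):
        self._blobs: Dict[int, bytes] = {}
        self._next = 1

    def add_header(self, blob: bytes) -> int:
        num, self._next = self._next, self._next + 1
        self._blobs[num] = blob
        return num

    def remove_header(self, num: int) -> None:
        del self._blobs[num]  # KeyError for unknown package numbers

    def headers(self) -> Dict[int, bytes]:
        return dict(self._blobs)


class LogBackend:
    """Stores an append-only log of operations and replays it on read."""

    def __init__(self):
        self._log = []
        self._next = 1

    def add_header(self, blob: bytes) -> int:
        num, self._next = self._next, self._next + 1
        self._log.append(("add", num, blob))
        return num

    def remove_header(self, num: int) -> None:
        if num not in self.headers():
            raise KeyError(num)
        self._log.append(("remove", num, None))

    def headers(self) -> Dict[int, bytes]:
        state: Dict[int, bytes] = {}
        for op, num, blob in self._log:
            if op == "add":
                state[num] = blob
            else:
                del state[num]
        return state


def check_backend(make_backend) -> bool:
    """Same behavioural checks for every backend; raises AssertionError on failure."""
    db: Backend = make_backend()
    a = db.add_header(b"header A")
    b = db.add_header(b"header B")
    assert a != b
    db.remove_header(a)
    assert db.headers() == {b: b"header B"}
    try:
        db.remove_header(a)
    except KeyError:
        pass
    else:
        raise AssertionError("removing a missing header must fail")
    return True


for backend in (DictBackend, LogBackend):
    print(backend.__name__, check_backend(backend))  # DictBackend True, then LogBackend True
```

The point of the design is that a new backend only has to pass the shared checks to be comparable with the existing ones.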

Cheers,
  Michael.

-- 
Michael Schroeder                                   mls@xxxxxxx
SUSE LINUX GmbH,           GF Jeff Hawn, HRB 16746 AG Nuernberg
main(_){while(_=~getchar())putchar(~_-1/(~(_|32)/13*2-11)*13);}
_______________________________________________
devel mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxx