Re: [discuss] Entry cache and backend txn plugin problems

Howard Chu <hyc@xxxxxxxxx> · Wed, 27 Feb 2019 10:49:32 +0000

> Date: Tue, 26 Feb 2019 17:21:50 -0700
> From: Rich Megginson <rmeggins@xxxxxxxxxx>
> Message-ID: <d40bde83-1e88-b34f-9b5d-d2b320468f14@xxxxxxxxxx>

> On 2/26/19 4:26 PM, William Brown wrote:
>>> I think recursive/nested transactions at the database level are not the problem; we already handle them correctly: either all changes become persistent or none do.
>>> What we do not manage is the modifications we make in parallel to in-memory structures such as the entry cache (EC). Changes to the EC are not managed by any txn, and I do not see how any of the database txn models would help: they do not know about the EC and can abort changes.
>>> We would need to incorporate the EC into a generic txn model, or have a way to flag EC entries as garbage if a txn is aborted.
>> The issue is that we allow parallel writes, which breaks the consistency guarantees of the EC anyway. LMDB won’t allow parallel writes (it’s single-writer with concurrent parallel readers), and most other modern kv stores take this approach too, so we should be remodelling our transactions to match this IMO. It will make the process of how we reason about the EC much, much simpler, I think.

> Some sort of in-memory data structure with fast lookup and transactional semantics is needed (modify operations are stored as MVCC/COW, so each read of the database with a given txn handle sees
> its own view of the EC; a txn commit updates the parent txn's EC view, or the global EC view if there is no parent, from the copy; a txn abort deletes the txn's copy of the EC).  A quick Google
> search turns up several hits.  I'm not sure if the B+Tree proposed at http://www.port389.org/docs/389ds/design/cache_redesign.html has transactional semantics, or if such code could be added to
> its implementation.
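
To make the semantics described above concrete, here is a minimal Python sketch of a
transactional entry cache. It is only an illustration under stated assumptions, not code from
389-ds, OpenLDAP, or LMDB: the names EntryCache and CacheTxn are hypothetical, entries are plain
dicts keyed by DN, and a single writer is assumed (no locking). Each txn keeps a private overlay
of its changes; this isolates uncommitted writes but is a simplification of real MVCC, since it
does not give readers a stable snapshot.

```python
_DELETED = object()  # tombstone marking an entry removed inside a txn


class EntryCache:
    """Global entry cache view (hypothetical name, illustration only)."""

    def __init__(self):
        self.entries = {}  # dn -> entry dict

    def get(self, dn):
        return self.entries.get(dn)

    def begin(self):
        return CacheTxn(self)

    def _apply(self, changes):
        # Fold a committed top-level txn's overlay into the global view.
        for dn, value in changes.items():
            if value is _DELETED:
                self.entries.pop(dn, None)
            else:
                self.entries[dn] = value


class CacheTxn:
    """A possibly nested txn holding a private copy of the entries it changed."""

    def __init__(self, parent):
        self.parent = parent  # an EntryCache or another CacheTxn
        self.changes = {}     # private overlay: dn -> entry or _DELETED
        self.active = True

    def _check(self):
        if not self.active:
            raise RuntimeError("txn already committed or aborted")

    def get(self, dn):
        # Reads see this txn's own changes first, then the parent's view.
        if dn in self.changes:
            value = self.changes[dn]
            return None if value is _DELETED else value
        return self.parent.get(dn)

    def put(self, dn, entry):
        self._check()
        self.changes[dn] = dict(entry)  # store a copy, never the caller's object

    def delete(self, dn):
        self._check()
        self.changes[dn] = _DELETED

    def begin(self):
        self._check()
        return CacheTxn(self)

    def commit(self):
        # Commit updates the parent's view (parent txn, or global cache if none).
        self._check()
        self.parent._apply(self.changes)
        self.active = False

    def abort(self):
        # Abort simply discards the txn's private copy.
        self._check()
        self.changes.clear()
        self.active = False

    def _apply(self, changes):
        # A child commit lands in this txn's overlay, tombstones included.
        self.changes.update(changes)
```

With this shape, cache.begin() starts a top-level txn and txn.begin() starts a nested one; an
aborted child leaves no trace in its parent, and nothing reaches the global view until the
outermost txn commits.
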
> 
> With LMDB, if we could make the on-disk entry representation the same as the in-memory entry representation, then we could use LMDB as the entry cache too - the database would be the entry 
> cache as well.

Exactly. This was the original design goal for back-mdb and LMDB in OpenLDAP.
http://www.openldap.org/lists/openldap-devel/200905/msg00036.html

Note that the back-mdb in OpenLDAP 2.4 is a compromise from this original design; we still
have a slight deserialization pass when reading entries from the DB. But it's much simpler
and faster than what we used to do with back-bdb/hdb.

Ultimately - if your local persistence layer is so slow that it needs an in-memory cache,
that local persistence layer is broken. This conclusion is inescapable, after many years of
working with BerkeleyDB.

-- 
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
_______________________________________________
389-devel mailing list -- 389-devel@xxxxxxxxxxxxxxxxxxxxxxx