On Sat, 2016-02-20 at 15:59 +0700, Duy Nguyen wrote: > On Thu, Feb 18, 2016 at 12:17 PM, David Turner < > dturner@xxxxxxxxxxxxxxxx> wrote: > > LMDB has a few features that make it suitable for usage in git: > > ... > > I'm reading lmdb documents and hitting the caveat section [1]. > Random thoughts > > * "There is normally no pure read-only mode, since readers need write > access to locks and lock file.". > > This will be a problem for server side that serves git:// protocol > only. Some of those servers disable write access to the entire > repository and git still works fine (but won't when lmdb is used). > Should we do something in this case? Just tell server admins to relax > file access, or use MDB_NOLOCK (and when? based on config var?) MDB_NOLOCK is a good idea. I'm going to add this to the "Weaknesses" section of the docs and plan to fix it later, unless you feel strongly otherwise. > * " Use an MDB_env* in the process which opened it, without > fork()ing.". We do use fork on non-Windows in run-command.c, but it > should be followed by exec() with no ref access in between, so we're > almost good. > > I notice atexit() is used in this for/exec code, which reminds me we > also use atexit() in many other places. I hope none of them access > refs, or we could be in trouble. I don't think they should do that anyway -- it's too much complexity for an atexit handler. > * "Do not have open an LMDB database twice in the same process at the > same time. Not even from a plain open() call - close()ing it breaks > flock() advisory locking" > > I wonder what happens if we do open twice, will it error out or > silently ignore and move on? Because if it's the latter case, we need > some protection from the caller and I'm not sure if > lmdb-backend.c:init_env() has it, especially when it open submodule's > lmdb. I think LMDB could check this, but it doesn't seem to. I've refactored some of the submodule transaction stuff so that now we have more protection there. > * "Avoid long-lived transactions...." > > OK we don't have a problem with this. But it makes me realize lmdb > transactions do not map with ref transactions. We don't open lmdb > transaction at ref_transaction_begin(), for example. Is it simply > more > convenient to do transactions the current way, or is it impossible or > incorrect to attach lmdb transactions to ref_transaction_*()? That was what I did originally, but reviewers here noted that it had some problems: 1. What if, while a transaction is open, git opens a subprocess that wants to make its own transaction? There can only be one writer transaction open at a time. 2. As you note, long-lived transactions. Also, the files backend also doesn't do any locking until the last moment, and it's reasonable to try to be compatible with that. > * "Avoid aborting a process with an active transaction. The > transaction becomes "long-lived" as above until a check for stale > readers is performed or the lockfile is reset, since the process may > not remove it from the lockfile." > > Does it mean we should at atexit() and signal handler to release > currently active transaction? Probably a good idea. > * "Do not use LMDB databases on remote filesystems, even between > processes on the same host. This breaks flock() on some OSes, > possibly > memory map sync, and certainly sync between programs on different > hosts." > > OK can't do anything about it anyway, but maybe it should be > mentioned > somewhere in Git documentation. Sure. > * "Opening a database can fail if another process is opening or > closing it at exactly the same time." > > We have some retry logic in resolve_ref_1(). Do we need the same for > lmdb? Not very important though. > > [1] http://symas.com/mdb/doc/ wat. OK, I'll add a retry loop on that. I guess we can just keep retrying on EACCES or EAGAIN. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html