Re: reftable [v5]: new ref storage format

Shawn Pearce <spearce@xxxxxxxxxxx> · Tue, 8 Aug 2017 02:16:49 -0700

On Tue, Aug 8, 2017 at 12:52 AM, Jeff King <peff@xxxxxxxx> wrote:
> On Mon, Aug 07, 2017 at 03:40:48PM +0000, David Turner wrote:
>
>> > -----Original Message-----
>> > From: Shawn Pearce [mailto:spearce@xxxxxxxxxxx]
>> > In git-core, I'm worried about the caveats related to locking. Git tries to work
>> > nicely on NFS, and it seems LMDB wouldn't. Git also runs fine on a read-only
>> > filesystem, and LMDB gets a little weird about that. Finally, Git doesn't have
>> > nearly the risks LMDB has about a crashed reader or writer locking out future
>> > operations until the locks have been resolved. This is especially true with shared
>> > user repositories, where another user might set up and own the semaphore.
>>
>> FWIW, git has problems with stale lock files in the event of a crash (refs/foo.lock
>> might still exist, and git does nothing to clean it up).
>>
>> In my testing (which involved a *lot* of crashing), I never once had to clean up a
>> stale LMDB lock.  That said, I didn't test on a RO filesystem.
>
> Yeah, I'd expect LMDB to do much better than Git in a crash, because it
> relies on flock. So when the kernel goes away, so too does your lock
> (ditto if a git process dies without remembering to remove the lock,
> though I don't think we've ever had such a bug).
>
> But that's also why it may not work well over NFS (though my impression
> is that flock _does_ work on modern NFS; I've been lucky enough not to
> ever use it). Lack of NFS support wouldn't be a show-stopper for most
> people, but it would be for totally replacing the existing code, I'd
> think. I'm just not clear on what the state of lmdb-on-nfs is.
>
> Assuming it could work, the interesting tradeoffs to me are:
>
>   - something like reftable is hyper-optimized for high-latency
>     block-oriented access. It's not clear to me if lmdb would even be
>     usable for the distributed storage case Shawn has.
>
>   - reftable is more code for us to implement, but we'd "own" the whole
>     stack down to the filesystem. That could be a big win for debugging
>     and optimizing for our use case.
>
>   - reftable is re-inventing a lot of the database wheel. lmdb really is
>     a debugged, turn-key solution.
>
> I'm not opposed to a world where lmdb becomes the standard solution and
> Google does their own bespoke thing. But that's easy for me to say
> because I'm not Google. I do care about keeping complexity and bugs to a
> minimum for most users, and it's possible that lmdb could do that. But
> if it can't become the baseline standard (due to NFS issues), then we'd
> still want something to replace the current loose/packed storage. And if
> reftable does that, then lmdb becomes a lot less interesting.
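The crash-recovery difference discussed above (a stale `refs/foo.lock` versus a lock that "goes away" with the kernel) can be made concrete with a minimal POSIX C sketch. This is not Git's or LMDB's actual code, and the helper names `take_lockfile`, `take_flock`, `crash_while_holding` and `lock_survives_crash` are hypothetical. It assumes a local filesystem where flock(2) works. A forked child grabs a lock and exits without cleaning up, then the parent checks whether it can re-acquire the lock: a lockfile modeled on Git's exclusive-create `.lock` files stays behind, while an flock(2) lock is dropped by the kernel when the child dies.

```c
#define _DEFAULT_SOURCE /* expose flock() and mkdtemp() on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/file.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Lockfile in the style of Git's "<ref>.lock": exclusively create the file.
 * The lock *is* the file's existence, so it persists until it is unlinked. */
static int take_lockfile(const char *lock_path)
{
    return open(lock_path, O_WRONLY | O_CREAT | O_EXCL, 0666);
}

/* flock-style lock: an advisory lock on an open file. The kernel drops it
 * when the last descriptor referring to it is closed, including at exit. */
static int take_flock(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0666);
    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX | LOCK_NB) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Fork a child that acquires the lock and then "crashes": it calls _exit()
 * without unlinking the lockfile or releasing the flock. Returns true if
 * the child managed to acquire the lock before dying. */
static bool crash_while_holding(int (*take)(const char *), const char *path)
{
    assert(take != NULL && path != NULL);
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return false;
    }
    if (pid == 0)
        _exit(take(path) >= 0 ? EXIT_SUCCESS : EXIT_FAILURE);

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return false;
    return WIFEXITED(status) && WEXITSTATUS(status) == EXIT_SUCCESS;
}

/* Returns true if the lock is left stale (cannot be re-acquired) after its
 * holder crashed, false if a new process can take it again. */
static bool lock_survives_crash(int (*take)(const char *), const char *path)
{
    if (!crash_while_holding(take, path))
        return false;   /* the child never held the lock at all */
    int fd = take(path);
    if (fd < 0)
        return true;    /* e.g. open() fails with EEXIST: stale lock */
    close(fd);          /* also releases the flock, if one was taken */
    return false;
}
```

Calling `lock_survives_crash(take_lockfile, "<dir>/foo.lock")` should report a stale lock, while `lock_survives_crash(take_flock, "<dir>/flocked")` should not. The tradeoff mirrors the thread: the lockfile's state lives in the filesystem namespace, so it outlives a crash and needs manual cleanup, whereas the flock lock lives in the kernel and vanishes with its holder, which is also why its behavior depends on how well the filesystem (NFS, for example) supports flock.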

Peff, thank you for this summary. It echoes my opinions as well.

On the one hand, I love the idea of offloading the database stuff to
lmdb. But it's got two technical blockers for me: behavior on NFS, and
virtualizing onto a different filesystem in userspace.

I really need a specialized reference store on a virtualized
distributed storage. The JGit reftable implementation fits that need
today. So we're probably going to go ahead and deploy that in our
environment.

I'd like to start writing a prototype reftable in C for git-core soon,
but I've been distracted by the JGit version first. It would be good
to have something to compare against the lmdb approach for git-core
before we make any decisions about what git-core wants to promote as
the new standard for ref storage.