Re: [PATCH 2/2] fsck: use oidset for skiplist

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Sat, 11 Aug 2018 18:54:01 +0200

On Sat, Aug 11 2018, René Scharfe wrote:

> Object IDs to skip are stored in a shared static oid_array.  Lookups do
> a binary search on the sorted array.  The code checks if the object IDs
> are already in the correct order while loading and skips sorting in that
> case.

I think this change makes sense, but it's missing an update to the
relevant documentation in Documentation/config.txt, which still asks
for a sorted list:

    fsck.skipList::
    	The path to a sorted list of object names (i.e. one SHA-1 per
    	line) that are known to be broken in a non-fatal way and should
    	be ignored. This feature is useful when an established project
    	should be accepted despite early commits containing errors that
    	can be safely ignored such as invalid committer email addresses.
    	Note: corrupt objects cannot be skipped with this setting.

Also, I only use the skipList feature for something on the order of
10-100 objects, so whatever algorithm the lookup uses isn't going to
matter for me. Still, I think it would be interesting to describe the
trade-off in the commit message.

I.e. what if I have 100K objects listed in the skipList? Is it only
going to be read lazily during fsck if there's an issue, or is it
consulted for every object, etc.? What's the difference in performance?
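
To make the question concrete, here is a rough standalone sketch of the
two lookup strategies being compared: a binary search over a sorted
array, and a hash set that doesn't care about input order. This is not
git's oid_array or oidset code. Every name below (sketch_oid,
sorted_contains, sketch_set, set_insert, set_contains) is made up for
illustration, and the fixed-capacity table with linear probing is just
an assumption to keep it short.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_RAWSZ 20 /* SHA-1 size in bytes */

/* Hypothetical stand-in for an object ID; not git's struct object_id. */
struct sketch_oid {
	unsigned char hash[SKETCH_RAWSZ];
};

/* Strategy 1: binary search. Only correct if arr is sorted in memcmp order. */
static int sorted_contains(const struct sketch_oid *arr, size_t nr,
			   const struct sketch_oid *key)
{
	size_t lo = 0, hi = nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		int cmp = memcmp(key->hash, arr[mid].hash, SKETCH_RAWSZ);

		if (!cmp)
			return 1;
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return 0;
}

/* Strategy 2: open-addressing hash set. Input order does not matter. */
struct sketch_set {
	struct sketch_oid *slots;
	unsigned char *used;
	size_t cap; /* power of two; caller keeps it larger than the entry count */
};

static void set_init(struct sketch_set *set, size_t cap)
{
	assert(cap && !(cap & (cap - 1))); /* cap must be a power of two */
	set->slots = calloc(cap, sizeof(*set->slots));
	set->used = calloc(cap, 1);
	set->cap = cap;
	assert(set->slots && set->used);
}

/* An object ID is already a hash, so its leading bytes pick the slot. */
static size_t set_slot(const struct sketch_set *set,
		       const struct sketch_oid *oid)
{
	size_t h;

	memcpy(&h, oid->hash, sizeof(h));
	return h & (set->cap - 1);
}

static void set_insert(struct sketch_set *set, const struct sketch_oid *oid)
{
	size_t i = set_slot(set, oid);

	/* Linear probing: walk forward until a free slot or a duplicate. */
	while (set->used[i]) {
		if (!memcmp(set->slots[i].hash, oid->hash, SKETCH_RAWSZ))
			return; /* already present */
		i = (i + 1) & (set->cap - 1);
	}
	set->slots[i] = *oid;
	set->used[i] = 1;
}

static int set_contains(const struct sketch_set *set,
			const struct sketch_oid *oid)
{
	size_t i = set_slot(set, oid);

	while (set->used[i]) {
		if (!memcmp(set->slots[i].hash, oid->hash, SKETCH_RAWSZ))
			return 1;
		i = (i + 1) & (set->cap - 1);
	}
	return 0;
}
```

With 100K entries the binary search needs at most about 17 memcmp()
calls per lookup, while the set needs a small expected number of
probes, at the cost of building the table up front. The sketch says
nothing about when the real code loads the file or how often it does
the lookup, which is the part I'd like the commit message to spell out.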

Before this change, I wanted to follow up my ab/fsck-transfer-updates
with something where we'd die if we found the skipList wasn't ordered as
we read it, but from a UI POV this is even better.