Re: Making split commit graphs pick up new options (namely --changed-paths)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 6/11/2021 3:01 PM, Taylor Blau wrote:
> On Fri, Jun 11, 2021 at 01:47:28PM -0400, Derrick Stolee wrote:
>> On 6/10/2021 8:50 PM, Taylor Blau wrote:
>>> On Fri, Jun 11, 2021 at 01:56:31AM +0200, Ævar Arnfjörð Bjarmason wrote:
>>>> So yeah, maybe we can just unlink() them right away, or another way to
>>>> handle the race is that load_commit_graph_chain() could just try again
>>>> from the beginning in such a case, and presumably picking up the fresh
>>>> just-rewritten chain.
>>>
>>> I'd probably be in favor of the latter.
>>
>> I want to point out that on Windows we cannot successfully unlink()
>> a layer that is currently being read by another Git process. That
>> will not affect server scenarios (to the best of my knowledge) but
>> is important to many end users.
> 
> Right, but isn't this already a problem today? Since the expiration
> window is zero we are already effectively trying to unlink all merged
> layers immediately:
> 
>   - Marking merged commit-graph layers as expired via
>     mark_commit_graphs() by setting their mtime to "now", and then
> 
>   - Immediately removing all layers which have mtime older than an
>     instant later in expire_commit_graphs().

The commit-graph management we built in Scalar and VFS for Git uses
a non-zero expire time to avoid this problem. However, since we will
clean up the ones that fail to unlink() in a future cleanup, we did
not do a similar thing in Git's background maintenance and have not
had any issues.

> (I almost suggested that a race already exists between multiple writers
> that merge multiple layers of the commit-graph, but that race doesn't
> exist because the commit-graph chain is written before other layers are
> marked and expired.)

And the maintenance buitin solves this by using a maintenance.lock
file to avoid concurrent processes operating on the same repo.

> In any case, it seems like the return value from unlink() is
> deliberately ignored in case another process is holding an expired layer
> open when we try to unlink it. So we'll eventually clean up all layers
> that don't belong to the commit-graph-chain, but at the granularity of
> new writes.

Correct. It also requires new data for the write, or the commit-graph
write will leave early and skip the cleanup IIRC.

> (FWIW, I had to re-read 8d84097f96 (commit-graph: expire commit-graph
> files, 2019-06-18) which mentions that a configuration variable would be
> introduced to change the expiration window, but we don't have any such
> configuration option. It also doesn't make any mention of handling this
> problem on Windows, which made me think that the unlink() calls weren't
> checking their return values by accident when in fact it was probably on
> purpose.)

That config option never appeared, probably because ignoring the
unlink() return was sufficient to get around this problem. Thanks
for digging in and making sure I remembered this correctly.

Thanks,
-Stolee



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux