Re: [PATCH v2 1/8] packfile: prepare for the existence of '*.rev' files

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Jan 25, 2021 at 11:04:33AM -0800, Junio C Hamano wrote:
> Taylor Blau <me@xxxxxxxxxxxx> writes:
>
> >> Thinking out loud a bit: a .rev file means we're spending an extra map
> >> per pack (but not a descriptor, since we close after mmap). And like the
> >> .idx files (but unlike .pack file maps), we don't keep track of these
> >> and try to close them when under memory pressure. I think that's
> >> probably OK in terms of bytes. It may mean running up against operating
> >> system number-of-mmap limits more quickly ...
> >> ...
> >> >  	if (ends_with(name, ".idx"))
> >> >  		return 3;
> >> > -	return 4;
> >> > +	if (ends_with(name, ".rev"))
> >> > +		return 4;
> >> > +	return 5;
> >> >  }
> >>
> >> Probably not super important, but: should the .idx file still come last
> >> here? Simultaneous readers won't start using the pack until the .idx
> >> file is present. We'd probably prefer they see the whole thing
> >> atomically, than see a .idx missing its .rev (they won't ever produce a
> >> wrong answer, but they'll generate the in-core revindex on the fly when
> >> they don't need to).
>
> At some point, we may want to
>
>  - introduce .idx version 3 that is more extensible, so that the
>    reverse info is included in one of its chunks;

I'm not opposed to doing so outside of this series. I'd be fine with
resurrecting the series that Stolee posted a while ago to extract a
"chunk writer" API (for using in the commit-graph and MIDX code) to also
be used here.

That got stalled out because of the conflicts that it would have
produced with Abhishek's work, but now that that's getting picked up it
could move forward.

But brian is also working on an index v3 for some hash transition stuff,
so I'd rather let that settle first.

Of course, you could combine those efforts, but I don't think that what
we have here is such a bad interim state.

>  - make the .rev data for all packs stored as a chunk in .midx, so
>    we can first check with .midx and not open any .rev files.

This is trickier than you might think because of how the MIDX selects
a pack when more than one pack contains a given object. There is also
the concept of a "multi-pack reverse index" which you can think of as
the reverse index for a pack that is more-or-less concatenating all of
the objects in the MIDX together (in pack order). That is the same order
we'll use to lay out the bits of a multi-pack bitmap, eventually.

> either of which would reduce the numberfrom 30k down to 10k ;-)

Your ";-)" is a good reminder that there are probably many other
problems with having 30k (or even 10k!) packs that would make solving
this (index v3 + MIDX chunk) not worthwhile.

Thanks,
Taylor



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux