Re: [PATCH 14/24] pack-objects: keep track of `pack_start` for each reuse pack

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Nov 28, 2023 at 02:08:32PM -0500, Taylor Blau wrote:
> When reusing objects from a pack, we keep track of a set of one or more
> `reused_chunk`s, corresponding to sections of one or more object(s) from
> a source pack that we are reusing. Each chunk contains two pieces of
> information:
> 
>   - the offset of the first object in the source pack (relative to the
>     beginning of the source pack)
>   - the difference between that offset, and the corresponding offset in
>     the pack we're generating
> 
> The purpose of keeping track of these is so that we can patch an
> OFS_DELTAs that cross over a section of the reuse pack that we didn't
> take.
> 
> For instance, consider a hypothetical pack as shown below:
> 
>                                                 (chunk #2)
>                                                 __________...
>                                                /
>                                               /
>       +--------+---------+-------------------+---------+
>   ... | <base> | <other> |      (unused)     | <delta> | ...
>       +--------+---------+-------------------+---------+
>        \                /
>         \______________/
>            (chunk #1)
> 
> Suppose that we are sending objects "base", "other", and "delta", and
> that the "delta" object is stored as an OFS_DELTA, and that its base is
> "base". If we don't send any objects in the "(unused)" range, we can't
> copy the delta'd object directly, since its delta offset includes a
> range of the pack that we didn't copy, so we have to account for that
> difference when patching and reassembling the delta.
> 
> In order to compute this value correctly, we need to know not only where
> we are in the packfile we're assembling (with `hashfile_total(f)`) but
> also the position of the first byte of the packfile that we are
> currently reusing.
> 
> Together, these two allow us to compute the reused chunk's offset
> difference relative to the start of the reused pack, as desired.

Hm. I'm not quite sure I fully understand the motivation here. Is this
something that was broken all along? Why does it become a problem now?
Sorry if I'm missing the obvious here.

> Helped-by: Jeff King <peff@xxxxxxxx>
> Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx>
> ---
>  builtin/pack-objects.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 7682bd65bb..eb8be514d1 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -1016,6 +1016,7 @@ static off_t find_reused_offset(off_t where)
>  
>  static void write_reused_pack_one(struct packed_git *reuse_packfile,
>  				  size_t pos, struct hashfile *out,
> +				  off_t pack_start,
>  				  struct pack_window **w_curs)
>  {
>  	off_t offset, next, cur;
> @@ -1025,7 +1026,8 @@ static void write_reused_pack_one(struct packed_git *reuse_packfile,
>  	offset = pack_pos_to_offset(reuse_packfile, pos);
>  	next = pack_pos_to_offset(reuse_packfile, pos + 1);
>  
> -	record_reused_object(offset, offset - hashfile_total(out));
> +	record_reused_object(offset,
> +			     offset - (hashfile_total(out) - pack_start));
>  
>  	cur = offset;
>  	type = unpack_object_header(reuse_packfile, w_curs, &cur, &size);
> @@ -1095,6 +1097,7 @@ static void write_reused_pack_one(struct packed_git *reuse_packfile,
>  
>  static size_t write_reused_pack_verbatim(struct packed_git *reuse_packfile,
>  					 struct hashfile *out,
> +					 off_t pack_start UNUSED,
>  					 struct pack_window **w_curs)
>  {
>  	size_t pos = 0;
> @@ -1126,10 +1129,12 @@ static void write_reused_pack(struct packed_git *reuse_packfile,
>  {
>  	size_t i = 0;
>  	uint32_t offset;
> +	off_t pack_start = hashfile_total(f) - sizeof(struct pack_header);

Given that this patch in its current state doesn't seem to do anything
yet, am I right in assuming that `hashfile_total(f) - sizeof(struct
pack_header)` is always expected to be zero for now?

Patrick

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux