Re: [PATCH v2 1/8] packfile: add repository to struct `packed_git`

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Taylor Blau <me@xxxxxxxxxxxx> writes:

> On Mon, Oct 28, 2024 at 02:43:39PM +0100, Karthik Nayak wrote:
>> The struct `packed_git` holds information regarding a packed object
>> file. Let's add the repository variable to this object, to represent the
>> repository that this packfile belongs to. This helps remove dependency
>> on the global `the_repository` object in `packfile.c` by simply using
>> repository information now readily available in the struct.
>
> Makes sense, good. I think it would be useful here to capture some of
> the discussion from just before you sent this series to indicate why
> it's OK to use the_repository even when we have alternates.
>
> I think it is now quite obvious in retrospect, but let's do our future
> selves a service by capturing it here, too ;-).
>

Agree, will add it in the next version.

>> ---
>>  10 files changed, 30 insertions(+), 16 deletions(-)
>
> Oh, good. I am glad to see that this new approach is already yielding
> far less disruptive of a change.
>
>> diff --git a/builtin/fast-import.c b/builtin/fast-import.c
>> index 76d5c20f14..ffee7d3abd 100644
>> --- a/builtin/fast-import.c
>> +++ b/builtin/fast-import.c
>> @@ -765,6 +765,7 @@ static void start_packfile(void)
>>
>>  	p->pack_fd = pack_fd;
>>  	p->do_not_close = 1;
>> +	p->repo = the_repository;
>
> Makes sense. Here we are crafting the packfile by hand, so initializing
> ->repo directly makes sense here.
>
> It would be nice if we could rewrite this in terms of
> packfile.c:alloc_packed_git(), but that is a static function. Exposing
> it as non-static is probably showing too much of the internals, so I
> think leaving this as-is makes sense.
>

Yes, I did consider that too, but dropped it for the same reasons you
stated.

>> diff --git a/commit-graph.c b/commit-graph.c
>> index 5bd89c0acd..83dd69bfeb 100644
>> --- a/commit-graph.c
>> +++ b/commit-graph.c
>> @@ -1914,7 +1914,7 @@ static int fill_oids_from_packs(struct write_commit_graph_context *ctx,
>>  		struct packed_git *p;
>>  		strbuf_setlen(&packname, dirlen);
>>  		strbuf_addstr(&packname, pack_indexes->items[i].string);
>> -		p = add_packed_git(packname.buf, packname.len, 1);
>> +		p = add_packed_git(ctx->r, packname.buf, packname.len, 1);
>
> I wondered if ctx->r was the right choice here or not, but it is, and it
> is (currently) always equal to the value of the_repository, so it's a
> moot point. Let's keep going...
>
>> diff --git a/midx-write.c b/midx-write.c
>> index b3a5f6c516..c57726ef94 100644
>> --- a/midx-write.c
>> +++ b/midx-write.c
>> @@ -154,7 +154,7 @@ static void add_pack_to_midx(const char *full_path, size_t full_path_len,
>>  			return;
>>
>>  		ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
>> -		p = add_packed_git(full_path, full_path_len, 0);
>> +		p = add_packed_git(the_repository, full_path, full_path_len, 0);
>
> Ugh. I thought we had already added a repository field to our auxiliary
> write_midx_context struct, but we have not, so this change looks right
> to me.  Doing so (adding that new field) seems like it would be a good
> piece of #leftoverbits.
>
>> diff --git a/midx.c b/midx.c
>> index e82d4f2e65..8edb75f51d 100644
>> --- a/midx.c
>> +++ b/midx.c
>> @@ -464,7 +464,7 @@ int prepare_midx_pack(struct repository *r, struct multi_pack_index *m,
>>  					strhash(key.buf), key.buf,
>>  					struct packed_git, packmap_ent);
>>  	if (!p) {
>> -		p = add_packed_git(pack_name.buf, pack_name.len, m->local);
>> +		p = add_packed_git(r, pack_name.buf, pack_name.len, m->local);
>
> OK, so here we're trusting the value of 'r' from the caller. That comes
> from 64404a24cf (midx: pass a repository pointer, 2019-04-29), which is
> doing the right thing. (As an aside, I thought that that change was from
> when we added the --object-dir flag to 'git multi-pack-index', but the
> change is in fact unrelated and has to do with adding installed packs to
> the repository's MRU list).
>
>>  		if (p) {
>>  			install_packed_git(r, p);
>>  			list_add_tail(&p->mru, &r->objects->packed_git_mru);
>> diff --git a/object-store-ll.h b/object-store-ll.h
>> index 53b8e693b1..8b31072b09 100644
>> --- a/object-store-ll.h
>> +++ b/object-store-ll.h
>> @@ -4,6 +4,7 @@
>>  #include "hashmap.h"
>>  #include "object.h"
>>  #include "list.h"
>> +#include "repository.h"
>
> Hmm. Do we need to include all of repository.h here? I don't think we
> do, because we never peek into any of the fields of that structure from
> within this header. So I think you could do something like:
>
> --- 8< ---
> diff --git a/object-store-ll.h b/object-store-ll.h
> index 6f9f4276e6..bcfae2e1bf 100644
> --- a/object-store-ll.h
> +++ b/object-store-ll.h
> @@ -4,13 +4,13 @@
>  #include "hashmap.h"
>  #include "object.h"
>  #include "list.h"
> -#include "repository.h"
>  #include "thread-utils.h"
>  #include "oidset.h"
>
>  struct oidmap;
>  struct oidtree;
>  struct strbuf;
> +struct repository;
>
>  struct object_directory {
>  	struct object_directory *next;
> --- >8 ---
>
> instead of #include-ing the whole thing, which would be preferable.
>

This is much better, I will patch it in.

>>  #include "thread-utils.h"
>>  #include "oidset.h"
>>
>> @@ -135,6 +136,10 @@ struct packed_git {
>>  	 */
>>  	const uint32_t *mtimes_map;
>>  	size_t mtimes_size;
>> +
>> +	/* repo dentoes the repository this packed file belongs to */
>> +	struct repository *repo;
>
> Calling this 'repo' makes sense, but...
>
>> diff --git a/packfile.c b/packfile.c
>> index 9560f0a33c..45f300e5e1 100644
>> --- a/packfile.c
>> +++ b/packfile.c
>> @@ -217,11 +217,12 @@ uint32_t get_pack_fanout(struct packed_git *p, uint32_t value)
>>  	return ntohl(level1_ofs[value]);
>>  }
>>
>> -static struct packed_git *alloc_packed_git(int extra)
>> +static struct packed_git *alloc_packed_git(struct repository *repo, int extra)
>
> Here and elsewhere, I think our usual convention is to call a 'struct
> repository *' (when it is a formal parameter of some function) just "r"
> instead of "repo".
>
> At least that's what my intuition told me, and a very rough grep says
> that '*r' appears as a parameter 815 times, while '*repo' appears only
> 577 times. It's close, but I think that '*r' is preferred here since
> it's fewer characters.
>

I agree, by now you know I prefer readability over fewer characters, so
it more of an intentional choice. But here, I think it can be '*r'
though, since it is sort of obvious what 'r' refers to in most cases.

I will change this in all commits in the next version.

>>  {
>>  	struct packed_git *p = xmalloc(st_add(sizeof(*p), extra));
>>  	memset(p, 0, sizeof(*p));
>
> Not at all the fault of this patch, but it feels like a bit of a
> foot-gun to allocate a bounds-checked version of 'sizeof(*p)+extra',
> while only zero'ing the first 'sizeof(*p)' bytes. I think in all cases
> where it actually matters via add_packed_git() we fill out that extra
> space anyway, but it might be nice cleanup to do something like:
>
>     struct packed_git *p;
>     size_t sz = sizeof(*p) + extra;
>
>     p = xcalloc(1, sz);
>
> , or something. But that can be dealt with later and/or as #leftoverbits.
>

Like you stated, there is bunch of refactoring we could do here, I mostly
don't want to digress and create too much noise, so I will turn a blind
eye in some cases and leave it for later.

> The rest is looking good, nicely done. Let's keep reading...
>
> Thanks,
> Taylor

Thanks for the review.

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux