Re: [PATCH v2] packfile: freshen the mtime of packfile by configuration

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> 2021年7月14日 09:39,Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> 写道:
> 
>> +The default is 'pack' which means the *.pack file will be freshened by
>> +default. You can configure a different suffix to use, the file with the
>> +suffix will be created automatically, it's better not using any known
>> +suffix such like 'idx', 'keep', 'promisor'.
>> +
> 
> Hrm, per my v1 feedback (and I'm not sure if my suggestion is even good
> here, there's others more familiar with this area than I am), I was
> thinking of something like a *.bump file written via:
> 
>    core.packUseBumpFiles=bool
> 
> Or something like that, anyway, the edge case in allowing the user to
> pick arbitrary suffixes is that we'd get in-the-wild user arbitrary
> configuration squatting on a relatively sensitive part of the object
> store.
> 
> E.g. we recently added *.rev files to go with
> *.{pack,idx,bitmap,keep,promisor} (and I'm probably forgetting some
> suffix). What if before that a user had set:
> 
>    core.packMtimeSuffix=rev
> 
> In practice it's probably too obscure to worry about, but I think it's
> still worth it to only strictly write things we decide to write into the
> object store.

Thanks, this makes sense, allowing user to pick arbitrary suffixes may cause
some unpredictable problems, E.g. users who don’t know about the *.keep files
but setting core.packMtimeSuffix=keep, so I think it’s right to restrict the
suffix here.

>> core.deltaBaseCacheLimit::
>> 	Maximum number of bytes per thread to reserve for caching base objects
>> 	that may be referenced by multiple deltified objects.  By storing the
>> diff --git a/builtin/index-pack.c b/builtin/index-pack.c
>> index 3fbc5d70777..60bacc8ee7f 100644
>> --- a/builtin/index-pack.c
>> +++ b/builtin/index-pack.c
>> @@ -1437,19 +1437,6 @@ static void fix_unresolved_deltas(struct hashfile *f)
>> 	free(sorted_by_pos);
>> }
>> 
>> -static const char *derive_filename(const char *pack_name, const char *strip,
>> -				   const char *suffix, struct strbuf *buf)
>> -{
>> -	size_t len;
>> -	if (!strip_suffix(pack_name, strip, &len) || !len ||
>> -	    pack_name[len - 1] != '.')
>> -		die(_("packfile name '%s' does not end with '.%s'"),
>> -		    pack_name, strip);
>> -	strbuf_add(buf, pack_name, len);
>> -	strbuf_addstr(buf, suffix);
>> -	return buf->buf;
>> -}
> 
> 
> Would be more readable to split this series up into at least two
> patches, starting with just moving this function as-is to a the new
> location (just renaming it & moving it), and then using it. I don't
> think there's changes to it, but right now I'm just eyeballing the
> diff. It's more obvious if it's split up.

Thanks, I will do it.

> 
>> +	if (!strcmp(var, "core.packmtimesuffix")) {
>> +		return git_config_string(&pack_mtime_suffix, var, value);
> 
> Can drop the {} braces here, per Documentation/CodingGuidelines

Thanks, I will read the CodingGuidelines again and fix the issue.

> 
>> +const char *pack_mtime_suffix = "pack";
> 
> I can see how having a configurable suffix made the implementation
> easier, perhaps that's how it started?
> 

yes, here is the default value.

> 
>> int fsync_object_files;
>> size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
>> size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;
>> diff --git a/object-file.c b/object-file.c
>> index f233b440b22..b3e77213c42 100644
>> --- a/object-file.c
>> +++ b/object-file.c
>> @@ -1974,12 +1974,41 @@ static int freshen_loose_object(const struct object_id *oid)
>> static int freshen_packed_object(const struct object_id *oid)
>> {
>> 	struct pack_entry e;
>> +	struct stat st;
>> +	struct strbuf name_buf = STRBUF_INIT;
>> +	const char *filename;
>> +
>> 	if (!find_pack_entry(the_repository, oid, &e))
>> 		return 0;
>> 	if (e.p->freshened)
>> 		return 1;
>> -	if (!freshen_file(e.p->pack_name))
>> -		return 0;
>> +
>> +	filename = e.p->pack_name;
>> +	if (!strcasecmp(pack_mtime_suffix, "pack")) {
>> +		if (!freshen_file(filename))
>> +			return 0;
>> +		e.p->freshened = 1;
>> +		return 1;
>> +	}
>> +
>> +	/* If we want to freshen different file instead of .pack file, we need
>> +	 * to make sure the file exists and create it if needed.
>> +	 */
>> +	filename = derive_pack_filename(filename, "pack", pack_mtime_suffix, &name_buf);
> 
> You populate name_buf here, but don't strbuf_release(&name_buf) it at the end of this function.

Will fix it.

> 
>> +	if (lstat(filename, &st) < 0) {
>> +		int fd = open(filename, O_CREAT|O_EXCL|O_WRONLY, 0664);
>> +		if (fd < 0) {
>> +			// here we need to check it again because other git process may created it
> 
> /* */ comments, not //, if it's needed at all. Covered in CodingGuidelines

Will fix it.

> 
>> +			if (lstat(filename, &st) < 0)
>> +				die_errno("unable to create '%s'", filename);
> 
> If we can't create this specific file here shouldn't we just continue
> silently at this point? Surely if this process is screwed we're just
> about to die on something more important?

Yes, because here we have the *.pack file exists, we can step back to freshen
the *.pack file if the specific file cannot be created. And when we want to get
the mtime, we can also read mtime from *.pack file either if the specific
file does not exists. 

> 
> And lstat() can also return transitory errors that don't indicate
> "unable to create", e.g. maybe we can out of memory the kernel is
> willing to give us or something (just skimming the lstat manpage).

Thanks, I will think about it.

> 
>> +		} else {
>> +			close(fd);
>> +		}
>> +	} else {
>> +		if (!freshen_file(filename))
>> +			return 0;
> 
> Style/indentatino: just do "} else if (!freshen..." ?

Will fix it.
> 
>> +	}
>> +
>> 	e.p->freshened = 1;
>> 	return 1;
>> }
>> diff --git a/packfile.c b/packfile.c
>> index 755aa7aec5e..a607dda4e25 100644
>> --- a/packfile.c
>> +++ b/packfile.c
>> @@ -40,6 +40,19 @@ char *sha1_pack_index_name(const unsigned char *sha1)
>> 	return odb_pack_name(&buf, sha1, "idx");
>> }
>> 
>> +const char *derive_pack_filename(const char *pack_name, const char *strip,
>> +				const char *suffix, struct strbuf *buf)
>> +{
>> +	size_t len;
>> +	if (!strip_suffix(pack_name, strip, &len) || !len ||
>> +	    pack_name[len - 1] != '.')
>> +		die(_("packfile name '%s' does not end with '.%s'"),
>> +		    pack_name, strip);
>> +	strbuf_add(buf, pack_name, len);
>> +	strbuf_addstr(buf, suffix);
>> +	return buf->buf;
>> +}
> 
> Just have this return void?

I renamed the original 'derive_filename' here with new name 'derive_pack_filename',
when it is used like this:

    filename = derive_filename(pack_name, "pack", suffix, &name_buf);

and so it looks like a more convenient way to use the strbuf object, so I decided to
keep it the old way.

> 
>> static unsigned int pack_used_ctr;
>> static unsigned int pack_mmap_calls;
>> static unsigned int peak_pack_open_windows;
>> @@ -727,6 +740,17 @@ struct packed_git *add_packed_git(const char *path, size_t path_len, int local)
>> 	 */
>> 	p->pack_size = st.st_size;
>> 	p->pack_local = local;
>> +
>> +	/* If we have different file used to freshen the mtime, we should
>> +	 * use it at a higher priority.
>> +	 */
>> +	if (!!strcasecmp(pack_mtime_suffix, "pack")) {
>> +		struct strbuf name_buf = STRBUF_INIT;
>> +		const char *filename;
>> +
>> +		filename = derive_pack_filename(path, "idx", pack_mtime_suffix, &name_buf);
>> +		stat(filename, &st);
> 
> I.e. the "filename" here isn't needed, just call derive_pack_filename()
> and use name.buf.buf to stat.

Let me thinks about it, it's a good idea then.

> 
> Also: We should check the stat return value here & report errno if
> needed, no?

Here I think we should step back to use the mtime of *.pack file instead, so if the specific file does
not exists, there are some reasons like (a) the *.pack is just created and mtime of it has not freshened
again or (b) we can not create the specific file for some reason, and in this case we will freshen the
*.pack file instead. (c) someone or some process just delete it, we should try to go futher.

I don't know if it's right to do this, but I think we should avoid die here.
> 
>> +test_expect_success 'do not bother loosening old objects with core.packmtimesuffix config' '
>> +	obj1=$(echo three | git hash-object -w --stdin) &&
>> +	obj2=$(echo four | git hash-object -w --stdin) &&
>> +	pack1=$(echo $obj1 | git -c core.packmtimesuffix=bump pack-objects .git/objects/pack/pack) &&
>> +	pack2=$(echo $obj2 | git -c core.packmtimesuffix=bump pack-objects .git/objects/pack/pack) &&
>> +	git -c core.packmtimesuffix=bump prune-packed &&
>> +	git cat-file -p $obj1 &&
>> +	git cat-file -p $obj2 &&
>> +	touch .git/objects/pack/pack-$pack2.bump &&
>> +	test-tool chmtime =-86400 .git/objects/pack/pack-$pack2.bump &&
>> +	git -c core.packmtimesuffix=bump repack -A -d --unpack-unreachable=1.hour.ago &&
> 
> On command-lines we can spell it camel-cased, e.g. -c
> core.packMtimeSuffix[...].
> 

Thanks, will do it.




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux