Re: [PATCH v5 02/17] pack-mtimes: support reading .mtimes files

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

Taylor Blau wrote:

> This patch prepares for cruft packs by defining the `.mtimes` format,
> and introducing a basic API that callers can use to read out individual
> mtimes.

Makes sense.  Does this intend to produce any functional change?  I'm
guessing not (and the lack of tests agrees), but the commit message
doesn't say so.

By the way, is this something we could cover in tests, e.g. using a
test helper that exercises the new code?

[...]
> --- a/Documentation/technical/pack-format.txt
> +++ b/Documentation/technical/pack-format.txt
> @@ -294,6 +294,25 @@ Pack file entry: <+
>  
>  All 4-byte numbers are in network order.
>  
> +== pack-*.mtimes files have the format:
> +
> +All 4-byte numbers are in network byte order.
> +
> +  - A 4-byte magic number '0x4d544d45' ('MTME').
> +
> +  - A 4-byte version identifier (= 1).
> +
> +  - A 4-byte hash function identifier (= 1 for SHA-1, 2 for SHA-256).
> +
> +  - A table of 4-byte unsigned integers. The ith value is the
> +    modification time (mtime) of the ith object in the corresponding
> +    pack by lexicographic (index) order. The mtimes count standard
> +    epoch seconds.
> +
> +  - A trailer, containing a checksum of the corresponding packfile,
> +    and a checksum of all of the above (each having length according
> +    to the specified hash function).
> +

This describes the "syntax" but not the "semantics" of the file.
Should I look to a separate piece of documentation for the semantics?
If so, can this one include a mention of that piece of documentation
to make it easier to find?

[...]
> --- a/object-store.h
> +++ b/object-store.h
> @@ -115,12 +115,15 @@ struct packed_git {
>  		 freshened:1,
>  		 do_not_close:1,
>  		 pack_promisor:1,
> -		 multi_pack_index:1;
> +		 multi_pack_index:1,
> +		 is_cruft:1;
>  	unsigned char hash[GIT_MAX_RAWSZ];
>  	struct revindex_entry *revindex;
>  	const uint32_t *revindex_data;
>  	const uint32_t *revindex_map;
>  	size_t revindex_size;
> +	const uint32_t *mtimes_map;
> +	size_t mtimes_size;

What does mtimes_map contain?  A comment would help.


> --- /dev/null
> +++ b/pack-mtimes.c
> @@ -0,0 +1,126 @@
> +#include "pack-mtimes.h"
> +#include "object-store.h"
> +#include "packfile.h"

Missing #include of git-compat-util.h.

> +
> +static char *pack_mtimes_filename(struct packed_git *p)
> +{
> +	size_t len;
> +	if (!strip_suffix(p->pack_name, ".pack", &len))
> +		BUG("pack_name does not end in .pack");
> +	/* NEEDSWORK: this could reuse code from pack-revindex.c. */
> +	return xstrfmt("%.*s.mtimes", (int)len, p->pack_name);
> +}

This seems simple enough that it's not obvious we need more code
sharing.  Do you agree?  If so, I'd suggest just removing the
NEEDSWORK comment.

> +
> +#define MTIMES_HEADER_SIZE (12)
> +#define MTIMES_MIN_SIZE (MTIMES_HEADER_SIZE + (2 * the_hash_algo->rawsz))

Hm, the all-caps name makes this feel like a compile-time constant but
it contains a reference to the_hash_algo.  Could it be an inline
function instead?

> +
> +struct mtimes_header {
> +	uint32_t signature;
> +	uint32_t version;
> +	uint32_t hash_id;
> +};
> +
> +static int load_pack_mtimes_file(char *mtimes_file,
> +				 uint32_t num_objects,
> +				 const uint32_t **data_p, size_t *len_p)

What does this function do?  A comment would help.

> +{
> +	int fd, ret = 0;
> +	struct stat st;
> +	void *data = NULL;
> +	size_t mtimes_size;
> +	struct mtimes_header header;
> +	uint32_t *hdr;
> +
> +	fd = git_open(mtimes_file);
> +
> +	if (fd < 0) {

nit: this would be more readable without the blank line between
setting and checking fd (likewise for the other examples below).
> +		ret = -1;
> +		goto cleanup;
> +	}

[...]
> +	if (mtimes_size - MTIMES_MIN_SIZE != st_mult(sizeof(uint32_t), num_objects)) {

This presupposes that the hash_id matches the_hash_algo.  Maybe worth
a NEEDSWORK comment.

[...]
> +cleanup:
> +	if (ret) {
> +		if (data)
> +			munmap(data, mtimes_size);
> +	} else {
> +		*len_p = mtimes_size;
> +		*data_p = (const uint32_t *)data;

Do we know that 'data' is uint32_t aligned?  Casting earlier in the
function could make that more obvious.

[...]
> +int load_pack_mtimes(struct packed_git *p)

This could use a doc comment in the header file.  For example, what
requirements do we have on what the caller passes as 'p'?

[...]
> +uint32_t nth_packed_mtime(struct packed_git *p, uint32_t pos)

Likewise.

[...]
> --- a/packfile.c
> +++ b/packfile.c
[...]
> @@ -363,7 +373,7 @@ void close_object_store(struct raw_object_store *o)
>  
>  void unlink_pack_path(const char *pack_name, int force_delete)
>  {
> -	static const char *exts[] = {".pack", ".idx", ".rev", ".keep", ".bitmap", ".promisor"};
> +	static const char *exts[] = {".pack", ".idx", ".rev", ".keep", ".bitmap", ".promisor", ".mtimes"};

Are these in any particular order?  Should they be?

[...]
> @@ -718,6 +728,10 @@ struct packed_git *add_packed_git(const char *path, size_t path_len, int local)
>  	if (!access(p->pack_name, F_OK))
>  		p->pack_promisor = 1;
>  
> +	xsnprintf(p->pack_name + path_len, alloc - path_len, ".mtimes");
> +	if (!access(p->pack_name, F_OK))
> +		p->is_cruft = 1;
> +
>  	xsnprintf(p->pack_name + path_len, alloc - path_len, ".pack");
>  	if (stat(p->pack_name, &st) || !S_ISREG(st.st_mode)) {
>  		free(p);
> @@ -869,7 +883,8 @@ static void prepare_pack(const char *full_name, size_t full_name_len,
>  	    ends_with(file_name, ".pack") ||
>  	    ends_with(file_name, ".bitmap") ||
>  	    ends_with(file_name, ".keep") ||
> -	    ends_with(file_name, ".promisor"))
> +	    ends_with(file_name, ".promisor") ||
> +	    ends_with(file_name, ".mtimes"))

likewise

Thanks,
Jonathan



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux