Re: [PATCH 3/5] scalar: teach `diagnose` to gather packfile info

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Stolee & Taylor,

On Thu, 27 Jan 2022, Derrick Stolee wrote:

> On 1/26/2022 5:43 PM, Taylor Blau wrote:
> > On Wed, Jan 26, 2022 at 08:41:45AM +0000, Matthew John Cheetham via GitGitGadget wrote:
> >> From: Matthew John Cheetham <mjcheetham@xxxxxxxxxxx>
> >>
> >> Teach the `scalar diagnose` command to gather file size information
> >> about pack files.
> >>
> >> Signed-off-by: Matthew John Cheetham <mjcheetham@xxxxxxxxxxx>
> >> ---
> >>  contrib/scalar/scalar.c          | 39 ++++++++++++++++++++++++++++++++
> >>  contrib/scalar/t/t9099-scalar.sh |  2 ++
> >>  2 files changed, 41 insertions(+)
> >>
> >> diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
> >> index e26fb2fc018..690933ffdf3 100644
> >> --- a/contrib/scalar/scalar.c
> >> +++ b/contrib/scalar/scalar.c
> >> @@ -653,6 +653,39 @@ cleanup:
> >>  	return res;
> >>  }
> >>
> >> +static void dir_file_stats(struct strbuf *buf, const char *path)
> >> +{
> >> +	DIR *dir = opendir(path);
> >> +	struct dirent *e;
> >> +	struct stat e_stat;
> >> +	struct strbuf file_path = STRBUF_INIT;
> >> +	size_t base_path_len;
> >> +
> >> +	if (!dir)
> >> +		return;
> >> +
> >> +	strbuf_addstr(buf, "Contents of ");
> >> +	strbuf_add_absolute_path(buf, path);
> >> +	strbuf_addstr(buf, ":\n");
> >> +
> >> +	strbuf_add_absolute_path(&file_path, path);
> >> +	strbuf_addch(&file_path, '/');
> >> +	base_path_len = file_path.len;
> >> +
> >> +	while ((e = readdir(dir)) != NULL)
> >
> > Hmm. Is there a reason that this couldn't use
> > for_each_file_in_pack_dir() with a callback that just does the stat()
> > and buffer manipulation?
> >
> > I don't think it's critical either way, but it would eliminate some of
> > the boilerplate that is shared between this implementation and the one
> > that already exists in for_each_file_in_pack_dir().
>
> It's helpful to see if there are other crud files in the pack
> directory. This method is also extended in microsoft/git to
> scan the alternates directory (which we expect to exist as the
> "shared objects cache).
>
> We might want to modify the implementation in this series to
> run dir_file_stats() on each odb in the_repository. This would
> give us the data for the shared object cache for free while
> being more general to other Git repos. (It would require us to
> do some reaction work in microsoft/git and be a change of
> behavior, but we are the only ones who have looked at these
> diagnose files before, so that change will be easy to manage.)

Good points all around. I went with the `for_each_file_in_pack_dir()`
approach, and threw in the now very simple change to also enumerate the
alternates, if there are any.

And yes, that will require some reaction work in microsoft/git, but for an
obvious improvement like this one, I don't grumble about the extra burden.

Ciao,
Dscho




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux