Re: [PATCH 4/5] scalar: teach `diagnose` to gather loose objects information

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Elijah,

On Thu, 27 Jan 2022, Elijah Newren wrote:

> On Wed, Jan 26, 2022 at 3:37 PM Matthew John Cheetham via GitGitGadget
> <gitgitgadget@xxxxxxxxx> wrote:
> >
> > From: Matthew John Cheetham <mjcheetham@xxxxxxxxxxx>
> >
> > When operating at the scale that Scalar wants to support, certain data
> > shapes are more likely to cause undesirable performance issues, such as
> > large numbers or large sizes of loose objects.
>
> Makes sense.
>
> > By including statistics about this, `scalar diagnose` now makes it
> > easier to identify such scenarios.
> >
> > Signed-off-by: Matthew John Cheetham <mjcheetham@xxxxxxxxxxx>
> > ---
> >  contrib/scalar/scalar.c          | 60 ++++++++++++++++++++++++++++++++
> >  contrib/scalar/t/t9099-scalar.sh |  2 ++
> >  2 files changed, 62 insertions(+)
> >
> > diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
> > index 690933ffdf3..c0ad4948215 100644
> > --- a/contrib/scalar/scalar.c
> > +++ b/contrib/scalar/scalar.c
> > @@ -686,6 +686,60 @@ static void dir_file_stats(struct strbuf *buf, const char *path)
> >         closedir(dir);
> >  }
> >
> > +static int count_files(char *path)
> > +{
> > +       DIR *dir = opendir(path);
> > +       struct dirent *e;
> > +       int count = 0;
> > +
> > +       if (!dir)
> > +               return 0;
> > +
> > +       while ((e = readdir(dir)) != NULL)
> > +               if (!is_dot_or_dotdot(e->d_name) && e->d_type == DT_REG)
> > +                       count++;
> > +
> > +       closedir(dir);
> > +       return count;
> > +}
> > +
> > +static void loose_objs_stats(struct strbuf *buf, const char *path)
> > +{
> > +       DIR *dir = opendir(path);
> > +       struct dirent *e;
> > +       int count;
> > +       int total = 0;
> > +       unsigned char c;
> > +       struct strbuf count_path = STRBUF_INIT;
> > +       size_t base_path_len;
> > +
> > +       if (!dir)
> > +               return;
> > +
> > +       strbuf_addstr(buf, "Object directory stats for ");
> > +       strbuf_add_absolute_path(buf, path);
> > +       strbuf_addstr(buf, ":\n");
> > +
> > +       strbuf_add_absolute_path(&count_path, path);
> > +       strbuf_addch(&count_path, '/');
> > +       base_path_len = count_path.len;
> > +
> > +       while ((e = readdir(dir)) != NULL)
> > +               if (!is_dot_or_dotdot(e->d_name) &&
> > +                   e->d_type == DT_DIR && strlen(e->d_name) == 2 &&
> > +                   !hex_to_bytes(&c, e->d_name, 1)) {
>
> You only recurse into directories, ignoring individual files.
>
> > +                       strbuf_setlen(&count_path, base_path_len);
> > +                       strbuf_addstr(&count_path, e->d_name);
> > +                       total += (count = count_files(count_path.buf));
> > +                       strbuf_addf(buf, "%s : %7d files\n", e->d_name, count);
>
> This shows the number of files within a directory.
>
> > +               }
> > +
> > +       strbuf_addf(buf, "Total: %d loose objects", total);
>
> and this shows the total number of files across all the directories.
>
> But the commit message suggested you also wanted to check for large
> sizes of loose objects.  Did that get ripped out at some point with
> the commit message not being updated, or is it perhaps going to be
> included later?

No, there was no plan to include this information later, as the original
.NET implementation of `scalar diagnose` did not provide that information,
either (which I take as a strong sign that we never needed this type of
information to help users, at least not up until this point).

Besides, it would be kind of a difficult thing to say conclusively what
makes a loose file "big". Is it the zlib-compressed size on disk? Or the
unpacked size? Should there be a configurable threshold to determine when
an object is big? Should `core.bigFileThreshold` be co-opted for this?

Together with the fact that there was no need for this information in
practice, it makes me doubt that we should add this type of information. I
actually suspect that _iff_ information of that type would be helpful, a
more complete tool like git-sizer (https://github.com/github/git-sizer/)
would be needed, and I do not really want to subsume git-sizer's
functionality in `scalar diagnose`.

I rephrased the commit message.

Ciao,
Dscho

>
> > +
> > +       strbuf_release(&count_path);
> > +       closedir(dir);
> > +}
> > +
> >  static int cmd_diagnose(int argc, const char **argv)
> >  {
> >         struct option options[] = {
> > @@ -734,6 +788,12 @@ static int cmd_diagnose(int argc, const char **argv)
> >         if ((res = stage(tmp_dir.buf, &buf, "packs-local.txt")))
> >                 goto diagnose_cleanup;
> >
> > +       strbuf_reset(&buf);
> > +       loose_objs_stats(&buf, ".git/objects");
> > +
> > +       if ((res = stage(tmp_dir.buf, &buf, "objects-local.txt")))
> > +               goto diagnose_cleanup;
> > +
> >         if ((res = stage_directory(tmp_dir.buf, ".git", 0)) ||
> >             (res = stage_directory(tmp_dir.buf, ".git/hooks", 0)) ||
> >             (res = stage_directory(tmp_dir.buf, ".git/info", 0)) ||
> > diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
> > index b1745851e31..f2ec156d819 100755
> > --- a/contrib/scalar/t/t9099-scalar.sh
> > +++ b/contrib/scalar/t/t9099-scalar.sh
> > @@ -77,6 +77,8 @@ test_expect_success UNZIP 'scalar diagnose' '
> >         unzip -p "$zip_path" diagnostics.log >out &&
> >         test_file_not_empty out &&
> >         unzip -p "$zip_path" packs-local.txt >out &&
> > +       test_file_not_empty out &&
> > +       unzip -p "$zip_path" objects-local.txt >out &&
> >         test_file_not_empty out
> >  '
> >
> > --
> > gitgitgadget
>




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux