Hi Elijah, On Thu, 27 Jan 2022, Elijah Newren wrote: > On Wed, Jan 26, 2022 at 3:37 PM Matthew John Cheetham via GitGitGadget > <gitgitgadget@xxxxxxxxx> wrote: > > > > From: Matthew John Cheetham <mjcheetham@xxxxxxxxxxx> > > > > When operating at the scale that Scalar wants to support, certain data > > shapes are more likely to cause undesirable performance issues, such as > > large numbers or large sizes of loose objects. > > Makes sense. > > > By including statistics about this, `scalar diagnose` now makes it > > easier to identify such scenarios. > > > > Signed-off-by: Matthew John Cheetham <mjcheetham@xxxxxxxxxxx> > > --- > > contrib/scalar/scalar.c | 60 ++++++++++++++++++++++++++++++++ > > contrib/scalar/t/t9099-scalar.sh | 2 ++ > > 2 files changed, 62 insertions(+) > > > > diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c > > index 690933ffdf3..c0ad4948215 100644 > > --- a/contrib/scalar/scalar.c > > +++ b/contrib/scalar/scalar.c > > @@ -686,6 +686,60 @@ static void dir_file_stats(struct strbuf *buf, const char *path) > > closedir(dir); > > } > > > > +static int count_files(char *path) > > +{ > > + DIR *dir = opendir(path); > > + struct dirent *e; > > + int count = 0; > > + > > + if (!dir) > > + return 0; > > + > > + while ((e = readdir(dir)) != NULL) > > + if (!is_dot_or_dotdot(e->d_name) && e->d_type == DT_REG) > > + count++; > > + > > + closedir(dir); > > + return count; > > +} > > + > > +static void loose_objs_stats(struct strbuf *buf, const char *path) > > +{ > > + DIR *dir = opendir(path); > > + struct dirent *e; > > + int count; > > + int total = 0; > > + unsigned char c; > > + struct strbuf count_path = STRBUF_INIT; > > + size_t base_path_len; > > + > > + if (!dir) > > + return; > > + > > + strbuf_addstr(buf, "Object directory stats for "); > > + strbuf_add_absolute_path(buf, path); > > + strbuf_addstr(buf, ":\n"); > > + > > + strbuf_add_absolute_path(&count_path, path); > > + strbuf_addch(&count_path, '/'); > > + base_path_len = count_path.len; > > + > > + while ((e = readdir(dir)) != NULL) > > + if (!is_dot_or_dotdot(e->d_name) && > > + e->d_type == DT_DIR && strlen(e->d_name) == 2 && > > + !hex_to_bytes(&c, e->d_name, 1)) { > > You only recurse into directories, ignoring individual files. > > > + strbuf_setlen(&count_path, base_path_len); > > + strbuf_addstr(&count_path, e->d_name); > > + total += (count = count_files(count_path.buf)); > > + strbuf_addf(buf, "%s : %7d files\n", e->d_name, count); > > This shows the number of files within a directory. > > > + } > > + > > + strbuf_addf(buf, "Total: %d loose objects", total); > > and this shows the total number of files across all the directories. > > But the commit message suggested you also wanted to check for large > sizes of loose objects. Did that get ripped out at some point with > the commit message not being updated, or is it perhaps going to be > included later? No, there was no plan to include this information later, as the original .NET implementation of `scalar diagnose` did not provide that information, either (which I take as a strong sign that we never needed this type of information to help users, at least not up until this point). Besides, it would be kind of a difficult thing to say conclusively what makes a loose file "big". Is it the zlib-compressed size on disk? Or the unpacked size? Should there be a configurable threshold to determine when an object is big? Should `core.bigFileThreshold` be co-opted for this? Together with the fact that there was no need for this information in practice, it makes me doubt that we should add this type of information. I actually suspect that _iff_ information of that type would be helpful, a more complete tool like git-sizer (https://github.com/github/git-sizer/) would be needed, and I do not really want to subsume git-sizer's functionality in `scalar diagnose`. I rephrased the commit message. Ciao, Dscho > > > + > > + strbuf_release(&count_path); > > + closedir(dir); > > +} > > + > > static int cmd_diagnose(int argc, const char **argv) > > { > > struct option options[] = { > > @@ -734,6 +788,12 @@ static int cmd_diagnose(int argc, const char **argv) > > if ((res = stage(tmp_dir.buf, &buf, "packs-local.txt"))) > > goto diagnose_cleanup; > > > > + strbuf_reset(&buf); > > + loose_objs_stats(&buf, ".git/objects"); > > + > > + if ((res = stage(tmp_dir.buf, &buf, "objects-local.txt"))) > > + goto diagnose_cleanup; > > + > > if ((res = stage_directory(tmp_dir.buf, ".git", 0)) || > > (res = stage_directory(tmp_dir.buf, ".git/hooks", 0)) || > > (res = stage_directory(tmp_dir.buf, ".git/info", 0)) || > > diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh > > index b1745851e31..f2ec156d819 100755 > > --- a/contrib/scalar/t/t9099-scalar.sh > > +++ b/contrib/scalar/t/t9099-scalar.sh > > @@ -77,6 +77,8 @@ test_expect_success UNZIP 'scalar diagnose' ' > > unzip -p "$zip_path" diagnostics.log >out && > > test_file_not_empty out && > > unzip -p "$zip_path" packs-local.txt >out && > > + test_file_not_empty out && > > + unzip -p "$zip_path" objects-local.txt >out && > > test_file_not_empty out > > ' > > > > -- > > gitgitgadget >