This session was led by Elijah Newren. Supporting cast: Johannes "Dscho" Schindelin, Jonathan Tan, Jonathan "jrnieder" Nieder, brian m. carlson, Jeff "Peff" King, Ævar Arnfjörð Bjarmason, Emily Shaffer, CB Bailey, Taylor Blau, and Philip Oakley.

Notes:

* Sent my idea for rebase merges on-list
* Test suite is slow. Shell scripts and process forking.
* What if we had a special shell that interpreted the commands in a single process?
* Even Git commands like rev-parse and hash-object, as long as that’s not the command you’re trying to test
* Dscho wants to slip in a C-based solution
* Jonathan Tan commented: going back to your custom shell for tests idea, one thing we could do is have a custom command that generates the repo commits that we want (and that saves process spawns and might make the tests simpler too)
* We could replace several “setup repo” steps with “git fast-import” instead.
* Dscho measured: 0.5 sec - 30 sec in setup steps. Can use fast-import, or can make a new format that helps us set up the test scenario
* Elijah: test-lib-functions helpers could be builtins
* Biggest idea: there are a lot of people who version control things via tarballs or .zip files per version. This prevents history from compressing well. Some people check those compressed files into Git for purposes of history.
* In particular, .jar files or npm packages. Initial testing showed that you can expand .jar files in a way that creates source-like files.
* Jonathan Nieder points out that “pristine-tar” exists to do something similar: https://joeyh.name/code/pristine-tar/
* Others use “git archive” for this purpose, with mixed success.
* jars and npm packages compress better if you store them in expanded form instead of compressed form
* So many tools are used to consuming the end-archive that, while it’s tempting to have the build system be responsible for this, being able to “git add” the archive and have the right thing happen behind the scenes would be nice for ease of use
* Goal here isn’t bit-for-bit reproducibility, just semantic reproducibility
* What about other file formats that use zips, such as LibreOffice?
* The Git Merge 2018 talk “Designers Git-It; A unified design system workflow” did something similar, except it made the tool understand the “exploded” file view.
* Jonathan Tan mentions that smudge/clean filters can help, except this is about tree<->blob instead of blob<->blob
* brian m. carlson mentions “git archive” output isn’t stable across Git versions. Should we have a canonical tar format that provides reproducibility?
* Peff: tree<->blob filters can get confusing in the tree<->index<->worktree mapping. Possible, but requires careful thought about the details about when each spot
* Old suggestion of a “blob-tree” type that allows storing a single index entry that corresponds to multiple trees and blobs in the background, possibly.
* One long-term dream (inspired by Avery Pennarun’s “bup” tool) is to store large binary files in a tree-structured way that can store common regions as deltas, improve random access, and parallelize hashing. Involves a consistent way to split the file into stable pieces, like --rsyncable uses (based on a rolling hash being zero).
* Peff: you can do that at the object model layer or at the storage layer. The latter is less invasive.
* jrnieder: The benefits of blobtree are greater at the object model layer, e.g. not having to transmit chunks over the wire that you already have. I think the main obstacle has been that the benefits haven’t been enough to be worth the complexity. If that changes, we can imagine bundling it with some other object format changes, e.g. putting blob sizes in tree objects, and rolling it out as a new object-format.
* Ævar: can we do this in a simpler manner, without deep technical changes? (Context: was thinking about this in the context of some $id$ questions.) Clean/smudge filters have some significant UX drawbacks. Has experience helping users trying to commit .jar files. Some simple advice saying “maybe you don’t want to commit this file type, here are some ways to expand it to a committable format…” based on patterns such as .gitignore or .gitattributes. We don’t have ways to indicate “this repo uses Git LFS, but you don’t have the plugin.”
* Emily: If I could rewrite the commit object format, I would change some things
* Allow multiple authors
* Add a layer of indirection to author name
* brian has thought about this too: replace name with email address + some ssh key or something and use something mailmap-like to map it. Could be a backward-compatible approach
* CB has been thinking about these problems in the background. Could randomly generate an identifier when you commit your first patch, an @example.com address to avoid conflicting with any real address. Mailmap can be a blob maintained by the project
* In the process can get first-class multiple authors
* If I have this id representing this particular pair of authors, can update what the id points to
* Cool stuff but gets complicated
* Just getting mailmap applied to trailers in “git log” would be huge
* CB: main reason I don’t put myself in mailmap is that it’s not worth bothering without that feature
* Ævar: “git log --author” would want the mapping, too (and “git shortlog --group”). Do we do this only at the presentation layer, or if we do it at a lower layer do we get such things for free?
* If anyone’s interested, I might know where the dragons are hiding, happy to give advice
* Peff: “git shortlog” already knows how to parse it out so this seems very possible
* Taylor: https://lore.kernel.org/git/YW8A5FznqLYs7MqH@xxxxxxxxxxxxxxxxxxxxxxx/T/
* Generation number was discussed ~2011(?)
* Ævar: does this really need a format change? Two “author” fields would break things, but could have “author” and “x-author” header
* General principle when changing formats: teasing apart where it’s possible to achieve what you want with backward compatibility
* Philip Oakley would like a commit id referring to an unborn branch as a proper id
* brian: empty tree works for what you’re talking about when you want a diff
* Philip: motivating example was “first parent is going nowhere, but you have a second parent”
* jrnieder: I see, you want the --first-parent history of your published branch to match the reflog. As a workaround, you’re able to use an empty initial commit and use --no-ff merges whenever you pull things in, but you’re referring to wishing you didn’t have to make that empty initial commit
* Ævar: reminds me of the discussion in https://www.fossil-scm.org/home/doc/trunk/www/fossil-v-git.wiki of commit/branch relationships
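
Illustration for the blob-tree item above ("a consistent way to split the file into stable pieces ... based on a rolling hash being zero"): a minimal Python sketch of content-defined chunking. Everything in it is an assumption made for illustration and was not part of the discussion: the function name `split_chunks`, the 16-byte window, the 6-bit mask, and the plain byte sum used as the rolling hash. It is not how bup or --rsyncable actually implement this.

```python
WINDOW = 16          # bytes in the rolling window (illustrative choice)
MASK = (1 << 6) - 1  # cut when the low 6 bits of the window sum are zero


def split_chunks(data: bytes) -> list:
    """Split data at content-defined boundaries using a rolling byte sum."""
    chunks = []
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= WINDOW:
            rolling -= data[i - WINDOW]  # drop the byte that left the window
        # Only test once the window is full, so that a cut point depends
        # solely on the last WINDOW bytes of content, not on file offsets.
        if i + 1 >= WINDOW and (rolling & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing piece after the last cut
    return chunks
```

Because each cut point depends only on the preceding window of bytes, prepending or inserting data early in a file shifts the later cut points along with the content, so most later chunks come out byte-identical and could be shared between versions. A real design would presumably also want minimum and maximum chunk sizes and a stronger rolling hash; this sketch has neither.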