On Fri, Oct 22, 2021 at 04:31:46PM +0200, Ævar Arnfjörð Bjarmason wrote:

> I'd very much support this living in-tree just as the po/* directory
> already does. I.e. periodically pulled down.

Just a bit of a tangent here, since weblate was mentioned earlier.

I'd caution a bit against pulling the history generated by weblate
directly. It's pretty sub-optimal from a Git perspective: you have a
bunch of big .po files and then a ton of little commits changing one or
a handful of lines. So the "logical" size of the repository (the sum of
the actual object sizes) ends up growing quite a bit. Deltas can help
with the on-disk size, but:

  - lots of operations scale with the logical size. The client-side
    index-pack of a clone, for instance, but also everyday stuff like
    "git log -S".

  - empirically we don't do a great job of finding these. See below for
    some numbers.

For instance, take https://github.com/phpmyadmin/phpmyadmin, a
repository which uses weblate (I don't mean to pick on them; it's just a
repo whose weblate-related packing I've looked into before).

A fresh clone is 1.3GB. If you do an aggressive repack, you can get it
down to about 550MB. But there's still tons of logical data. Running:

  git cat-file --batch-all-objects --batch-check='%(objectsize) %(objectsize:disk)' |
  perl -alne '
    $logical += $F[0];
    $disk += $F[1];
    END { print "$logical / $disk = " . $logical / $disk }
  '

shows that there's over 70GB of logical data. It gets an impressive
156:1 compression ratio (for comparison, "normal" repos like linux.git
and git.git are around 40-60x in my experience).

If you split it up by directory, like this:

  git rev-list --objects --all --no-object-names -- po |
  git cat-file --batch-check='%(objectsize)' |
  perl -lne '$total += $_; END { print $total }'

you'll see that po/ accounts for almost 60GB of that logical size.

We face some of that in our current po/, too. They're big files, and
that's the nature of the problem space.
But our current ones tend to be edited by taking a pass over the whole
file, rather than the one-liners that a web-based workflow encourages.

To be clear, I'm not arguing against weblate in general. It's cool that
it makes it easier for people to contribute to translations. But I think
it has an outsized impact on size and performance compared to the rest
of the repository. That's a big price to pay for carrying the history
in-tree.

Obviously one option there is to squash the po/ history before pulling
it in. The weblate commit messages themselves aren't that useful.

I'm not actually sure if jnavila's work so far has been using weblate.
The commits in his git-html-l10n are much coarser than what I see in
phpmyadmin, for example (so maybe he's doing similar squashing already).

-Peff