On Wed, Jul 14 2021, Taylor Blau wrote:

> On Thu, Jul 15, 2021 at 12:46:47AM +0800, Sun Chao wrote:
>> > Stepping back, I'm not sure I understand why freshening a pack is so
>> > slow for you. freshen_file() just calls utime(2), and any sync back
>> > to the disk shouldn't need to update the pack itself, just a couple
>> > of fields in its inode. Maybe you could help explain further.
>> >
>> > [ ... ]
>>
>> The reason why we want to avoid freshening the mtime of ".pack" files
>> is to improve the reading speed of our Git servers.
>>
>> We have some large repositories on our Git servers (some are bigger
>> than 10GB), and we created '.keep' files for the large ".pack" files.
>> We want the big files to stay unchanged to speed up git upload-pack,
>> because in our mind the file system cache will reduce the disk IO if a
>> file has not changed.
>>
>> However, we find the mtime of ".pack" files changes over time, which
>> makes the file system always reload the big files. That takes a lot of
>> IO time, results in a lower speed of git upload-pack, and can even
>> exhaust the disk IOPS.
>
> That's surprising behavior to me. Are you saying that calling utime(2)
> causes the *page* cache to be invalidated and that most reads are
> cache-misses, lowering overall IOPS?
>
> If so, then I am quite surprised ;). The only state that should be
> dirtied by calling utime(2) is the inode itself, so the blocks referred
> to by the inode corresponding to a pack should be left intact.
>
> If you're on Linux, you can try observing the behavior of evicting
> inodes, blocks, or both from the disk cache by changing "2" in the
> following:
>
>     hyperfine \
>       'git pack-objects --all --stdout --delta-base-offset >/dev/null' \
>       --prepare='sync; echo 2 | sudo tee /proc/sys/vm/drop_caches'
>
> where "1" drops the page cache, "2" drops the inodes, and "3" evicts
> both.
>
> I wonder if you could share the results of running the above varying
> the value of "1", "2", and "3", as well as swapping the `--prepare` for
> `--warmup=3` to warm your caches (and give us an idea of what your
> expected performance is probably like).

I think you may be right narrowly, but wrong in this context :)

I.e. my understanding of this problem is that they have some incremental
backup job, e.g. rsync without --checksum (not that using --checksum
would help; chicken & egg issue). So by changing the mtime you cause the
file to be re-synced.

Yes, Linux (or hopefully any modern OS) isn't so dumb as to evict your
FS cache because of such a metadata change, but that's beside the point.
If you have a backup job like that, your FS cache will get evicted or be
subject to churn anyway, because you'll shortly be dealing with the
"rsync" job that's noticed the changed mtime competing for caching
resources with "real" traffic.

Sun: does that summarize the problem you're having?

<large digression ahead>

Sun, also: note that in general doing backups of live git repositories
with rsync is a bad idea, and will lead to corruption.

The most common cause of such corruption is that a tool like "rsync"
will iterate recursively through, say, "objects" followed by "refs". So
by the time it gets to the latter (or is doing a deep iteration within
those dirs), git's state has changed in such a way as to yield an rsync
backup in a state that the repository was never in.
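If you do have such rsync copies around, a quick sanity check on one of
them is git's own connectivity check (the path below is hypothetical):

    # Reports objects that are missing or broken, e.g. refs that were
    # copied late and point at objects which never made it into the copy:
    git -C /backup/repo.git fsck --full

A clean fsck only tells you the copy is self-consistent, not that it
matches any state the source repository was ever actually in.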
As an aside, I've often wondered what it is about git exactly that makes
people who'd never think of doing the same thing to the FS part of an
RDBMS's data store think that implementing such an ad-hoc backup
solution for git would be a good idea, but I digress. Perhaps we need
scarier, BerkeleyDB-looking names in the .git directory :)

Even if you do FS snapshots of live git repositories you're likely to
get corruption; search this mailing list for references to fsync(), e.g.
[1].

In short, git has historically been (and still is) sloppy about
fsync(), and has relied on non-standard behavior such as "if I do N
updates for N=1..100 and fsync just number 100, then I can assume 1..99
are synced" (spoiler: you can't assume that).

Our use of fsync is still broken in that sense today; git is not a safe
place to store your data in the pedantic POSIX sense. And no, I don't
just mean that core.fsyncObjectFiles is `false` by default; that setting
only covers a small part of this, e.g. we don't fsync dir entries even
with it enabled.

On a real live filesystem this is usually not an issue, because if
you're not dealing with yanked power cords (and even then, journals
might save you), then even if you fsync a file but don't fsync the dir
entry it's in, the FS is usually forgiving about such cases. I.e. if
someone makes a concurrent request for the could-be-outdated dir entry,
they'll be served the up-to-date one, even without that having been
fsync'd, because the VFS layer isn't going to the synced disk; it's
checking its current state and servicing your request from that.

But at least some FS snapshot implementations have a habit of exposing
the most pedantic interpretation possible of FS semantics, one that you
wouldn't ever get on a live FS. I.e. you might be hooking into the
equivalent of the order in which things are written to disk, and end up
with a state that would never have been exposed to a running program
(there would be a 1:1 correspondence if we fsync'd properly, which we
don't).

The best way to get backups of git repositories you know are correct is
to use git's own transport mechanisms, i.e. to fetch/pull the data, or
to create bundles from it. That would be the case even if we fixed all
our fsync issues, because doing so wouldn't help you in the case of a
bit-flip, but an "index-pack" on the other end will spot such issues.

1. https://lore.kernel.org/git/20200917112830.26606-2-avarab@xxxxxxxxx/
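To make the "use git's own transport" suggestion concrete, here's a
minimal sketch; the URL and paths are made up, adjust to taste:

    # One-time full backup clone; going through a URL (or --no-local)
    # means the objects arrive via the normal transport, where
    # index-pack checks what it receives:
    git clone --mirror https://git.example.com/repo.git /backup/repo.git

    # Subsequent incremental updates of that mirror:
    git -C /backup/repo.git remote update --prune

    # Or: a single-file backup as a bundle, which you can verify now and
    # clone or fetch from later:
    git -C /srv/git/repo.git bundle create /backup/repo.bundle --all
    git bundle verify /backup/repo.bundle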