Re: [PATCH v2] packfile: freshen the mtime of packfile by configuration

> On Jul 15, 2021, at 01:04, Taylor Blau <ttaylorr@xxxxxxxxxx> wrote:
> 
> [...]
>> 
>> However, we find that the mtime of ".pack" files changes over time, which
>> makes the file system keep reloading the big files. That takes a lot of IO
>> time, slows down git upload-pack, and can even exhaust the disk IOPS.
> 
> That's surprising behavior to me. Are you saying that calling utime(2)
> causes the *page* cache to be invalidated and that most reads are
> cache-misses lowering overall IOPS?
> 
> If so, then I am quite surprised ;). The only state that should be
> dirtied by calling utime(2) is the inode itself, so the blocks referred
> to by the inode corresponding to a pack should be left intact.
> 
> If you're on Linux, you can try observing the behavior of evicting
> inodes, blocks, or both from the disk cache by changing "2" in the
> following:
> 
>    hyperfine 'git pack-objects --all --stdout --delta-base-offset >/dev/null' \
>      --prepare='sync; echo 2 | sudo tee /proc/sys/vm/drop_caches'
> 
> where "1" drops the page cache, "2" drops the inodes, and "3" evicts
> both.
> 
> I wonder if you could share the results of running the above varying
> the value of "1", "2", and "3", as well as swapping the `--prepare` for
> `--warmup=3` to warm your caches (and give us an idea of what your
> expected performance is probably like).
> 
> Thanks,
> Taylor

I'm sorry to reply so late; I work long hours during the day, and the company
network cannot send external mail, so I can only reply after I get home late at night.

Thanks again for your reply. My explanation of 'why the mtime is so important' left
out some information and was not clear enough, so let me give the details here:

Servers:
- We maintain a number of servers, each mounting NFS disks that hold our git
  repositories. Some of them are very large (> 10GB), and we cannot reduce their
  size right now.
- There are many objects and large files in the git history, which results in some
  large '.pack' files in the '.git/objects/pack' directories.
- We created a '.keep' file for each large '.pack' file, hoping the disk cache would
  reduce the NFS IOPS by serving the pack contents from cache (see the sketch
  after this list).

Clients:
- Many CI systems keep downloading the git repositories at a very high frequency;
  e.g. we have seen different CI systems make 600 download requests via 'git fetch'
  in a short period of time.
- Some developers are doing 'git push' at the same time and creating Pull Requests
  afterwards (which then trigger the CI), so the git servers run update tasks that
  may freshen the mtime of the '.pack' files.

So, in this scenario there are many 'git-upload-pack' processes running on the git
servers, and they all need to load the big '.pack' files. 'git-upload-pack' is much
faster when the disk cache is warm, and then the NFS server is not as busy.

However, we find that the IOPS of the NFS server is always exhausted and
'git-upload-pack' runs for a very long time. We noticed that the mtime of the
'.pack' files changes over time, and a colleague who is familiar with the file
system told me it is the mtime change that invalidates the disk caches
(presumably because the NFS client revalidates its cached pages when it sees
the file's attributes change).
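
One way we can try to confirm that is to record the pack mtimes while only fetches
are running; the path below is just an example:

    # If the mtimes keep moving forward while clients are only fetching,
    # the freshening is what keeps invalidating the NFS client cache.
    while sleep 60; do
        stat --format='%y %n' /srv/git/big-repo.git/objects/pack/*.pack
    done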

So we want the caches to stay valid for a long time, which would speed up the
'git-upload-pack' processes.

I don't know whether `/proc/sys/vm/drop_caches` will help or not, but thanks for
the tips; I will try them and see if there is any difference.
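
If it helps, this is how I plan to run the experiment you suggested, varying the
drop_caches value:

    # 1 = drop the page cache, 2 = drop inodes/dentries, 3 = drop both
    for n in 1 2 3; do
        hyperfine 'git pack-objects --all --stdout --delta-base-offset >/dev/null' \
            --prepare="sync; echo $n | sudo tee /proc/sys/vm/drop_caches"
    done

    # Warm-cache baseline for comparison:
    hyperfine --warmup=3 'git pack-objects --all --stdout --delta-base-offset >/dev/null'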



