Re: [PATCH] precious-files.txt: new document proposing new precious file type

I wasn't sure where best to reply with an update on my work on
precious-file support, but here goes.

On my way to daring to implement precious files in Git, I decided
to implement them in Gitoxide first to ease myself into it.

After what felt like months of work on the Gitoxide equivalent of
dir.c, it took just 2 days to cobble together a 'gix clean' with
precious-file support.

You might say that something as destructive as a 'clean' subcommand
should not be rushed, but it was surprisingly straightforward to
implement. It was even so inviting that I could spend the second
day, today, entirely on polishing, yielding a 'gix clean' that is
fun to use, with some extras I never knew I wanted until I had full
control over it and could easily play around with it.

What I found myself doing immediately, by the way, was adjusting the
project's `.gitignore` files to add precious declarations right after
their non-precious counterparts, for backwards compatibility.
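A minimal Rust sketch of why that ordering keeps older parsers happy. It assumes, purely for illustration, that a precious pattern is written with a leading `$`; it models only literal whole-name patterns and keeps Git's rule that the last matching pattern wins. `Kind` and `classify` are hypothetical names, not Gitoxide's API.

```rust
/// How a path ends up classified by a list of ignore patterns.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Untracked,  // no pattern matched
    Expendable, // ignored and fair game for `clean -x` (today's only kind)
    Precious,   // ignored, but must survive a clean
}

/// Classify `name` against `.gitignore`-style `lines`, where the last matching
/// line wins. Only literal whole-name patterns are modelled. A leading `$`
/// (an assumed, hypothetical precious marker) is honoured only if
/// `precious_aware`; otherwise it is treated as part of the file name, as a
/// parser that predates precious files would.
fn classify(lines: &[&str], name: &str, precious_aware: bool) -> Kind {
    let mut kind = Kind::Untracked;
    for line in lines {
        let (pattern, kind_if_matched) = match line.strip_prefix('$') {
            Some(rest) if precious_aware => (rest, Kind::Precious),
            _ => (*line, Kind::Expendable),
        };
        if pattern == name {
            kind = kind_if_matched; // later matches override earlier ones
        }
    }
    kind
}

fn main() {
    // Non-precious declaration first, precious counterpart right after it.
    let gitignore = [".config", "$.config"];

    // Older parser: "$.config" never matches ".config", so it stays expendable.
    assert_eq!(classify(&gitignore, ".config", false), Kind::Expendable);
    // Precious-aware parser: the later "$.config" line wins.
    assert_eq!(classify(&gitignore, ".config", true), Kind::Precious);
    // Reversed order: the plain line comes last and wins, losing preciousness.
    assert_eq!(classify(&["$.config", ".config"], ".config", true), Kind::Expendable);
    println!("ordering behaves as expected");
}
```

An older parser sees `$.config` as a literal name that never matches, so nothing changes for it; a precious-aware parser lets the later line win. Putting the precious line first would let the plain line override it again, which is why it goes right after its counterpart.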

'gix clean' with precious files works perfectly, from what I can tell,
and it is truly wonderful to be able to wipe a repo clean without fear
of destroying anything valuable. I am aware that we all know that, but
I wanted to write it down to underline how psychologically valuable
this feature is.

Without further ado, I invite you all to give it a go yourselves,
perhaps for a first experience with precious files:

    git clone https://github.com/Byron/gitoxide
    cd gitoxide
    cargo build --release --bin gix --no-default-features --features max-pure
    target/release/gix clean

This should do the trick; from there, the program should guide
you.

If you want to see some more interesting features besides precious
files, you can run 'cargo test -p gix' and then follow the instructions
given by 'gix clean -xd', along with the `--debug` flag.

A word about performance: it is slower.
It started out only about 1% slower, even on the biggest repositories,
under optimal conditions (i.e. with core.precomposeUnicode and
core.ignoreCase off and index.skipHash on). But as I improved correctness
and added features, that near-parity was lost, and it is now about 15%
slower on bigger repositories.

I appended a benchmark run on the Linux kernel at the end, and it shows
that Gitoxide definitely spends more time in userland. I can only
assume that some performance was lost when I started to deviate from
the 'only do the work you need' recipe that I learned from Git in favor
of 'always provide a consistent set of information about directory entries'.

On top of that, there are multiple major shortcomings in this realm:

- For some reason, Gitoxide doesn't actually get faster when reading
  indices with multiple threads.
- The icase hashtable is created using only a single thread.
- The precompose-unicode conversion is very slow and can easily cost 25%
  in performance.

But those are details, some of which you can see for yourself by running
'gix --trace -v clean'.
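To make the single-threaded icase hashtable point concrete, here is a hypothetical Rust sketch of building such a case-insensitive lookup table on several threads with only the standard library: each scoped thread indexes one chunk of the entries, and the partial maps are merged in chunk order. It is not how Git or Gitoxide build theirs, and it folds case with Rust's Unicode `to_lowercase`, which may differ from the case folding either tool actually uses. `build_icase_table` is an illustrative name.

```rust
use std::collections::HashMap;
use std::thread;

/// Map each lowercased path to the positions of all entries that fold to it,
/// building per-chunk maps on `threads` scoped threads and merging them.
fn build_icase_table(paths: &[&str], threads: usize) -> HashMap<String, Vec<usize>> {
    let threads = threads.max(1);
    // At least 1, because `chunks` panics on a chunk size of 0.
    let chunk_len = ((paths.len() + threads - 1) / threads).max(1);

    let partials: Vec<HashMap<String, Vec<usize>>> = thread::scope(|s| {
        let handles: Vec<_> = paths
            .chunks(chunk_len)
            .enumerate()
            .map(|(chunk_idx, chunk)| {
                s.spawn(move || {
                    let mut map: HashMap<String, Vec<usize>> = HashMap::new();
                    for (offset, path) in chunk.iter().enumerate() {
                        map.entry(path.to_lowercase())
                            .or_default()
                            .push(chunk_idx * chunk_len + offset);
                    }
                    map
                })
            })
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .collect()
    });

    // Merging in chunk order keeps every position list sorted.
    let mut table: HashMap<String, Vec<usize>> = HashMap::new();
    for partial in partials {
        for (key, positions) in partial {
            table.entry(key).or_default().extend(positions);
        }
    }
    table
}

fn main() {
    let entries = ["README", "src/Main.rs", "readme", "src/main.rs", "Cargo.toml"];
    let table = build_icase_table(&entries, 2);
    // Both spellings of the README collide under case folding.
    println!("{:?}", table["readme"]);
}
```

Running it prints `[0, 2]`: the two README spellings collide, which is exactly the lookup an ignore-case comparison needs. The merge step stays serial, so how much such a split helps in practice would have to be measured.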

Now I hope you will have fun trying 'gix clean' with precious files in your
repositories. Also, I am particularly interested in learning how it fares
in situations where you know 'git clean' might have difficulties.
I tried very hard to achieve correctness, and any problem you find
will be fixed ASAP.

With this experience, I think I am in a good position to implement
precious-file support for 'git clean', once I get started.

Cheers,
Sebastian

----

Here are the benchmark results (and, before I forget, Gitoxide also uses about
25% more memory for some reason, so it really has some catching up to do eventually).

linux (ffc2532) +369 -819 [!] took 2s
❯ hyperfine -N -w1 -r4  'gix clean -xd --skip-hidden-repositories=non-bare' 'gix -c index.skipHash=1 -c core.ignoreCase=0 -c core.precomposeUnicode=0 clean -xd --skip-hidden-repositories=non-bare' 'git clean -nxd'
Benchmark 1: gix clean -xd --skip-hidden-repositories=non-bare
  Time (mean ± σ):     171.7 ms ±   3.0 ms    [User: 70.4 ms, System: 101.4 ms]
  Range (min … max):   167.4 ms … 174.2 ms    4 runs

Benchmark 2: gix -c index.skipHash=1 -c core.ignoreCase=0 -c core.precomposeUnicode=0 clean -xd --skip-hidden-repositories=non-bare
  Time (mean ± σ):     156.3 ms ±   3.1 ms    [User: 56.9 ms, System: 99.3 ms]
  Range (min … max):   154.1 ms … 160.8 ms    4 runs

Benchmark 3: git clean -nxd
  Time (mean ± σ):     138.4 ms ±   2.7 ms    [User: 40.5 ms, System: 103.7 ms]
  Range (min … max):   136.1 ms … 142.0 ms    4 runs

Summary
  git clean -nxd ran
    1.13 ± 0.03 times faster than gix -c index.skipHash=1 -c core.ignoreCase=0 -c core.precomposeUnicode=0 clean -xd --skip-hidden-repositories=non-bare
    1.24 ± 0.03 times faster than gix clean -xd --skip-hidden-repositories=non-bare
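As a quick cross-check of the summary and of the userland observation, the ratios can be recomputed from the mean times. A small Rust sketch: the numbers are copied from the hyperfine output above, and it uses a plain ratio of means, ignoring the ± uncertainty hyperfine reports. `Run` and `slowdown` are illustrative names.

```rust
/// Mean wall-clock, user and system times in milliseconds, copied from the
/// hyperfine output above.
struct Run {
    name: &'static str,
    mean_ms: f64,
    user_ms: f64,
    system_ms: f64,
}

/// How many times slower `run` is than `baseline`, as a ratio of means.
fn slowdown(run: &Run, baseline: &Run) -> f64 {
    run.mean_ms / baseline.mean_ms
}

fn main() {
    let git = Run { name: "git clean -nxd", mean_ms: 138.4, user_ms: 40.5, system_ms: 103.7 };
    let tuned = Run {
        name: "gix (skipHash=1, ignoreCase=0, precomposeUnicode=0)",
        mean_ms: 156.3,
        user_ms: 56.9,
        system_ms: 99.3,
    };
    let defaults = Run { name: "gix (defaults)", mean_ms: 171.7, user_ms: 70.4, system_ms: 101.4 };

    for run in [&tuned, &defaults] {
        println!(
            "{}: {:.2}x slower; user {:+.1} ms, system {:+.1} ms vs. git",
            run.name,
            slowdown(run, &git),
            run.user_ms - git.user_ms,
            run.system_ms - git.system_ms,
        );
    }
}
```

It prints `1.13x` and `1.24x`, matching the summary, and shows that the extra time is all user time (+16.4 ms and +29.9 ms), while system time is slightly lower than Git's (-4.4 ms and -2.3 ms).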


On 27 Dec 2023, at 6:28, Junio C Hamano wrote:

> "Elijah Newren via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:
>
>> From: Elijah Newren <newren@xxxxxxxxx>
>>
>> We have traditionally considered all ignored files to be expendable, but
>> users occasionally want ignored files that are not considered
>> expendable.  Add a design document covering how to split ignored files
>> into two types: 'trashable' (what all ignored files are currently
>> considered) and 'precious' (the new type of ignored file).
>
> The proposed syntax is a bit different from what I personally prefer
> (which is Phillip's [P14] or something like it), but I consider that
> the more valuable parts of this document is about how various
> commands ought to interact with precious paths, which shouldn't
> change regardless of the syntax.
>
> Thanks for putting this together.




