Fast enumeration of objects

This is a re-casting of my previous filter-objects command without any
of the filtering, so it is now just "list-all-objects".

I have retained the "--verbose" option, which outputs the same format as
the default "cat-file --batch-check", because it gives a useful
performance gain over filtering through "cat-file" when this basic
information is all that is needed.

The motivating use case is to enable a script to quickly scan a large
number of repositories for any large objects.
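For comparison, a minimal sketch of such a script, built only from the
stock commands used in the first timing pipeline (rev-list, cut and
cat-file --batch-check). The function names large_objects and
scan_repos are hypothetical, and the 512000-byte default simply mirrors
the awk threshold used in the timings. With the proposed
"list-all-objects -v", the rev-list/cut/cat-file stages would collapse
into a single command.

```shell
#!/bin/sh
# Hypothetical helper (not part of git): read lines in the default
# "git cat-file --batch-check" format ("<sha1> <type> <size>") on stdin
# and print the name of every object whose size is at least $1 bytes.
# The default of 512000 bytes (500k) matches the awk filter in the timings.
large_objects() {
	min_size=${1:-512000}
	# "+ 0" forces a numeric comparison; "<sha1> missing" lines have no
	# size field and therefore compare as 0.
	awk -v min="$min_size" '$3 + 0 >= min + 0 { print $1 }'
}

# Hypothetical driver: scan each repository path given as an argument
# using only stock git commands, prefixing each hit with the repository.
scan_repos() {
	for repo in "$@"; do
		(
			# Run in a subshell so a failed cd skips just this repository.
			cd "$repo" || exit 1
			git rev-list --all --objects |
				cut -d" " -f1 |
				git cat-file --batch-check |
				large_objects 512000 |
				while read -r sha1; do
					printf '%s: %s\n' "$repo" "$sha1"
				done
		)
	done
}
```

Usage might look like "scan_repos ~/src/linux ~/src/git", which prints
one "<repo>: <sha1>" line per large object.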

I timed several different commands on a fully packed clone of the
Linux kernel.

	$ time git rev-list --all --objects |
		cut -d" " -f1 |
		git cat-file --batch-check |
		awk '{if ($3 >= 512000) { print $1 }}' |
		wc -l
	958

	real    0m30.823s
	user    0m41.904s
	sys     0m7.728s

list-all-objects gives a significant improvement, cutting the wall-clock
time by roughly a factor of three:

	$ time git list-all-objects |
		git cat-file --batch-check |
		awk '{if ($3 >= 512000) { print $1 }}' |
		wc -l
	958

	real    0m9.585s
	user    0m10.820s
	sys     0m4.960s

Skipping the cat-file step as well is a lesser but still significant
improvement:

	$ time git list-all-objects -v |
		awk '{if ($3 >= 512000) { print $1 }}' |
		wc -l
	958

	real    0m5.637s
	user    0m6.652s
	sys     0m0.156s

The old filter-objects could do the size filter a little faster, but
not by much:

	$ time git filter-objects --min-size=500k |
		wc -l
	958

	real    0m4.564s
	user    0m4.496s
	sys     0m0.064s