Re: git --archive

On 24.09.22 at 15:19, René Scharfe wrote:
> It could be done by relying on the randomness of the object IDs and
> partitioning by a sub-string.  Or perhaps using pseudo-random numbers
> is sufficient:
>
>    git ls-tree -r HEAD |
>    awk '{print $3}' |

No need for awk here, of course; "git ls-tree -r --object-only HEAD"
does the same.  Just saying.

> Here's an idea after all: Using "git ls-tree" without "-r" and handling
> recursing in the prefetch script would allow traversing trees in a
> different order and even in parallel.  Not sure how to limit parallelism
> to a sane degree.

How about something like this?  xargs -P provides a controlled degree of
parallelism.  Sorting by object ID (i.e. hash value) should provide a
fairly random order.  Does this thing work for you?


treeish=HEAD
parallelism=8

dir=$(mktemp -d)
echo "$treeish" >"$dir/trees"

# Traverse all sub-trees in randomized order and collect all blob IDs.
while test -s "$dir/trees"
do
        sort <"$dir/trees" >"$dir/trees.current"
        rm "$dir/trees"
        xargs -P "$parallelism" -L 1 git ls-tree <"$dir/trees.current" |
        awk -v dir="$dir" -v pieces="$parallelism" '
                $2 == "tree" {print $3 > (dir "/trees")}
                $2 == "blob" {print $3 >> (dir "/blobs" int(rand() * pieces))}
        '
done

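# Prefetch all blobs in randomized order.  -I already runs one command per
# input line and is mutually exclusive with -L, so -L is not given here.
replstr="%"
command="sort $replstr | git cat-file --batch >/dev/null"
ls "$dir/blobs"* | xargs -P "$parallelism" -I "$replstr" sh -c "$command"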

rm "$dir/trees.current" "$dir/blobs"*
rmdir "$dir"
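
To see what the awk step in the loop does without touching a real
repository, here is a small sketch that feeds it a few hand-written
"git ls-tree" lines.  The object IDs and paths are made up for
illustration, and pieces=4 is an arbitrary choice.  It only shows the
partitioning: tree IDs go to one file, blob IDs are spread over up to
$pieces files.

```shell
# Sketch: run the partitioning awk program on made-up "git ls-tree" lines.
# The object IDs and paths below are invented for illustration only.
dir=$(mktemp -d)
pieces=4

# ls-tree lines have the form "<mode> <type> <object><TAB><path>".
# printf reuses the format string for each group of four arguments.
printf '%s %s %s\t%s\n' \
        040000 tree 1111111111111111111111111111111111111111 sub \
        100644 blob 2222222222222222222222222222222222222222 a.txt \
        100644 blob 3333333333333333333333333333333333333333 b.txt \
        100755 blob 4444444444444444444444444444444444444444 c.sh |
awk -v dir="$dir" -v pieces="$pieces" '
        $2 == "tree" {print $3 > (dir "/trees")}
        $2 == "blob" {print $3 >> (dir "/blobs" int(rand() * pieces))}
'

# Count what ended up where; tr strips the padding some wc versions add.
trees=$(wc -l <"$dir/trees" | tr -d ' ')
blobs=$(cat "$dir/blobs"* | wc -l | tr -d ' ')
echo "trees: $trees blobs: $blobs"

rm -r "$dir"
```

This reports one tree and three blobs.  Which of the blobs0..blobs3
files a given ID lands in depends on the awk implementation's rand(),
so the sketch only checks the totals.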



