On 24.09.22 15:19, René Scharfe wrote:
> It could be done by relying on the randomness of the object IDs and
> partitioning by a sub-string.  Or perhaps using pseudo-random numbers
> is sufficient:
>
>   git ls-tree -r HEAD |
>   awk '{print $3}' |

No need for awk here, of course; "git ls-tree -r --object-only HEAD"
does the same.  Just saying.

> Here's an idea after all: Using "git ls-tree" without "-r" and handling
> the recursion in the prefetch script would allow traversing trees in a
> different order and even in parallel.  Not sure how to limit parallelism
> to a sane degree.

How about something like this?  xargs -P provides a controlled degree of
parallelism.  Sorting by object ID (i.e. hash value) should provide a
fairly random order.  Does this thing work for you?

treeish=HEAD
parallelism=8
dir=$(mktemp -d)

echo "$treeish" >"$dir/trees"

# Traverse all sub-trees in randomized order and collect all blob IDs.
while test -s "$dir/trees"
do
	sort <"$dir/trees" >"$dir/trees.current"
	rm "$dir/trees"
	xargs -P "$parallelism" -L 1 git ls-tree <"$dir/trees.current" |
	awk -v dir="$dir" -v pieces="$parallelism" '
		$2 == "tree" {print $3 > (dir "/trees")}
		$2 == "blob" {print $3 >> (dir "/blobs" int(rand() * pieces))}
	'
done

# Prefetch all blobs in randomized order.
replstr="%"
command="sort $replstr | git cat-file --batch >/dev/null"
ls "$dir/blobs"* |
xargs -P "$parallelism" -I "$replstr" -L 1 sh -c "$command"

rm "$dir/trees.current" "$dir/blobs"*
rmdir "$dir"
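
For comparison, the sequential variant hinted at in the quoted message,
with the --object-only simplification, boils down to a single pipeline.
This is just a sketch; it assumes a partial clone in which reading the
missing blobs through git cat-file --batch triggers the lazy fetches:

# Sequential prefetch: list all blob IDs reachable from HEAD, put them
# into a pseudo-random order by sorting the hashes, and read them to
# make the partial clone fetch the ones that are missing.
git ls-tree -r --object-only HEAD |
sort |
git cat-file --batch >/dev/null

The parallel script above differs mainly in that it recurses through the
trees itself and spreads the blob IDs across $parallelism files, so that
each xargs worker gets its own slice to sort and feed to cat-file.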