On 14/06/2022 15:14, Derrick Stolee wrote: > On 6/14/2022 9:36 AM, Richard Oliver wrote: >> Do not use oid_object_info() to determine object type in mktree_line() >> as this can cause promised objects to be dynamically faulted-in one at a >> time which has poor performance. Instead, use a combination of >> oid_object_info_extended() (with OBJECT_INFO_SKIP_FETCH_OBJECT option), >> and the newly introduced promisor_object_type() to determine object type >> before defaulting to fetch from remote. > > Have you run some performance tests on this? It seems like this will > scan all packed objects, which is probably much slower than just asking > the remote for the object in most cases. > > Thanks, > -Stolee Hi Stolee, I've put together a synthetic experiment below (adding a new blob to anexisting tree) to show you the behaviour that we've been seeing. Our actual use-case (where we first encountered this behaviour) is updating submodules to a known hash. As you can see, the round-trip time of fetching objects one-by-one is very expensive. Before, using system git (git version 2.32.0 (Apple Git-132)): > $ git init > # Fetch a recent tree > $ git fetch --filter=tree:0 --depth 1 https://github.com/git/git cdb48006b0ec7fe19794daf7b5363ab42d9d9371 > remote: Enumerating objects: 1, done. > remote: Counting objects: 100% (1/1), done. > remote: Total 1 (delta 0), reused 0 (delta 0), pack-reused 0 > Receiving objects: 100% (1/1), 13.12 KiB | 13.12 MiB/s, done. > From https://github.com/git/git > * branch cdb48006b0ec7fe19794daf7b5363ab42d9d9371 -> FETCH_HEAD > > $ NEW_BLOB=$(echo zzz | git hash-object --stdin -w) > > $ cat <(git ls-tree FETCH_HEAD) <(printf "100644 blob ${NEW_BLOB}\tzzz") | time git mktree > remote: Enumerating objects: 1, done. > remote: Counting objects: 100% (1/1), done. > remote: Total 1 (delta 0), reused 0 (delta 0), pack-reused 0 > Receiving objects: 100% (1/1), 334 bytes | 334.00 KiB/s, done. > remote: Enumerating objects: 1, done. > remote: Counting objects: 100% (1/1), done. > remote: Total 1 (delta 0), reused 0 (delta 0), pack-reused 0 > Receiving objects: 100% (1/1), 2.01 KiB | 2.01 MiB/s, done. > remote: Enumerating objects: 1, done. > remote: Counting objects: 100% (1/1), done. > remote: Total 1 (delta 0), reused 0 (delta 0), pack-reused 0 > Receiving objects: 100% (1/1), 256 bytes | 256.00 KiB/s, done. > # ... > # SOME TIME LATER > # ... > e26c7ce7357b1649da7b4200d4e80d0b668db8d4 > 286.49 real 15.66 user 15.59 sys Repeated experiment, but using modified mktree: > $ cat <(git ls-tree FETCH_HEAD) <(printf "100644 blob ${NEW_BLOB}\tzzz") | time git mktree > e26c7ce7357b1649da7b4200d4e80d0b668db8d4 > 0.06 real 0.01 user 0.03 sys Did you have any other sort of performance test in mind? The remotes we typically deal with are geographically far away and deal with a high volume of traffic so we're keen to move behaviour to the client where it makes sense to do so. Thanks, Richard