On Tue, Dec 14, 2021 at 4:37 PM Jeff King <peff@xxxxxxxx> wrote: > > It gets about 2.3GB/s with the tip of 'master' and 3.2GB/s with the > equivalent of your patch (using LARGE_PACKET_DATA_MAX). So definitely an > improvement. > > Without the cached case (so actually running pack-objects, albeit a > pretty quick one because of bitmaps and pack-reuse), the timings are > about the same (171MB/s versus 174MB/s, but really it's just pegging a > CPU running pack-objects). So it would be fine to just do this > unconditionally, I think. Thank you for validating this! > Looking at strace, the other thing I notice is that we write() the > packet header separately in send_sideband(), which doubles the number of > syscalls. I hackily re-wrote this to use writev() instead (patch below), > but it doesn't seem to actually help much (maybe a curiosity to explore > further, but definitely not something to hold up your patch). Those double writes bug me too. Getting rid of them was one of the things I liked about using stdio here. Curious now if writev will make an impact in my benchmarks. As your experiments indicate, the double writes may not be a problem in real life. Either way, I also feel it is not something to rope into this patch set.