On Thu, Aug 10, 2017 at 11:47 AM, Kevin Willford <kcwillford@xxxxxxxxx> wrote:
> String formatting can be a performance issue when there are
> hundreds of thousands of trees.

Since this change is made for the sake of performance, could you
give an example (what kind of repository do you need for this to
become a bottleneck? I presume the large Windows repo? Or can I
reproduce it with a small repo such as linux.git or even git.git?)
and some numbers showing how much this improves performance?

> Change to stop using the strbuf_addf and just add the strings
> or characters individually.
>
> There are a limited number of modes so added a switch for the
> known ones and a default case if something comes through that
> are not a known one for git.
>
> Signed-off-by: Kevin Willford <kewillf@xxxxxxxxxxxxx>
> ---
>  cache-tree.c | 24 +++++++++++++++++++++++-
>  1 file changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/cache-tree.c b/cache-tree.c
> index 2440d1dc89..41744b3db7 100644
> --- a/cache-tree.c
> +++ b/cache-tree.c
> @@ -390,7 +390,29 @@ static int update_one(struct cache_tree *it,
>  			continue;
>
>  		strbuf_grow(&buffer, entlen + 100);
> -		strbuf_addf(&buffer, "%o %.*s%c", mode, entlen, path + baselen, '\0');
> +
> +		switch (mode) {
> +		case 0100644:
> +			strbuf_add(&buffer, "100644 ", 7);
> +			break;
> +		case 0100664:
> +			strbuf_add(&buffer, "100664 ", 7);
> +			break;
> +		case 0100755:
> +			strbuf_add(&buffer, "100755 ", 7);
> +			break;
> +		case 0120000:
> +			strbuf_add(&buffer, "120000 ", 7);
> +			break;
> +		case 0160000:
> +			strbuf_add(&buffer, "160000 ", 7);
> +			break;

Maybe it is worth spelling out the modes symbolically rather than
numerically, e.g. S_IFGITLINK.

> +		default:
> +			strbuf_addf(&buffer, "%o ", mode);

Given the repository you are measuring, maybe we could get away
with fewer entries here, special-casing only the 2 or 3 most
frequently used modes?

Or, if this is meant to be the exhaustive list, we could issue
a warning here?

Thanks,
Stefan
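
To make the two suggestions above concrete (symbolic mode names, and a short fast path with a generic fallback), here is a minimal standalone sketch. It is not the patch itself: format_mode() and the MODE_* names are hypothetical, a plain char buffer stands in for the strbuf, and S_IFGITLINK is defined locally with the value 0160000 only so that the example compiles outside of git.

```c
#include <stdio.h>
#include <string.h>

/* Defined locally so the sketch is self-contained (assumption:
 * inside git this would come from the existing headers). */
#define S_IFGITLINK    0160000

/* Hypothetical names for the other common entry modes. */
#define MODE_REG_FILE  0100644
#define MODE_EXEC_FILE 0100755
#define MODE_SYMLINK   0120000

/*
 * Write "<octal mode> " (NUL-terminated) into out.
 * Returns the number of bytes written, not counting the NUL,
 * or 0 if out is too small.
 */
static size_t format_mode(char *out, size_t outlen, unsigned int mode)
{
	const char *known = NULL;
	size_t len;
	int n;

	/* Fast path: the common modes are copied as fixed strings. */
	switch (mode) {
	case MODE_REG_FILE:
		known = "100644 ";
		break;
	case MODE_EXEC_FILE:
		known = "100755 ";
		break;
	case MODE_SYMLINK:
		known = "120000 ";
		break;
	case S_IFGITLINK:
		known = "160000 ";
		break;
	}

	if (known) {
		len = strlen(known);
		if (len + 1 > outlen)
			return 0;
		memcpy(out, known, len + 1);
		return len;
	}

	/* Fallback: any other mode still goes through generic formatting. */
	n = snprintf(out, outlen, "%o ", mode);
	if (n < 0 || (size_t)n >= outlen)
		return 0;
	return (size_t)n;
}
```

A caller would append the result to its buffer followed by the path and the terminating NUL. Whether the fast path is worth the extra code is exactly what the numbers asked for above would have to show.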