Re: [PATCH 4/5] strmap: add strdup_strings option

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Aug 21, 2020 at 01:41:44PM -0700, Elijah Newren wrote:

> > This is actually one of the ugliest parts of string_list, IMHO, and I'd
> > prefer if we can avoid duplicating it. Yes, sometimes we can manage to
> > avoid an extra copy of a string. But the resulting ownership and
> > lifetime questions are often very error-prone. In other data structures
> > we've moved towards just having the structure own its data (e.g.,
> > strvec does so, and things like oidmap store their own oids). I've been
> > happy with the simplicity of it.
> >
> > It also works if you use a flex-array for the key storage in the
> > strmap_entry. :)
> 
> I can see how it's easier, but that worries me about the number of
> extra copies for my usecase.  In order to minimize actual computation,
> I track an awful lot of auxiliary data in merge-ort so that I know
> when I can safely perform many different case-specific optimizations.
> Among other things, this means 15 strmaps.  1 of those stores a
> mapping from all paths that traverse_trees() walks over (file or
> directory) to metadata about the content on the three different sides.
> 9 of the remaining 14 simply share the strings in the main strmap,
> because I don't need extra copies of the paths in the repository.  I
> could (and maybe should) extend that to 11 of the 14.  Only 3 actually
> do need to store a copy of the paths (because they store data used
> beyond the end of an inner recursive merge or can be used to
> accelerate subsequent commits in a rebase or cherry-pick sequence).

I'd have to see the code, of course, but:

  - keep in mind you're allocating 8 bytes for a pointer (plus 24 for
    the rest of the strmap entry). If you use a flex-array you get those
    8 bytes back. Full paths do tend to be longer than that, so it's
    probably net worse than a pointer to an existing string. But how
    much worse, and does it matter?

  - That sounds like a lot of maps. :) I guess you've looked at
    compacting some of them into a single map-to-struct?

> So, in most my cases, I don't want to duplicate strings.  I actually
> started my implementation using FLEX_ALLOC_STR(), as you suggested
> earlier in this thread, but tossed it because of this same desire to
> not duplicate strings but just share them between the strmaps.
> 
> Granted, I made that decision before I had a complete implementation,
> so I didn't measure the actual costs.  It's possible that was a
> premature optimization.

I'm just really concerned that it poisons the data structure with
complexity that many of the other callers will have to deal with. We've
had several "oops, strdup_strings wasn't what I expected it to be" bugs
with string-list (in both directions: leaks and use-after-free). It
would be nice to have actual numbers and see if it's worth the cost.

-Peff



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux