Re: [PATCH 4/5] strmap: add strdup_strings option

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Aug 21, 2020 at 2:03 PM Jeff King <peff@xxxxxxxx> wrote:
>
> On Fri, Aug 21, 2020 at 01:41:44PM -0700, Elijah Newren wrote:
>
> > > This is actually one of the ugliest parts of string_list, IMHO, and I'd
> > > prefer if we can avoid duplicating it. Yes, sometimes we can manage to
> > > avoid an extra copy of a string. But the resulting ownership and
> > > lifetime questions are often very error-prone. In other data structures
> > > we've moved towards just having the structure own its data (e.g.,
> > > strvec does so, and things like oidmap store their own oids). I've been
> > > happy with the simplicity of it.
> > >
> > > It also works if you use a flex-array for the key storage in the
> > > strmap_entry. :)
> >
> > I can see how it's easier, but that worries me about the number of
> > extra copies for my usecase.  In order to minimize actual computation,
> > I track an awful lot of auxiliary data in merge-ort so that I know
> > when I can safely perform many different case-specific optimizations.
> > Among other things, this means 15 strmaps.  1 of those stores a
> > mapping from all paths that traverse_trees() walks over (file or
> > directory) to metadata about the content on the three different sides.
> > 9 of the remaining 14 simply share the strings in the main strmap,
> > because I don't need extra copies of the paths in the repository.  I
> > could (and maybe should) extend that to 11 of the 14.  Only 3 actually
> > do need to store a copy of the paths (because they store data used
> > beyond the end of an inner recursive merge or can be used to
> > accelerate subsequent commits in a rebase or cherry-pick sequence).
>
> I'd have to see the code, of course, but:

>   - keep in mind you're allocating 8 bytes for a pointer (plus 24 for
>     the rest of the strmap entry). If you use a flex-array you get those
>     8 bytes back. Full paths do tend to be longer than that, so it's
>     probably net worse than a pointer to an existing string. But how
>     much worse, and does it matter?

I'll investigate; it may take a while...

>   - That sounds like a lot of maps. :) I guess you've looked at
>     compacting some of them into a single map-to-struct?

Oh, map-to-struct is the primary use.  But compacting them won't work,
because the reason for the additional maps is that they have different
sets of keys (this set of paths meet a certain condition...).  Only
one map contains all the paths involved in the merge.

Also, several of those maps don't even store a value; and are really
just a set implemented via strmap (thus meaning the only bit of data I
need for some conditions is whether any given path meets it).  It
seems slightly ugly to have to call strmap_put(map, string, NULL) for
those.  I wonder if I should have another strset type much like your
suggesting for strintmap.  Hmm...

Also, one thing that inflates the number of strmaps I use is that
several of those conditions are specific to a certain side of the
merge, thus requiring two strmaps for each of those special
conditions.

> > So, in most my cases, I don't want to duplicate strings.  I actually
> > started my implementation using FLEX_ALLOC_STR(), as you suggested
> > earlier in this thread, but tossed it because of this same desire to
> > not duplicate strings but just share them between the strmaps.
> >
> > Granted, I made that decision before I had a complete implementation,
> > so I didn't measure the actual costs.  It's possible that was a
> > premature optimization.
>
> I'm just really concerned that it poisons the data structure with
> complexity that many of the other callers will have to deal with. We've
> had several "oops, strdup_strings wasn't what I expected it to be" bugs
> with string-list (in both directions: leaks and use-after-free). It
> would be nice to have actual numbers and see if it's worth the cost.

I'll go get some and find out what the impact is.



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux