Re: [PATCH v2] Avoid reusing string buffer when doing string expansion

Luc Van Oostenryck <luc.vanoostenryck@xxxxxxxxx> · Thu, 5 Feb 2015 00:38:03 +0100

On Wed, Feb 04, 2015 at 12:01:39AM -0800, Christopher Li wrote:
> On Tue, Feb 3, 2015 at 10:22 PM, Luc Van Oostenryck
> <luc.vanoostenryck@xxxxxxxxx> wrote:
> >> Are you sure about this behavior? You mean you see that "b" has a string
> >> size of 2? I don't understand how this can happen.
> >
> >
> > But if the macro is used several times:
> > ===
> > #define BACKSLASH "\\"
> > const char a[] = BACKSLASH;
> > const char b[] = BACKSLASH;
> > const char c[] = "<" BACKSLASH ">";
> > ===
> >
> > then we get:
> > ===
> > symbol a:
> >         char const [addressable] [toplevel] a[0]
> >         bit_size = 16
> >         val = "\0"
> > symbol b:
> >         char const [addressable] [toplevel] b[0]
> >         bit_size = 16
> 
> The value buffer is corrupted. But the bit_size is still 16, which
> is correct. I just think that in your example it shouldn't corrupt
> the size. Your test case seems to confirm that.
> 
> > Is it only with macros that the string structure is so shared?
> 
> That is right. I haven't seen it happen any other way.
> The tokenizer always constructs a new token and string structure
> from the C source file.
> 
> It is the preprocessor's macro expansion that copies and duplicates
> the token list. Each token has a pointer to the string, which
> is shared across different invocations of the macro.

Fine.
I was afraid that there were other possibilities, for example
if identical string literals were put in a hash table, as is done
for identifiers.
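
To make the failure mode concrete, here is a minimal sketch in C of what
I understand is happening when two tokens share one string and the escape
expansion is done in place. All names (shared_string, expand_in_place,
expand_copy, raw_backslash) are made up for illustration and are not
sparse's actual code; only the backslash-backslash escape is handled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a string shared by several tokens;
 * this is not sparse's actual structure. */
struct shared_string {
	size_t length;		/* bytes used in data[], including the NUL */
	char data[8];
};

/* The raw text of the macro body "\\" as the lexer would store it:
 * two backslash bytes followed by the terminator. */
static struct shared_string raw_backslash(void)
{
	struct shared_string s = { 3, { '\\', '\\', '\0' } };
	return s;
}

/* Expand escapes in place, overwriting the raw text. A backslash
 * simply yields the byte that follows it, whatever that byte is. */
static void expand_in_place(struct shared_string *s)
{
	size_t in = 0, out = 0;

	assert(s->length > 0 && s->length <= sizeof(s->data));
	while (in < s->length - 1) {
		char c = s->data[in++];
		if (c == '\\' && in < s->length)
			c = s->data[in++];	/* take the escaped byte */
		s->data[out++] = c;
	}
	s->data[out++] = '\0';
	s->length = out;
}

/* Safer variant: leave the shared raw text untouched and expand
 * a private copy instead. */
static void expand_copy(const struct shared_string *src,
			struct shared_string *dst)
{
	*dst = *src;
	expand_in_place(dst);
}
```

Calling expand_in_place() twice on the same buffer, as would happen for
"a" and "b" above, leaves the length at 2 both times (bit_size 16), but
the second pass treats the already-expanded backslash as the start of an
escape and swallows the terminator, so the value becomes "\0".
expand_copy() avoids this because the raw text is never modified.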

> > And have we a way to test if the string is coming from a macro?
> 
> Not right now. But we can add it.
> 
> >
> > A simpler and safer way would be to do the string expansion directly after
> > a string token is recognized, or even better in the lexer itself.
> > So the string buffer, macro or not, would always contain the right values.
> > But maybe there were good reasons not to do it this way.
> 
> I have a counterexample where that will not work. Let's say
> 
> #define b(a, d) a##d
> wchar_t s[] = b(L, "\xabcdabc");
> 
> When the lexer processes the escape character, it does not know whether
> the string is wide or not. That can change after macro expansion.
> 
> Chris

Yes, I see.
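
For reference, a small sketch of that case in C. The macro name PASTE and
the helper functions are hypothetical, and I use a smaller escape value
than in your example so that it also fits a 16-bit wchar_t. The point is
only that the string becomes wide after token pasting, so the escape can
be evaluated with the right character width only once macro expansion is
done.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

#define PASTE(a, d) a##d

/* After expansion this is L"\x263A": one wide character plus the
 * terminator. Evaluated as a narrow escape at lexing time, the
 * value 0x263A would not fit in an 8-bit char. */
static const wchar_t wide[] = PASTE(L, "\x263A");

/* Number of wchar_t elements in the array, terminator included. */
static size_t wide_count(void)
{
	return sizeof(wide) / sizeof(wide[0]);
}

/* First element of the pasted wide string. */
static wchar_t first_wide_char(void)
{
	assert(wide_count() == 2);
	return wide[0];
}
```

So the lexer has to keep the raw text of the string, and the expansion
has to wait until the final token kind is known.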

BTW, I've checked and there are a lot of problems with wide strings.
I'll send some test cases later.

Regards,
Luc