Re: [PATCH] c2xml

Rob Taylor <rob.taylor@xxxxxxxxxxxxxxx> · Mon, 02 Jul 2007 13:32:26 +0100

This is a multi-part message in MIME format.
Josh Triplett wrote:
> On Wed, 2007-06-27 at 14:51 +0100, Rob Taylor wrote:
>> Here's something I've hacked up for my work on gobject-introspection
>> [1]. It basically dumps the parse tree for a given file as simplistic
>> xml, suitable for further transformation by something else (in my case,
>> some python).
>>
>> I'd expect this to also be useful for code navigation in editors and c
>> refactoring tools, but I've really only focused on my needs for c api
>> description.
>>
>> There are 3 patches here. The first introduces a field in the symbol
>> struct for the end position of the symbol. I've added this in my case
>> for documentation generation, but again I think it'd be useful in other
>> cases. The next introduces a sparse_keep_tokens, which parses a file,
>> but doesn't free the tokens after parsing. The final one adds c2xml and
>> the DTD for the xml format. It builds conditionally on whether libxml2
>> is available.
>>
>> All feedback appreciated!
> 
> Wow.  Very nice.  I can already think of several other uses for this.

Glad you like it :) OOI, what other uses are you thinking of?

> A few suggestions:
> 
>       * Please sign off your patches.  See
>         http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;hb=HEAD;f=Documentation/SubmittingPatches , section "Sign your work", for details on the Developer's Certificate of Origin and the Signed-off-by convention.  I really need to include some documentation in the Sparse source tree, though.

Ah, I did wonder what the 'signed-off-by' signified.

>       * Rather than specifying start="line:col" end="line:col", how
>         about splitting those up into start-line, start-col, end-line,
>         and end-col?  That would avoid the need to do string parsing
>         after reading the XML.

Yes. I originally had a more human-readable form, and this is a hangover
from that approach.

>       * Positions have file information associated with them.  A symbol
>         might potentially start in one file and end in another, if
>         people play crazy games with #include.  start-file and end-file?

Yes, optional end-file would be sensible. Hopefully it wouldn't occur
very often ;)

>       * Typo in examine_namespace: "Unregonized namespace".
yes.

>       * get_type_name seems generally useful, and several other parts of
>         Sparse (such as in evaluate.c and show-parse.c) could become
>         simpler by using it.  How about putting it in symbol.c and
>         exposing it via symbol.h?  Can you do that in a separate patch,
>         please?

Sure.
>       * Also, should get_type_name perhaps look up the string in an
>         array rather than using switch?  (I don't know which makes more
>         sense.)

Yeah, an array lookup would be better.

>       * I don't know how much work this would require, but it doesn't
>         seem like c2xml gets much value out of using libxml, so would it
>         make things very painful to just print XML directly?  It would
>         certainly make things like BAD_CAST and having to snprintf to
>         local buffers go away.  If you count on libxml for some form of
>         escaping or similar, please ignore this; however, as far as I
>         can tell, all of the strings that c2xml works with (such as
>         identifiers) can't have unusual characters in them.

Well, I'm using the tree builder. It would be non-trivial to rewrite
without it - see in examine_symbol where I add new nodes to the root
node and recurse from there.

>       * Please don't include vim modelines in source files.  (Same goes
>         for emacs and similar.)

Sure

>       * Please explicitly limit the possible values of the type
>         attribute to those that Sparse produces, rather than allowing
>         any arbitrary CDATA.  The same goes for a few other 

Ah, yes, good idea.

<snip>

>       * In examine_modifiers, please use C99-style designated assignment
>         for the modifiers array, for clarity and robustness.

Hmm, not sure how best to do this. Redefine MOD_* in terms of shifts of
some linearly assigned constants?

>       * I suspect several of the modifiers in examine_modifiers don't
>         need to generate output; I think you want to ignore everything
>         in MOD_IGNORE.

Do we really want to not emit any from MOD_STORAGE? I guess if we have
scoping info at a later date, we can certainly drop MOD_TOPLEVEL, but
that seems useful ATM. MOD_ADDRESSABLE seems useful. MOD_ASSIGNED,
MOD_USERTYPE, MOD_FORCE, MOD_ACCESSED and MOD_EXPLICTLY_SIGNED don't
seem very useful though.

I think MOD_TYPEDEF would be useful,but I never actually see it. Do you
know what's going on here?

Attached you should find the updated patchset with all the changes
discussed apart from the modifiers stuff discussed above.

<snip>

> 
> Note that you don't need to address all of these before resending.  In
> particular, I'd love to merge the first patch, and I just need a signoff
> for it.
> 
> Thanks again for this work; it looks great, and highly useful.

Thanks to you too!

Rob Taylor