This is a multi-part message in MIME format.
Josh Triplett wrote:
> On Wed, 2007-06-27 at 14:51 +0100, Rob Taylor wrote:
>> Here's something I've hacked up for my work on gobject-introspection
>> [1]. It basically dumps the parse tree for a given file as simplistic
>> xml, suitable for further transformation by something else (in my case,
>> some python).
>>
>> I'd expect this to also be useful for code navigation in editors and c
>> refactoring tools, but I've really only focused on my needs for c api
>> description.
>>
>> There are 3 patches here. The first introduces a field in the symbol
>> struct for the end position of the symbol. I've added this in my case
>> for documentation generation, but again I think it'd be useful in other
>> cases. The next introduces a sparse_keep_tokens, which parses a file,
>> but doesn't free the tokens after parsing. The final one adds c2xml and
>> the DTD for the xml format. It builds conditionally on whether libxml2
>> is available.
>>
>> All feedback appreciated!
>
> Wow. Very nice. I can already think of several other uses for this.
Glad you like it :) OOI, what other uses are you thinking of?
> A few suggestions:
>
> * Please sign off your patches. See
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;hb=HEAD;f=Documentation/SubmittingPatches , section "Sign your work", for details on the Developer's Certificate of Origin and the Signed-off-by convention. I really need to include some documentation in the Sparse source tree, though.
Ah, I did wonder what the 'signed-off-by' signified.
> * Rather than specifying start="line:col" end="line:col", how
> about splitting those up into start-line, start-col, end-line,
> and end-col? That would avoid the need to do string parsing
> after reading the XML.
Yes. I originally had a more human-readable form, and this is a hangover
from that approach.
> * Positions have file information associated with them. A symbol
> might potentially start in one file and end in another, if
> people play crazy games with #include. start-file and end-file?
Yes, optional end-file would be sensible. Hopefully it wouldn't occur
very often ;)
> * Typo in examine_namespace: "Unregonized namespace".
yes.
> * get_type_name seems generally useful, and several other parts of
> Sparse (such as in evaluate.c and show-parse.c) could become
> simpler by using it. How about putting it in symbol.c and
> exposing it via symbol.h? Can you do that in a separate patch,
> please?
Sure.
> * Also, should get_type_name perhaps look up the string in an
> array rather than using switch? (I don't know which makes more
> sense.)
Yeah, an array lookup would be better.
> * I don't know how much work this would require, but it doesn't
> seem like c2xml gets much value out of using libxml, so would it
> make things very painful to just print XML directly? It would
> certainly make things like BAD_CAST and having to snprintf to
> local buffers go away. If you count on libxml for some form of
> escaping or similar, please ignore this; however, as far as I
> can tell, all of the strings that c2xml works with (such as
> identifiers) can't have unusual characters in them.
Well, I'm using the tree builder. It would be non-trivial to rewrite
without it - see in examine_symbol where I add new nodes to the root
node and recurse from there.
> * Please don't include vim modelines in source files. (Same goes
> for emacs and similar.)
Sure
> * Please explicitly limit the possible values of the type
> attribute to those that Sparse produces, rather than allowing
> any arbitrary CDATA. The same goes for a few other
Ah, yes, good idea.
<snip>
> * In examine_modifiers, please use C99-style designated assignment
> for the modifiers array, for clarity and robustness.
Hmm, not sure how best to do this. Redefine MOD_* in terms of shifts of
some linearly assigned constants?
> * I suspect several of the modifiers in examine_modifiers don't
> need to generate output; I think you want to ignore everything
> in MOD_IGNORE.
Do we really want to not emit any from MOD_STORAGE? I guess if we have
scoping info at a later date, we can certainly drop MOD_TOPLEVEL, but
that seems useful ATM. MOD_ADDRESSABLE seems useful. MOD_ASSIGNED,
MOD_USERTYPE, MOD_FORCE, MOD_ACCESSED and MOD_EXPLICTLY_SIGNED don't
seem very useful though.
I think MOD_TYPEDEF would be useful,but I never actually see it. Do you
know what's going on here?
Attached you should find the updated patchset with all the changes
discussed apart from the modifiers stuff discussed above.
<snip>
>
> Note that you don't need to address all of these before resending. In
> particular, I'd love to merge the first patch, and I just need a signoff
> for it.
>
> Thanks again for this work; it looks great, and highly useful.
Thanks to you too!
Rob Taylor