>
> I am not sure I understand your range representation yet.
>

You need to view it with a fixed-width font. It's not rocket science:
token lists (or arrays) are shown as dotted lists. The token.pos
field is listed below each token, either as p[x] or, for tokens at
file scope, as the file location.
I'll come up with a patch to implement this scheme when I have
time and send it; it might take a while.
-- Konrad

> To be continued...
>
> Chris
>
>
>>
>> Note that a reference to p[] in p[x] notation only references
>> the "start" of the PP_struct.copy. A unique identification
>> of the "source" token might not always be possible because
>> of ambiguities, so when doing a copy of the tokens in
>> PP_struct.copy I might use an extended version of struct token
>> to also include an offset.
>>
>> ----- file a.h start -----
>> #define D0(d0a0,d0a1) 1 D1(d0a0) 2 D2(d0a1) 3
>> #define D1(d1a0) 4 d1a0 5
>> #define D2(d2a0) 6 d2a0 7
>> #define D3(d3a0) 8 d3a0 9
>> D0(D3(10),11)
>> ----- file a.h end -----
>>
>> Preprocessor output (gcc -E a.h): "1 4 8 10 9 5 2 6 11 7 3"
>>
>> Preprocessor macro trace on p[]:
>>
>> p[0]:mdefn_body[D0]     :1.D1.(.d0a0.).2.D2.(.d0a1.).3
>>                          [a.h:1:23 .. a.h:1:45]
>> p[1]:mdefn_body[D1]     :4 . d1a0 . 5
>>                          [a.h:2:18 .. a.h:2:25]
>> p[2]:mdefn_body[D2]     :6 . d2a0 . 7
>>                          [a.h:3:18 .. a.h:3:25]
>> p[3]:mdefn_body[D3]     :8 . d3a0 . 9
>>                          [a.h:4:18 .. a.h:4:25]
>> p[4]:minst_arg0[D0]     :D3 . ( . 10 . )
>>                          [a.h:5:4 .. a.h:5:9]
>> p[5]:minst_arg1[D0]     :11
>>                          [a.h:5:11]
>> p[6]:minst_arg0[D3]     :10
>>                          p[4]
>> p[7]:(args)expand[p[3]] :8    . 10   . 9
>>                          p[3]   p[4]   p[3]
>> p[8]:minst_arg0[D2]     :11
>>                          p[5]
>> p[9]:(body)expand[p[2]] :6    . 11   . 7
>>                          p[2]   p[5]   p[2]
>> p[10]:(body)expand[p[0]]:1   .4  .8  .10 .9  .5  .2  .6  .11 .7  .3
>>                          p[0]p[1]p[7]p[7]p[7]p[1]p[0]p[9]p[9]p[9]p[0]
>>
>>
>> p[0]-p[3] are built up when the macro is defined.
>> A p[] entry is needed to distinguish between
>> the different sources of tokens.
>> p[4],p[5] are built in collect_arguments() for D0(D3(10),11)
>> p[6] is built in collect_arguments() for D3(10)
>> p[7] is built in the call to the macro_expand() hook with a flag that
>>      it is an (args)expand
>> p[8] is built in collect_arguments() for D2(11)
>>      (inside D0's expansion)
>> p[9] is built in the call to the macro_expand() hook with a flag that
>>      it is a (body)expand (of D2)
>> p[10] is built in the call to the macro_expand() hook with a flag that
>>      it is a (body)expand (of D0)
>>
>> struct PP_struct {
>>     enum {minst_arg, expand_body, expand_arg, mdef_body} typ;
>>     uint argidx;
>>     struct symbol *macro;
>>     struct token copy[];
>> };
>>
>> Conclusion:
>> -----------
>> Apart from the macro_expand() hook I also need hooks
>> in macro definition and also in collect_arguments() or expand().
>>
>>
>> Concerning (3) How to connect (1) and (2) to the AST
>> ----------------------------------------------------
>>
>> This can maybe wait for a later iteration. There are more complex parts
>> involved...
>>
>>
>>
>>>
>>> Now how to connect the AST with that information is a
>>> very good question. Notice the symbol->aux pointer? That is
>>> the place to attach extra context or back-end-related data
>>> to symbols.
>>>
>>> Each symbol has "pos" and "endpos". If the symbol
>>> is expanded from a macro, then using the previous scheme the pos
>>> should point to a line in the "<pre-processor>" stream.
>>>
>>> However, if the macro expansion happens between "pos" and
>>> "endpos", you will not be able to access the token that contains
>>> the macro expansion "pos" easily.
>>>
>>> For that, we could, just thinking out loud, add a parser
>>> hook for declarations, called when a symbol has finished building.
>>> That would be a very small and straightforward change.
>>> If the hook is not NULL, the callback function will be called
>>> with the symbol that just got defined, and the start and end
>>> tokens of that symbol.
>>>
>>> So your dependency program just needs to register the
>>> symbol parsing hook.
>>> Inside the callback function, walk
>>> the tokens from start to end. Look up macro expansion information
>>> if needed. Build up the dependency struct and store that in
>>> symbol->aux.
>>>
>>> BTW, unrelated to this patch, I can see other programs might
>>> be able to use the same parser hook to perform source code
>>> transformations as well.
>>>
>>> Make sense? In this way, you don't even need the hash
>>> table to attach a context to the token. You can get it directly
>>> from symbol->aux.
>>>
>>>> In my patch I have modeled (2) using 2 structs:
>>>> struct macro_expansion {
>>>>     int nargs;
>>>>     struct symbol *sym;
>>>>     struct token *m;
>>>>     struct arg args[0];
>>>> };
>>>> struct tok_macro_dep {
>>>>     struct macro_expansion *m;
>>>>     unsigned int argi;
>>>>     unsigned int isbody : 1;
>>>>     unsigned int visited : 1;
>>>> };
>>>> Each token from a macro expansion gets tagged with
>>>> tok_macro_dep. If it is a macro argument, <argi> shows the
>>>> index; if it is from the macro body, <isbody> is 1.
>>>> Now, I haven't yet thought about special cases like
>>>> token concatenation; even more data is needed to
>>>> model this. Also, when a macro argument is again used as a
>>>> macro argument inside the body expansion, I kind of
>>>> lose the chain: I would also need a "token *dup_of" pointer
>>>> to point to the original token that the token is a copy
>>>> of (when arguments are created...) etc.
>>>>
>>>> I have read your macro_expand() hook idea; however,
>>>> if I understand it right, you want to reuse position.stream and
>>>> position.line as a kind of pointer (to save the extra 4 bytes).
>>>> (Your goal is to minimize codebase change; however, I wonder
>>>> whether you don't change the semantics of struct position and then
>>>> need to change the code that uses struct position anyway...)
>>>
>>>
>>> Nope, because the position.stream change only happens in
>>> your dependency analysis program. It is the dependency program
>>> that registers the hook.
>>> This behaviour is private to the dependency
>>> analysis program. Other programs that use the sparse library don't see
>>> it at all, because they don't register macro_expand hooks to perform
>>> those stream manipulations. They will receive the exact same AST as before.
>>>
>>>> Maybe it is possible like this... I doubt it. Where should
>>>> all the extra context that each token has be saved and
>>>> extracted from, using that scheme?
>>>
>>>
>>> Two places. One is symbol->aux. Also, the macro_expand context
>>> can be looked up by pos->line, which indexes into the macro_expand
>>> array that stores the context.
>>>
>>> Having these two should be enough to produce the exact same
>>> dependency result as you are getting right now.
>>>
>>>> Maybe it is possible, but I don't want to have as a design
>>>> goal to save 4 bytes (I'd use the void *custom scheme to
>>>> save all my extra data, also the pointers to tokens to
>>>> "sit around") and adjust everything else to
>>>> that. The consequence is that the code complexity would
>>>> grow on the other end.
>>>
>>>
>>> It is not only about saving 4 bytes. It is about other programs
>>> not having to pull in the full token struct if they don't need it.
>>> It is about reusable macro hooks and parser hooks that let
>>> external programs do fancier things like source code transformations
>>> without impacting the other users of the sparse lib.
>>>
>>>> Here is my compromise then:
>>>> Keep the original "pos". But still grant me for
>>>> each struct a "void *custom" pointer that I can use
>>>> to store extra data, i.e. a pointer to a token.
>>>
>>>
>>> symbol->aux.
>>>
>>> Chris
>>>
>>
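
A rough sketch of the lookup Chris describes in the thread, where tokens
produced by a macro expansion carry a position in a "<pre-processor>"
stream and pos.line indexes an array of expansion contexts. Everything
here is a stand-in for illustration: struct position, PP_STREAM,
record_expansion(), lookup_expansion() and resolve_origin() are
hypothetical names, not sparse's actual definitions or API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a token position; NOT sparse's real struct. */
struct position {
    unsigned int stream;   /* input stream id */
    unsigned int line;     /* source line, or expansion index if PP_STREAM */
};

/* Assumed id reserved for the "<pre-processor>" pseudo stream. */
enum { PP_STREAM = 0xffff };

/* Context stored per macro expansion (hypothetical layout). */
struct expansion {
    const char *macro;        /* name of the expanded macro */
    struct position origin;   /* position of the macro invocation */
};

#define MAX_EXPANSIONS 64
static struct expansion expansions[MAX_EXPANSIONS];
static unsigned int nr_expansions;

/*
 * Record one expansion and return the position that the expanded
 * tokens would be stamped with: stream = PP_STREAM, line = table index.
 */
struct position record_expansion(const char *macro, struct position origin)
{
    struct position pos = { PP_STREAM, nr_expansions };

    assert(nr_expansions < MAX_EXPANSIONS);
    /* A nested expansion may only refer to an earlier table entry. */
    assert(origin.stream != PP_STREAM || origin.line < nr_expansions);

    expansions[nr_expansions].macro = macro;
    expansions[nr_expansions].origin = origin;
    nr_expansions++;
    return pos;
}

/* Map a token position to its expansion context; NULL for file tokens. */
const struct expansion *lookup_expansion(struct position pos)
{
    if (pos.stream != PP_STREAM || pos.line >= nr_expansions)
        return NULL;
    return &expansions[pos.line];
}

/* Follow nested expansions back to the position in the real file. */
struct position resolve_origin(struct position pos)
{
    const struct expansion *e;

    while ((e = lookup_expansion(pos)) != NULL)
        pos = e->origin;
    return pos;
}
```

A macro_expand hook could call record_expansion() once per expansion and
stamp the returned position on the expanded tokens. A symbol parsing hook
could then call lookup_expansion() for each token between the symbol's
start and end tokens and hang the collected result off symbol->aux.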