On Mon, Nov 14, 2022 at 1:13 PM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
>
> On Sun, 2022-11-13 at 23:52 -0800, Yonghong Song wrote:
> >
> > On 11/11/22 1:55 PM, Eduard Zingerman wrote:
> > > On Fri, 2022-10-28 at 11:56 -0700, Yonghong Song wrote:
> > >
> > > [...]
> > > > >
> > > > > Ok, could we change the problem to detecting if some type is defined.
> > > > > Would it be possible to have something like
> > > > >
> > > > > #if !__is_type_defined(struct abc)
> > > > > struct abc {
> > > > > };
> > > > > #endif
> > > > >
> > > > > I think we talked about this and there were problems with this
> > > > > approach, but I don't remember details and how insurmountable the
> > > > > problem is. Having a way to check whether some type is defined would
> > > > > be very useful even outside of -target bpf parlance, though, so maybe
> > > > > it's the problem worth attacking?
> > > >
> > > > Yes, we discussed this before. This will need to add additional work
> > > > in preprocessor. I just made a discussion topic in llvm discourse
> > > >
> > > > https://discourse.llvm.org/t/add-a-type-checking-macro-is-type-defined-type/66268
> > > >
> > > > Let us see whether we can get some upstream agreement or not.
> > >
> > > I did a small investigation of this feature.
> > >
> > > The main pre-requirement is construction of the symbol table during
> > > source code pre-processing, which implies necessity to parse the
> > > source code at the same time. It is technically possible in clang, as
> > > lexing, pre-processing and AST construction happens at the same time
> > > when in compilation mode.
> > >
> > > The prototype is available here [1], it includes:
> > > - Change in the pre-processor that adds an optional callback
> > >   "IsTypeDefinedFn" & necessary parsing of __is_type_defined
> > >   construct.
> > > - Change in Sema module (responsible for parsing/AST & symbol table)
> > >   that installs the appropriate "IsTypeDefinedFn" in the pre-processor
> > >   instance.
> > >
> > > However, this prototype builds a backward dependency between
> > > pre-processor and semantic analysis. There are currently no such
> > > dependencies in the clang code base.
> > >
> > > This makes it impossible to do pre-processing and compilation
> > > separately, e.g. consider the following example:
> > >
> > > $ cat test.c
> > >
> > > struct foo { int x; };
> > >
> > > #if __is_type_defined(foo)
> > > const int x = 1;
> > > #else
> > > const int x = 2;
> > > #endif
> > >
> > > $ clang -cc1 -ast-print test.c -o -
> > >
> > > struct foo {
> > >     int x;
> > > };
> > > const int x = 1;
> > >
> > > $ clang -E test.c -o -
> > >
> > > # ... some line directives ...
> > > struct foo { int x; };
> > > const int x = 2;
> >
> > Is it any chance '-E' could output the same one as '-cc1 -ast-print'?
> > That is, even with -E we could do some semantics analysis
> > as well, using either current clang semantics analysis or creating
> > an minimal version of sema analysis in preprocessor itself?
>
> Sema drives consumption of tokens from Preprocessor. Calls to
> Preprocessor are done on a parsing recursive descent. Extracting a
> stream of tokens would require an incremental parser instead.
>
> A minimal version of such parser is possible to implement for C.
> It might be the case that matching open / closing braces and
> identifiers following 'struct' / 'union' / 'enum' keywords might be
> almost sufficient but I need to try to be sure (e.g. it is more
> complex for 'typedef').
>
> I can work on it but I don't think there is a chance to upstream this work.

Right. It's going to be C only.
C++ with namespaces and nested class decls won't work with a simple
type parser.

On the other hand, if we're asking the preprocessor to look for
'struct foo' and remember that 'foo' is a type, maybe we can add
a regex-search instead?
It would be a bit more generic and would work for basic
union/struct foo definitions.
Something like, instead of:
#if __is_type_defined(foo)
use:
#if regex(struct[ \t]+foo)

Enums are harder with this approach, but maybe it has a higher chance
to land?
regex() would mean "search for this pattern in the file up to this line".

Or some other preprocessor "language" tricks?

For example:
The preprocessor would grep for 'struct *' in a single line while
processing a file and emit #define __secret_prefix_##$1,
where $1 would be a capture from the "single line regex".
Then later in the same file, instead of:
#if __is_type_defined(foo)
use:
#ifdef __secret_prefix_foo

This "single line regex" may look like:
#if regex_in_any_later_line(struct[ \t]+([a-zA-Z_]+))
define __secret_prefix_$1
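
To make the "single line regex" idea a bit more concrete, here is a rough
sketch of it as a standalone pre-pass rather than a preprocessor change.
Everything in it is an assumption for illustration: the helper name
emit_struct_markers is made up, the pattern allows spaces as well as tabs
after 'struct', and matching uses POSIX <regex.h>. It is deliberately naive:
it marks any textual 'struct <name>' on a line (uses as well as definitions,
and also text inside comments or strings), which is the same limitation the
grep-like approach would have.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of clang or any existing tool).
 * Scans `src` one line at a time and, for every "struct <name>" found,
 * appends "#define __secret_prefix_<name>\n" to `out`.
 * Returns the number of defines emitted, or -1 on error/overflow.
 */
int emit_struct_markers(const char *src, char *out, size_t out_size)
{
    regex_t re;
    regmatch_t m[2];
    size_t used = 0;
    int count = 0;

    if (out_size == 0)
        return -1;
    out[0] = '\0';

    /* Group 1 captures the identifier that follows 'struct'. */
    if (regcomp(&re, "struct[ \t]+([A-Za-z_][A-Za-z0-9_]*)",
                REG_EXTENDED) != 0)
        return -1;

    while (*src) {
        /* Copy one line so a match can never span two lines.
         * Lines longer than the buffer are truncated in this sketch. */
        char line[512];
        size_t len = strcspn(src, "\n");
        size_t n = len < sizeof(line) - 1 ? len : sizeof(line) - 1;
        const char *p = line;

        memcpy(line, src, n);
        line[n] = '\0';

        /* A line may mention more than one struct; keep matching. */
        while (regexec(&re, p, 2, m, 0) == 0) {
            int name_len = (int)(m[1].rm_eo - m[1].rm_so);
            int w = snprintf(out + used, out_size - used,
                             "#define __secret_prefix_%.*s\n",
                             name_len, p + m[1].rm_so);

            if (w < 0 || (size_t)w >= out_size - used) {
                regfree(&re);
                return -1;
            }
            used += (size_t)w;
            count++;
            p += m[0].rm_eo; /* continue after the whole match */
        }

        src += len;
        if (*src == '\n')
            src++;
    }

    regfree(&re);
    return count;
}
```

Run over the test.c from earlier in the thread, this would produce a single
"#define __secret_prefix_foo" line, so the later check could be written as
#ifdef __secret_prefix_foo without involving Sema at all.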