RE: [PATCH 5/5] MIPS: LLVMLinux: Silence unicode warnings when preprocessing assembly.

Toma Tabacu <Toma.Tabacu@xxxxxxxxxx> · Thu, 5 Feb 2015 10:25:14 +0000

On Wed, 4 Feb 2015, Maciej W. Rozycki wrote:
> 2. It considers these character pairs to be unicode escapes in the first 
>    place given that they do not follow the syntax required for such 
>    escapes, that is `\unnnn', where `n' are hex digits.
> 

Clang doesn't actually treat them as Unicode escapes, but it still warns the user
in case they were meant to be Unicode escapes. Here's the warning message:

arch/mips/include/asm/asmmacro.h:197:51: warning: \u used with no following hex digits; treating as '\' followed by identifier [-Wunicode]
         .word  0x41000000 | (\rt << 16) | (\rd << 11) | (\u << 5) | (\sel)
                                                          ^

I'll add it to the summary in v2.

> Of course it may be reasonable for us to work this bug around as we've 
> been doing for years with GCC, but has the issue been reported back to 
> clang maintainers?  What was their response?
> 

It hasn't been reported, but I don't think they would agree to remove Unicode
escape sequence handling from assembler-with-cpp mode, because that mode is
currently used for other languages as well, not just assembly.

One such language is Haskell (ghc, to be more specific), for which the clang
developers had to stop the preprocessor from enforcing the C universal
character name restrictions in assembler-with-cpp mode. They lifted only those
C-specific restrictions rather than dropping escape handling altogether, which
suggests that ghc wants the preprocessor to keep checking for Unicode escape
sequences.

At the moment, we can either disable -Wunicode for asmmacro.h or stop using
'\u' as an identifier, i.e. rename the `u' macro parameter.