On Sun, Mar 3, 2024 at 5:31 PM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>
> On Sat, Mar 02 2024 at 14:49, Linus Torvalds wrote:
> > On Sat, 2 Mar 2024 at 14:00, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> >>
> >> I had commented out both. But the real reason is the EXPORT_SYMBOL,
> >> which obviously wants to be EXPORT_PER_CPU_SYMBOL_GPL...
> >
> > Side note: while it's nice to hear that sparse kind of got this right,
> > I wonder what gcc does when we start using the named address spaces
> > for percpu variables.
> >
> > We actively make EXPORT_PER_CPU_SYMBOL_XYZ be a no-op for sparse
> > exactly because sparse ended up warning about the regular
> > EXPORT_SYMBOL, and we didn't have any "real" per-cpu export model.
>
> Right.
>
> > So EXPORT_PER_CPU_SYMBOL_GPL() is kind of an artificial "shut up
> > sparse".
>
> Aside from that it's also making it clear what this is about. So I don't
> think it's purely artificial.
>
> > But with __seg_gs/fs support for native percpu symbols with
> > gcc, I wonder if we'll hit the same thing. Or is there something that
> > makes gcc not warn about the named address spaces?
>
> Right now the pending code in tip does not complain about the
> EXPORT_PER_CPU_SYMBOL_GPL() part because our current macro maze is
> hideous. Here is the preprocessor output.
>
> This is DECLARE_PER_CPU() in the header:
>
>   extern __attribute__((section(".data..percpu" ""))) __typeof__(u64) x86_spec_ctrl_current;
>
> Here is DEFINE_PER_CPU():
>
>   __attribute__((section(".data..percpu" ""))) __typeof__(u64) x86_spec_ctrl_current;
>
> And the EXPORT:
>
>   extern typeof(x86_spec_ctrl_current) x86_spec_ctrl_current;
>
>   static void * __attribute__((__used__))
>   __attribute__((__section__(".discard.addressable")))
>   __UNIQUE_ID___addressable_x86_spec_ctrl_current804 = (void *)(uintptr_t)&x86_spec_ctrl_current;
>
>   asm(".section \".export_symbol\",\"a\" ;
>        __export_symbol_x86_spec_ctrl_current: ;
>        .asciz \"GPL\" ; .asciz \"\" ; .balign 8 ; .quad x86_spec_ctrl_current ; .previous");
>
> And the __seg_gs magic happens only in the per CPU accessor itself:
>
>   __attribute__((__noinline__)) __attribute__((no_instrument_function))
>   __attribute((__section__(".noinstr.text")))
>   __attribute__((__no_sanitize_address__))
>   __attribute__((__no_profile_instrument_function__))
>   u64 spec_ctrl_current(void)
>   {
>     return ({
>       // this_cpu_read(x86_spec_ctrl_current)
>
>       typeof(x86_spec_ctrl_current) pscr_ret__;
>
>       do { const void *__vpp_verify = (typeof((&(x86_spec_ctrl_current)) + 0))((void *)0); (void)__vpp_verify;
>       } while (0);
>
>       switch(sizeof(x86_spec_ctrl_current))
>       {
>       case 1: pscr_ret__ = ({
>               *(volatile typeof(x86_spec_ctrl_current) __seg_gs *)(typeof(*&(x86_spec_ctrl_current)) __seg_gs *)(uintptr_t)(&(x86_spec_ctrl_current)); });
>               break;
>       case 2: pscr_ret__ = ({
>               *(volatile typeof(x86_spec_ctrl_current) __seg_gs *)(typeof(*&(x86_spec_ctrl_current)) __seg_gs *)(uintptr_t)(&(x86_spec_ctrl_current)); });
>               break;
>       case 4: pscr_ret__ = ({
>               *(volatile typeof(x86_spec_ctrl_current) __seg_gs *)(typeof(*&(x86_spec_ctrl_current)) __seg_gs *)(uintptr_t)(&(x86_spec_ctrl_current)); });
>               break;
>       case 8: pscr_ret__ = ({
>               *(volatile typeof(x86_spec_ctrl_current) __seg_gs *)(typeof(*&(x86_spec_ctrl_current)) __seg_gs *)(uintptr_t)(&(x86_spec_ctrl_current)); });
>               break;
>
>       default: __bad_size_call_parameter(); break;
>       }
>
>       pscr_ret__;
>     });
>   }
>
> So all the export etc. just works because it all operates on a plain
> data type and the __seg_gs is only bolted on via type casts in the
> accessors.
>
> As the per cpu variables are in the .data..percpu section the linker
> puts them at address 0 and upwards. So the cast to a __seg_gs pointer
> makes it end up at the real kernel address because of GSBASE + "offset".
>
> The compiler converts this to RIP relative addressing:
>
>   movq $0x0,%gs:0x7e14169f(%rip) # 1ba08 <fpu_fpregs_owner_ctx>
>
> This obviously has a downside. If I do:
>
>   u64 foo;
>
>   this_cpu_read(foo);
>
> the compiler is just happy to build that w/o complaining and it will
> only explode at runtime because foo is a kernel data address which added
> to GSBASE will result in accessing some random address:
>
>   mov %gs:0x15d08d4(%rip),%rax # ffffffff834aac60 <x86_spec_ctrl_base>
>
> This is not at all different from the inline ASM based version which is
> in your tree. The only difference is that the macro maze is pure C and
> the __seg_gs cast allows the compiler to (micro) optimize, e.g. 'mov
> %gs:...; movzbl' into a single 'movzbl'.
>
> IOW, right now the only defense against such a mistake is actually the
> sparse check. Maybe one of the coccinelle scripts has something similar,
> I don't know.
>
> I did not follow the __seg_gs work closely, so I don't know whether Uros
> ever tried to actually mark the per CPU variable __seg_gs right away,
> which would obviously catch the above 'foo' nonsense.

No, because [1]:

"gcc does not provide a way to remove segment qualifiers, which is needed
to use typeof() to create local instances of the per-cpu variable. For
this reason, do not use the segment qualifier for per-cpu variables, and
do casting using the segment qualifier instead."

[1] https://lore.kernel.org/lkml/20190823224424.15296-3-namit@xxxxxxxxxx/

Uros.

>
> I think this should just work, but that would obviously require to do
> the type cast magic at the EXPORT_SYMBOL side and in some other places.
>
> Thanks,
>
>         tglx
>
>
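
For anyone following along, the "GSBASE + offset" addressing described in the
quoted mail can be modelled in plain userspace C. This is only an illustrative
sketch, not kernel code: percpu_template, percpu_area, percpu_base(),
PERCPU_OFFSET() and the read/write helpers are hypothetical names, and a plain
array of per-CPU copies stands in for the GSBASE register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NR_CPUS 4  /* assumption: small fixed CPU count for the demo */

/*
 * Hypothetical "template" of all per-CPU variables. Field offsets start
 * at 0, mirroring how the linker places .data..percpu at address 0.
 */
struct percpu_template {
	uint64_t spec_ctrl_current;
	uint32_t counter;
};

/* One private copy of the template per simulated CPU. */
static struct percpu_template percpu_area[NR_CPUS];

/* Stand-in for GSBASE: the start of the given CPU's private area. */
static uintptr_t percpu_base(unsigned int cpu)
{
	return (uintptr_t)&percpu_area[cpu];
}

/* A per-CPU "symbol" is just its offset inside the template. */
#define PERCPU_OFFSET(field) offsetof(struct percpu_template, field)

/* Read a u64 at base + offset, analogous to a %gs-relative load. */
static uint64_t percpu_read_u64(unsigned int cpu, size_t offset)
{
	uint64_t val;

	memcpy(&val, (const void *)(percpu_base(cpu) + offset), sizeof(val));
	return val;
}

/* Write a u64 at base + offset, analogous to a %gs-relative store. */
static void percpu_write_u64(unsigned int cpu, size_t offset, uint64_t val)
{
	memcpy((void *)(percpu_base(cpu) + offset), &val, sizeof(val));
}
```

A call such as percpu_write_u64(3, PERCPU_OFFSET(spec_ctrl_current), 0x33)
touches only CPU 3's copy. Because the helpers accept any size_t, nothing stops
a caller from passing the address of an ordinary global as the "offset", which
is the same class of mistake as this_cpu_read(foo) above: it compiles fine and
then accesses an unrelated address at runtime.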