On Mon, Jun 7, 2021 at 10:45 AM Segher Boessenkool <segher@xxxxxxxxxxxxxxxxxxx> wrote: > > On Sun, Jun 06, 2021 at 03:38:06PM -0700, Linus Torvalds wrote: > > > > Example: variable_test_bit(), which generates a "bt" instruction, does > > > > : "m" (*(unsigned long *)addr), "Ir" (nr) : "memory"); > > > > and the memory clobber is obviously wrong: 'bt' only *reads* memory, > > but since the whole reason we use it is that it's not just that word > > at address 'addr', in order to make sure that any previous writes are > > actually stable in memory, we use that "memory" clobber. > > You can split the "I" version from the "r" version, it does not need > the memory clobber. If you know the actual maximum bit offset used you > don't need the clobber for "r" either. Or you could even write > "m"(((unsigned long *)addr)[nr/32]) > That should work for all cases. Note that the bit test thing really was just an example. And some other cases don't actually have an address range at all, because they affect arbitrary ranges, not - like that bit test - just one particular range. To pick a couple of examples of that, think of (a) write memory barrier. On some architectures it's an explicit instruction, on x86 it's just a compiler barrier, since writes are ordered on the CPU anyway, and we only need to make sure that the compiler doesn't re-order writes around the barrier Again, we currently use that same "barrier()" macro for that: #define __smp_wmb() barrier() but as mentioned, the barrier() thing has a "memory" clobber, and that means that this write barrier - which is really really cheap on x86 - also unnecessarily ends up causing pointless reloads from globals. It obviously doesn't actually *change* memory, but it very much requires that writes are not moved around it. (b) things like cache flush and/or invalidate instructions, eg asm volatile("wbinvd": : :"memory"); Again, this one doesn't actually *modify* memory, and honestly, this one is not performance critical so the memory clobber is not actually a problem, but I'm pointing it out as an example of the exact same issue: the notion of an instruction that we don't want _writes_ to move around, but reads can happily be moved and/or cached around it. (c) this whole "volatile_if()" situation: we want to make sure writes can't move around it, but there's no reason to re-load memory values, because it doesn't modify memory, and we only need to make sure that any writes are delayed to after the conditional. We long long ago (over 20 years by now) used to do things like this: struct __dummy { unsigned long a[100]; }; #define ADDR (*(volatile struct __dummy *) addr) __asm__ __volatile__( "btl %2,%1\n\tsbbl %0,%0" :"=r" (oldbit) :"m" (ADDR),"ir" (nr)); for that test-bit thing. Note how the above doesn't need the memory clobber, because for gcc that ADDR thing (access to a big struct) ends up being a "BLKmode" read, and then gcc at least used to treat it as an arbitrary read. I forget just why we had to stop using that trick, I think it caused some reload confusion for some gcc version at some point. Probably exactly because inline asms had issues with some BLKmode thing. That change happened before 2001, we didn't have nice changelogs with detailed commit messages back then, so > > Anybody have ideas or suggestions for something like that? > > Is it useful in general for the kernel to have separate "read" and > "write" clobbers in asm expressions? And for other applications? See above. It's actually not all that uncommon that you have a "this doesn't modify memory, but you can't move writes around it". It's usually very much about cache handling or memory ordering operations, and that bit test example was probably a bad example exactly because it made it look like it's about some controlled range. The "write memory barroer" is likely the best and simplest example, but it's in not the only one. Linus