On Mon, 8 Apr 2024 at 09:02, Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote: > > What annoys me is that 'volatile' accesses have (at least) two distinct > meanings: > - Make this access untorn > - Prevent various optimisations (code motion, > common-subexpression-elimination, ...) Oh, I'm not at all trying to say that "volatile" is great. My argument was that the C (and C++, and Rust) model of attaching memory ordering to objects is actively bad. and limiting. Because the whole "the access rules are context-dependent" is really fundamental. Anybody who designs an atomic model around the object is simply not doing it right. Now, the "volatile" rules actually make sense in a historical "hardware access" context. So I do not think "volatile" is great, but I also don't think K&R were incompetent. "volatile" makes perfect sense in the historical setting of "direct hardware access". It just so happens that there weren't other tools, so then you end up using "volatile" for cached memory too when you want to get "access once" semantics, and then it isn't great. And then you have *too* many tools on the standards bodies, and they don't understand atomics, and don't understand volatile, and they have been told that "volatile" isn't great for atomics because it doesn't have memory ordering semantics, but do not understand the actual problem space. So those people - who in some cases spent decades arguing about (and sometimes against) "volatile" think that despite all the problems, the solution for atomics is to make the *same* mistake, and tie it to the data and the type system, not the action. Which is honestly just plain *stupid*. What made sense for 'volatile' in a historical setting, absolutely does not make sense for atomics. > As an example, folio_migrate_flags() (in mm/migrate.c): > > if (folio_test_error(folio)) > folio_set_error(newfolio); > if (folio_test_referenced(folio)) > folio_set_referenced(newfolio); > if (folio_test_uptodate(folio)) > folio_mark_uptodate(newfolio); > > ... which becomes... [ individual load and store code generation removed ] > In my ideal world, the compiler would turn this into: > > newfolio->flags |= folio->flags & MIGRATE_MASK; Well, honestly, we should just write the code that way, and not expect too much of the compiler. We don't currently have a "generic atomic or" operation, but we probably should have one. For our own historical reasons, while we have a few generic atomic operations: bit operations, cmpxchg, etc, most of our arithmetic and logical ops all rely on a special "atomic_t" type (later extended with "atomic_long_t"). The reason? The garbage that is legacy Sparc atomics. Sparc historically basically didn't have any atomics outside of the 'test and set byte' one, so if you wanted an atomic counter thing, and you cared about sparc, you had to play games with "some bits of the counter are the atomic byte lock". And we do not care about that Sparc horror any *more*, but we used to. End result: instead of having "do atomic ops on a normal type" - which would be a lot more powerful - we have this model of "do atomic ops on atomic_t". We could fix that now. Instead of having architectures define arch_atomic_or(int i, atomic_t *v) operations, we could - and should - make the 'arch' atomics be arch_atomic_or(int i, unsigned int *v) and then we'd still keep the "atomic_t" around for type safety reasons, but code that just wants to act on an "int" (or a "long") atomically could just do so. But in your case, I don't think you actually need it: > Part of that is us being dumb; folio_set_foo() should be __folio_set_foo() > because this folio is newly allocated and nobody else can be messing > with its flags word yet. I failed to spot that at the time I was doing > the conversion from SetPageFoo to folio_set_foo. This is part of my "context matters" rant and why I do *not* think atomics should be tied to the object, but to the operation. The compiler generally doesn't know the context rules (insert "some languages do know in some cases" here), which is why we as programmers should just use different operations when we do. In this case, since it's a new folio that hasn't been exposed to anybody, you should just have done exactly that kind of newfolio->flags |= folio->flags & MIGRATE_MASK; which we already do in the page initialization code when we know we own the flags (set_page_zone, set_page_zone, set_page_section). We've generally avoided doing this in general, though - even the buddy allocator seldom does it. The only case of manual "I know I own the flags" I know if (apart from the initialization itself) is ->flags &= ~PAGE_FLAGS_CHECK_AT_FREE; ... ->flags &= ~PAGE_FLAGS_CHECK_AT_PREP; kinds of things at free/alloc time. > But if the compiler people could give us something a little more > granular than "scary volatile access disable everything", that would > be nice. Also hard, because now you have to figure out what this new > thing interacts with and when is it safe to do what. I think it would be lovely to have some kind of "atomic access" operations that the compiler could still combine when it can see that "this is invisible at a cache access level". But as things are now, we do have most of those in the kernel, and what you ask for can either be done today, or could be done (like that "arch_atomic_or()") with a bit of re-org. Linus