On Sat, Apr 18, 2020 at 1:52 PM Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: > > On Sat, Apr 18, 2020 at 1:30 PM Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote: > > > > Maybe I’m missing something obvious, but what’s the alternative? The _mcsafe variants don’t just avoid the REP mess — they also tell the kernel that this particular access is recoverable via extable. > > .. which they could easily do exactly the same way the user space > accessors do, just with a much simplified model that doesn't even care > about multiple sizes, since unaligned accesses weren't valid anyway. > > The thing is, all of the MCS code has been nasty. There's no reason > for it what-so-ever that I can tell. The hardware has been so > incredibly broken that it's basically unusable, and most of the > software around it seems to have been about testing. > > So I absolutely abhor that thing. Everything about that code has > screamed "yeah, we completely mis-designed the hardware, we're pushing > the problems into software, and nobody even uses it or can test it so > there's like 5 people who care". > > And I'm pushing back on it, because I think that the least the code > can do is to at least be simple. > > For example, none of those optimizations should exist. That function > shouldn't have been inline to begin with. And if it really really > matters from a performance angle that it was inline (which I doubt), > it shouldn't have looked like a memory copy, it should have looked > like "get_user()" (except without all the complications of actually > having to test addresses or worry about different sizes). > > > And it almost certainly shouldn't have been done in low-level asm > either. It could have been a single "read aligned word" interface > using an inline asm, and then everything else could have been done as > C code around it. Do we have examples of doing exception handling from C? I thought all the exception handling copy routines were assembly routines? > > But no. The software side is almost as messy as the hardware side is. > I hate it. And since nobody sane can test it, and the broken hardware > is _so_ broken than nobody should ever use it, I have continually > pushed back against this kind of ugly nasty special code. > > We know the writes can't fault, since they are buffered. So they > aren't special at all. The writes can mmu-fault now that memcpy_mcsafe() is also used by _copy_to_iter_mcsafe(). This allows a clean bypass of the block layer in fs/dax.c in addition to the pmem driver access of poisoned memory. Now that the fallback is a sane rep; movs; it can be considered for plain copy_to_iter() for other user copies so you get exception handling on kernel access of poison outside of persistent memory. To Andy's point I think a recoverable copy (for exceptions or faults) is generally useful. > We know the acceptable reads for the broken hardware basically boil > down to a single simple word-size aligned read, so you need _one_ > special inline asm for that. The rest of the cases can be handled by > masking and shifting if you really really need to - and done better > that way than with byte accesses anyway. > > Then you have _one_ C file that implements everything using that > single operation (ok, if people absolutely want to do sizes, I guess > they can if they can just hide it in that one file), and you have one > header file that exposes the interfaces to it, and you're done. > > And you strive hard as hell to not impact anything else, because you > know that the hardware is unacceptable until all those special rules > go away. Which they apparently finally have. I understand the gripes about the mcsafe_slow() implementation, but how do I implement mcsafe_fast() any better than how it is currently organized given that, setting aside machine check handling, memcpy_mcsafe() is the core of a copy_to_iter*() front-end that can mmu-fault on either source or destination access?