On Sun, Apr 19, 2020 at 10:08 PM Dan Williams <dan.j.williams@xxxxxxxxx> wrote: > > Do we have examples of doing exception handling from C? I thought all > the exception handling copy routines were assembly routines? You need assembler for the actual access, but that's a _single_ instruction - best done as inline asm. The best example of something that does *exactly* what you want to do is likely unsafe_get_user(); unsafe_put_user(); which basically turns into a single instruction with exception handling, with the exception hander jumping directly to an error label. Ok, so right now gcc can't do that for inline asm with outputs, so it generates fairly nasty code (a secondary register with the error state that then causes a conditional branch on it), but that's a compiler limitation that will eventually go away (where "eventially" means that it already works in LLVM with experimental patches). You could literally mis-use those helpers as-is (please don't - the code generation is correct, but at the very least we'd have to re-organize a bit to make it a better interface, ie have an alternative name like "unsafe_get_kernel()" for kernel pointer accesses). You'd have to do the alignment guarantees yourself, but there are examples of that in this area too (strnlen_user() does exactly that - aligned word accesses). So the point here is that the current interfaces are garbage, _if_ the whole "access a single value" is actually performance-critical. And if that is *not* the case, then the best thing to do is likely to just use a static call. No inlining of single instructions at all, just always use a function call, and then pick the function appropriately. Honestly, I can't imagine that the "single access" case is so timing-critical that the static call isn't the right model. Your use case is _not_ as important or common as doing user accesses. Finally, the big question is whether the completely broken hardware even matters. Are there actual customers that actually use the garbage "we can crash the machine" stuff? Because when it comes to things like nvdimms etc, the _primary_ performance target would be getting the kernel entirely out of the way, and allowing databases etc to just access the damn thing directly. And if you allow user space to access it directly, then you just have to admit that it's not a software issue any more - it's the hardware that is terminally broken and unusable garbage. It's not even interesting to work around things in the kernel, because user space can just crash the machine directly. This is why I absolutely detest that thing so much. The hardware is _so_ fundamentally broken that I have always considered the kernel workarounds to basically be "staging" level stuff - good enough for some random testing of known-broken stuff, but not something that anybody sane should ever use. So my preference would actually be to just call the broken cases to be largely ignored, at least from a performance angle. If you can only access it through the kernel, the biggest performance limitation is that you cannot do any DAX-like thing at all safely, so then the performance of some kernel accessors is completely secondary and meaningless. When a kernel entry/exit takes a few thousand cycles on the broken hardware (due to _other_ bugs), what's the point about worrying about trying to inline some single access to the nvdimm? Did the broken hardware ever spread out into the general public? Because if not, then the proper thing to do might be to just make it a compile-time option for the (few) customers that signed up for testing the initial broken stuff, and make the way _forward_ be a clean model without the need to worry about any exceptions at all. > The writes can mmu-fault now that memcpy_mcsafe() is also used by > _copy_to_iter_mcsafe(). This allows a clean bypass of the block layer > in fs/dax.c in addition to the pmem driver access of poisoned memory. > Now that the fallback is a sane rep; movs; it can be considered for > plain copy_to_iter() for other user copies so you get exception > handling on kernel access of poison outside of persistent memory. To > Andy's point I think a recoverable copy (for exceptions or faults) is > generally useful. I think that's completely independent. If we have good reasons for having targets with exception handling, then that has absolutely nothing to do with machine checks or buggy hardware. And it sure shouldn't be called "mcsafe", since it has nothing to do with that situation any more. > I understand the gripes about the mcsafe_slow() implementation, but > how do I implement mcsafe_fast() any better than how it is currently > organized given that, setting aside machine check handling, > memcpy_mcsafe() is the core of a copy_to_iter*() front-end that can > mmu-fault on either source or destination access? So honestly, once it is NOT about the broken machine check garbage, then it should be sold on its own independent reasons. Do we want to have a "copy_to_iter_safe" that can handle page faults? Because we have lots of those kinds of things, we have - load_unaligned_zeropad() This loads a single word knowing that the _first_ byte is valid, but can take an exception and zero-pad if it crosses a page boundary - probe_kernel_read()/write() This is a kernel memcpy() with the source/destination perhaps being unmapped. - various perf and tracing helpers that have special semantics. but once it's about some generic interface, then it also needs to take other architectures into account. Linus