On Fri, May 1, 2020 at 7:09 AM Luck, Tony <tony.luck@xxxxxxxxx> wrote:
>
> > Now maybe copy_to_user() should *always* work this way, but I’m not convinced.
> > Certainly put_user() shouldn’t -- the result wouldn’t even be well defined. And I’m
> > unconvinced that it makes much sense for the majority of copy_to_user() callers
> > that are also directly accessing the source structure.
>
> One case that might work is copy_to_user() that's copying from the kernel page cache
> to the user in response to a read(2) system call. Action would be to check if we could
> re-read from the file system to a different page. If not, return -EIO. Either way ditch the
> poison page from the page cache.
>

I think that, before we do too much design of the semantics of just
the copy function, we need a design for the whole system.
Specifically:

When the kernel finds out that a kernel page is bad (via #MC or via
any other mechanism), what does the kernel do?  Does it unmap it?
Does it replace it with a dummy page?  Does it leave it there?

When a copy function hits a bad page and the page is not yet known to
be bad, what does it do?  (I.e. the page was believed to be fine but
the copy function gets #MC.)  Does it unmap it right away?  What does
it return?

When a copy function hits a page that is already known to be bad
because the kernel got the "oh crap, bad page" notification earlier,
what does it do?  Return -EIO?  Take some fancier action under the
assumption that it's called in a preemptible, IRQs-on context, whereas
the original #MC or other hardware notification may have come at a
less opportune time?
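
To make the questions above concrete, here is a rough userspace sketch of
one possible set of answers. It is not kernel code. All names in it
(sim_page, page_health, sim_copy_from_page, sim_self_check) are
hypothetical, and the policy it picks (unmap the page as soon as the copy
faults, return -EIO in both the "newly discovered" and "already known"
cases) is just one of the options being asked about, not something this
thread has settled on.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical page states for this sketch; not real kernel definitions. */
enum page_health {
	PAGE_OK,          /* believed good, and actually good */
	PAGE_LATENT_BAD,  /* bad, but nobody has noticed yet */
	PAGE_KNOWN_BAD,   /* an earlier notification already marked it bad */
};

struct sim_page {
	enum page_health health;
	bool mapped;
	char data[64];
};

/*
 * Simulated copy out of a page.  Returns 0 on success, -EINVAL for an
 * oversized request, -EIO if the source page is (or turns out to be) bad.
 */
static int sim_copy_from_page(char *dst, struct sim_page *src, size_t len)
{
	if (len > sizeof(src->data))
		return -EINVAL;

	switch (src->health) {
	case PAGE_KNOWN_BAD:
		/* Already known bad: fail without touching the data. */
		return -EIO;
	case PAGE_LATENT_BAD:
		/*
		 * Stand-in for taking #MC in the middle of the copy.
		 * Assumed policy: record the poison, unmap right away, fail.
		 */
		src->health = PAGE_KNOWN_BAD;
		src->mapped = false;
		return -EIO;
	case PAGE_OK:
		memcpy(dst, src->data, len);
		return 0;
	}
	return -EIO; /* unreachable for valid enum values */
}

/* Walks one page through all three situations. */
static void sim_self_check(void)
{
	struct sim_page page = { .health = PAGE_OK, .mapped = true,
				 .data = "payload" };
	char buf[16] = { 0 };

	assert(sim_copy_from_page(buf, &page, 8) == 0);

	page.health = PAGE_LATENT_BAD;
	assert(sim_copy_from_page(buf, &page, 8) == -EIO);
	assert(page.health == PAGE_KNOWN_BAD && !page.mapped);

	/* Second attempt hits the "already known bad" path. */
	assert(sim_copy_from_page(buf, &page, 8) == -EIO);
}
```

The sketch deliberately says nothing about the first question (what the
kernel does when it learns about a bad page outside of a copy), nor about
whether the known-bad path could do something fancier than -EIO given
that it runs in a friendlier context.  Those are exactly the parts that
need a system-wide answer first.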