On Thu, May 05, 2022 at 02:39:43PM +0800, Tong Tiangen wrote:
> On 2022/5/4 18:26, Catalin Marinas wrote:
> > On Wed, Apr 20, 2022 at 03:04:15AM +0000, Tong Tiangen wrote:
> > > Add copy_{to, from}_user() to machine check safe.
> > >
> > > If copy fail due to hardware memory error, only the relevant processes are
> > > affected, so killing the user process and isolate the user page with
> > > hardware memory errors is a more reasonable choice than kernel panic.
> >
> > Just to make sure I understand - we can only recover if the fault is in
> > a user page. That is, for a copy_from_user(), we can only handle the
> > faults in the source address, not the destination.
>
> At the beginning, I also thought we can only recover if the fault is in a
> user page.
> After discussion with Mark[1], I think no matter user page or kernel page,
> as long as it is triggered by the user process, only related processes will
> be affected. According to this understanding, it seems that all uaccess
> can be recovered.
>
> [1]https://patchwork.kernel.org/project/linux-arm-kernel/patch/20220406091311.3354723-6-tongtiangen@xxxxxxxxxx/

We can indeed safely skip this copy and return an error just like
pretending there was a user page fault. However, my point was more
around the "isolate the user page with hardware memory errors". If the
fault is on a kernel address, there's not much you can do about it.
You'll likely trigger it later when you try to access that address
(maybe it was freed and re-allocated). Do we hope we won't get the same
error again on that kernel address?

-- 
Catalin
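
To make the distinction above concrete, here is a minimal user-space sketch, not kernel code and not taken from the patch. It assumes a hypothetical `sim_copy()` that follows the copy_from_user() convention of returning the number of bytes not copied, and a hypothetical `poison` pointer standing in for a location with a hardware memory error. All `sim_*` names are made up for illustration.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical simulation of a machine-check-safe copy.
 * 'poison' marks one byte that "has a hardware memory error";
 * NULL means no error. Returns the number of bytes NOT copied,
 * mirroring the copy_from_user() return convention.
 */
size_t sim_copy(void *dst, const void *src, size_t n, const void *poison)
{
	unsigned char *d = dst;
	const unsigned char *s = src;
	size_t i;

	for (i = 0; i < n; i++) {
		/* Stop at the first poisoned byte on either side. */
		if ((const void *)&s[i] == poison ||
		    (const void *)&d[i] == poison)
			return n - i;
		d[i] = s[i];
	}
	return 0;
}

/*
 * Hypothetical caller: a short copy is reported as -EFAULT, just
 * as if a user page fault had occurred. Nothing here isolates the
 * poisoned location, so a fault on the destination (the "kernel"
 * side in this simulation) will recur on the next access.
 */
int sim_read_request(void *kbuf, const void *ubuf, size_t n,
		     const void *poison)
{
	if (sim_copy(kbuf, ubuf, n, poison))
		return -EFAULT;
	return 0;
}
```

With the poison on the source buffer, the caller gets -EFAULT once and the "user page" could in principle be isolated. With the poison on the destination buffer, every later call touching that buffer returns -EFAULT again, which is the open question about kernel addresses.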