在 2024/1/30 1:43, Mark Rutland 写道:
On Mon, Jan 29, 2024 at 09:46:49PM +0800, Tong Tiangen wrote:
If user process access memory fails due to hardware memory error, only the
relevant processes are affected, so it is more reasonable to kill the user
process and isolate the corrupt page than to panic the kernel.
Signed-off-by: Tong Tiangen <tongtiangen@xxxxxxxxxx>
---
arch/arm64/lib/copy_from_user.S | 10 +++++-----
arch/arm64/lib/copy_to_user.S | 10 +++++-----
arch/arm64/mm/extable.c | 8 ++++----
3 files changed, 14 insertions(+), 14 deletions(-)
diff --git a/arch/arm64/lib/copy_from_user.S b/arch/arm64/lib/copy_from_user.S
index 34e317907524..1bf676e9201d 100644
--- a/arch/arm64/lib/copy_from_user.S
+++ b/arch/arm64/lib/copy_from_user.S
@@ -25,7 +25,7 @@
.endm
.macro strb1 reg, ptr, val
- strb \reg, [\ptr], \val
+ USER(9998f, strb \reg, [\ptr], \val)
.endm
This is a store to *kernel* memory, not user memory. It should not be marked
with USER().
This does cause some misconceptions, and my original idea was to reuse
the fixup capability of USER().
I understand that you *might* want to handle memory errors on these stores, but
the commit message doesn't describe that and the associated trade-off. For
example, consider that when a copy_form_user fails we'll try to zero the
remaining buffer via memset(); so if a STR* instruction in copy_to_user
faulted, upon handling the fault we'll immediately try to fix that up with some
more stores which will also fault, but won't get fixed up, leading to a panic()
anyway...
When copy_from_user() triggers a memory error, there are two cases: ld
user memory error and st kernel memory error. The former can clear the
remaining kernel memory, and the latter cannot be cleared because the
page is poison.
The purpose of memset() is to keep the data consistency of the kernel
memory (or multiple subsequent pages) (the data that is not copied
should be set to 0). My consideration here is that since our ultimate
goal is to kill the owner thread of the kernel memory data, the
"consistency" of the kernel memory data is not so important, but
increases the processing complexity.
The trade-offs do need to be added to commit message after agreement
is reached :)
Further, this change will also silently fixup unexpected kernel faults if we
pass bad kernel pointers to copy_{to,from}_user, which will hide real bugs.
I think this is better than the panic kernel, because the real bugs
belongs to the user process. Even if the wrong pointer is
transferred, the page corresponding to the wrong pointer has a memroy
error. In addition, the panic information contains necessary information
for users to check.
So NAK to this change as-is; likewise for the addition of USER() to other ldr*
macros in copy_from_user.S and the addition of USER() str* macros in
copy_to_user.S.
If we want to handle memory errors on some kaccesses, we need a new EX_TYPE_*
separate from the usual EX_TYPE_KACESS_ERR_ZERO that means "handle memory
errors, but treat other faults as fatal". That should come with a rationale and
explanation of why it's actually useful.
This makes sense. Add kaccess types that can be processed properly.
[...]
diff --git a/arch/arm64/mm/extable.c b/arch/arm64/mm/extable.c
index 478e639f8680..28ec35e3d210 100644
--- a/arch/arm64/mm/extable.c
+++ b/arch/arm64/mm/extable.c
@@ -85,10 +85,10 @@ bool fixup_exception_mc(struct pt_regs *regs)
if (!ex)
return false;
- /*
- * This is not complete, More Machine check safe extable type can
- * be processed here.
- */
+ switch (ex->type) {
+ case EX_TYPE_UACCESS_ERR_ZERO:
+ return ex_handler_uaccess_err_zero(ex, regs);
+ }
Please fold this part into the prior patch, and start ogf with *only* handling
errors on accesses already marked with EX_TYPE_UACCESS_ERR_ZERO. I think that
change would be relatively uncontroversial, and it would be much easier to
build atop that.
OK, the two patches will be merged in the next release.
Many thanks.
Tong.
Thanks,
Mark.
.