在 2022/3/22 22:41, Arnd Bergmann 写道:
On Tue, Mar 22, 2022 at 1:55 PM chenjiahao (C) <chenjiahao16@xxxxxxxxxx> wrote:
在 2022/3/18 15:44, Arnd Bergmann 写道:
This should not result in any user visible difference, in both cases
user process will see a -EFAULT return code from its system call.
Are you able to come up with a test case that shows an observable
difference in behavior?
Actually, this patch do comes from a testcase failure, the code is
pasted below:
Thank you for the test case!
#define TMPFILE "__1234567890"
#define BUF_SIZE 1024
int main()
{
char buf[BUF_SIZE] = {0};
int fd;
int ret;
int err;
fd = open(TMPFILE, O_CREAT | O_RDWR);
if(-1 == fd)
{
perror("open");
return 1;
}
ret = pread64(fd, buf, -1, 1);
if((-1 == ret) && (EFAULT == errno))
{
close(fd);
unlink(TMPFILE);
printf("PASS\n");
return 0;
}
err = errno;
perror("pread64");
printf("err = %d\n", err);
close(fd);
unlink(TMPFILE);
printf("FAIL\n");
return 1;
}
The expected result is:
PASS
but the result of 32-bit testcase running in x86-64 kernel with compat
mode is:
pread64: Success
err = 0
FAIL
In my explanation, pread64 is called with count '0xffffffffull' and
offset '1', which might still not trigger
page fault in 64-bit kernel.
This patch uses TASK_SIZE as the addr_limit to performance a stricter
address check and intercepts
I see. So while the kernel behavior was not meant to change from
my patch, it clearly did, which may cause problems. However, I'm
not sure if the changed behavior is actually wrong.
the illegal pointer address from 32-bit userspace at a very early time.
Which is roughly the same
address limit check as __access_ok in arch/ia64.
This is why this fixes my testcase failure above, or have I missed
anything else?
My interpretation of what is going on here is that the pread64() call
still behaves according to the documented behavior, returning a small
number of bytes from the file, up to the first faulting address.
As the manual page for pread64() describes,
On success, pread() returns the number of bytes read
(a return of zero indicates end of file) and pwrite() returns
the number of bytes written.
Note that it is not an error for a successful call to transfer
fewer bytes than requested (see read(2)
and write(2)).
I have really missed this point. The behavior here is and should
be aligned with the manual definition.
The difference after my patch is that originally it returned
-EFAULT because part of the buffer is outside of the
addressable memory, but now it returns success because
part of the buffer is inside the addressable memory ;-)
I'm also not sure about which patch caused the change in
behavior, can you double-check that? The one you cited,
967747bbc084 ("uaccess: remove CONFIG_SET_FS"), does
not actually touch the x86 implementation, and commit
36903abedfe8 ("x86: remove __range_not_ok()") does touch
x86 but the limit was already TASK_SIZE_MAX since commit
47058bb54b57 ("x86: remove address space overrides using
set_fs()") back in linux-5.10.
I have performed the testcase on the same environment with
several old LTS kernel versions, all the results are "fault".
The behavior before and after your patches should be consistent.
So the fault should due to the disagreement between the
testcase's intention and the real pread64 definition. I have been
misled by the former one. Thanks for your interpretation.
Jiahao
Arnd
.