On 08.11.23 12:47, David Wang wrote:
Hi,
According to https://lwn.net/Articles/865256/,
the memory address obtained via memfd_secret/ftruncate/mmap should not be usable in syscalls, since it is not accessible even to the kernel.
But my test shows that the "secret" memory can be passed to the write syscall. Is this expected behavior?
This is my test code:
CCing Mike.
According to the man page:
"The memory areas backing the file created with memfd_secret(2) are
visible only to the processes that have access to the file descriptor.
The memory region is removed from the kernel page tables and only the
page tables of the processes holding the file descriptor map the
corresponding physical memory. (Thus, the pages in the region can't be
accessed by the kernel itself, so that, for example, pointers to the
region can't be passed to system calls.)"
I'm not sure if the last part is actually true, if the syscalls end up
walking user page tables to copy data in/out.
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/syscall.h>

int main() {
    int fd = syscall(__NR_memfd_secret, 0);
    if (fd < 0) {
        perror("Fail to create secret");
        return -1;
    }
    if (ftruncate(fd, 1024) < 0) {
        perror("Fail to size the secret");
        return -1;
    }
    char *key = mmap(NULL, 1024, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
    if (key == MAP_FAILED) {
        perror("Fail to mmap");
        return -1;
    }
    // should be some secure channel
    strcpy(key, "ThisIsAKey");
    // printf("[%d]key(%s) ready: %p\n", getpid(), key, key);
    // getchar();
    // make syscall, should err
    write(STDOUT_FILENO, key, strlen(key)); //<-- Here the key shows up on stdout.
    return 0;
}
What probably happens here is that the kernel reads the data via the
user page tables, and can, therefore, access that memory just fine.
Looking at the selftest (tools/testing/selftests/mm/memfd_secret.c) we
test that we cannot read from the memfd and cannot write to the memfd.
We don't test whether other syscalls can access a user-provided buffer
that is backed by a secret memfd.
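
A check for that case could look roughly like the sketch below. It is not
taken from the selftest: the function name probe_secret_buffer() and the
PROBE_* result codes are hypothetical, and the use of a pipe as the write
target is an assumption made so the result can be read back and compared.
It only reports what the running kernel does and makes no claim about what
the intended behavior is.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result codes, chosen for this sketch only. */
enum probe_result {
    PROBE_UNAVAILABLE = -1, /* memfd_secret missing, disabled, or setup failed */
    PROBE_REJECTED = 0,     /* write() failed when given the secret buffer */
    PROBE_ACCEPTED = 1,     /* write() succeeded and the data came back intact */
};

/* Pass a secretmem-backed buffer to write() on a pipe and report the outcome. */
int probe_secret_buffer(void)
{
#ifndef __NR_memfd_secret
    return PROBE_UNAVAILABLE; /* installed headers predate memfd_secret */
#else
    static const char msg[] = "ThisIsAKey";
    const size_t len = 4096; /* assumption: one small mapping is enough */
    char out[sizeof(msg)] = {0};
    int result = PROBE_UNAVAILABLE;
    int pfd[2] = {-1, -1};
    char *key = MAP_FAILED;
    ssize_t n;

    int fd = syscall(__NR_memfd_secret, 0);
    if (fd < 0)
        return PROBE_UNAVAILABLE;
    if (ftruncate(fd, len) < 0)
        goto out;
    key = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (key == MAP_FAILED)
        goto out;
    if (pipe(pfd) < 0)
        goto out;

    memcpy(key, msg, sizeof(msg));

    /* The syscall under test: its source buffer lives in secret memory. */
    n = write(pfd[1], key, sizeof(msg));
    if (n < 0)
        result = PROBE_REJECTED;
    else if ((size_t)n == sizeof(msg) &&
             read(pfd[0], out, sizeof(out)) == n &&
             memcmp(out, msg, sizeof(msg)) == 0)
        result = PROBE_ACCEPTED;

out:
    if (pfd[0] >= 0)
        close(pfd[0]);
    if (pfd[1] >= 0)
        close(pfd[1]);
    if (key != MAP_FAILED)
        munmap(key, len);
    close(fd);
    return result;
#endif
}
```

Given the report above, where the key showed up on stdout, PROBE_ACCEPTED
would be the matching outcome on that setup. PROBE_UNAVAILABLE only means
the probe could not run, for example because memfd_secret is not enabled.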
--
Cheers,
David / dhildenb