On 1/10/25 11:01 AM, Matthew Wilcox wrote:
On Fri, Jan 10, 2025 at 10:04:42AM -0800, Yang Shi wrote:
On 1/9/25 8:31 PM, Matthew Wilcox wrote:
On Thu, Jan 09, 2025 at 09:00:24AM -0800, Yang Shi wrote:
Thanks for catching this. It sounds a little bit weird to have vm_file for
an anonymous VMA. I'm not sure why we should keep such special case. It
seems shared mapping is treated as shmem file mapping. So can we set vm_file
to NULL when mmap'ing /dev/zero for private mapping? Something like:
diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index 169eed162a7f..fc332efc5c11 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -527,6 +527,7 @@ static int mmap_zero(struct file *file, struct
vm_area_struct *vma)
if (vma->vm_flags & VM_SHARED)
return shmem_zero_setup(vma);
vma_set_anonymous(vma);
+ vma->vm_file = NULL;
return 0;
}
I'm wary this might cause other bugs somewhere. rc6 is a bit late to be
introducing such a subtle change.
Thanks for the extra caution. Applying the proposed fix in khugepaged code
is fine to me either. We can try to kill the special case later.
Looking at the code further, I think we should do more to make private
/dev/zero mapping an anonymous mapping:
I'm still nervous about this. We map device inodes in a lot of places.
Yes, we do. But I don't think this change actually changes the semantic
of /dev/zero. Shared /dev/zero mapping is still treated as shmem
mapping, private /dev/zero mapping is treated as anonymous mapping, but
the current implementation is actually half-baked. It has NULL
vma->vm_ops which is used to tell kernel whether it is an anonymous vma
or not, but it also has valid vma->vm_file and vma->vm_pgoff as in file
offset.
So this special case makes kernel has 3 types of VMA:
- anonymous VMA: vm_ops is NULL, vm_file is NULL, vm_pgoff is the
linear address pgoff
- file VMA: vm_ops is *NOT* NULL, valid vm_file and vm_pgoff is
index in file
- private /dev/zero mapping VMA