On Tue, Apr 13, 2021 at 5:26 AM Chengguang Xu <cgxu519@xxxxxxxxxxxx> wrote: > > ---- 在 星期四, 2021-04-08 23:03:39 Miklos Szeredi <miklos@xxxxxxxxxx> 撰写 ---- > > On Thu, Apr 8, 2021 at 1:40 PM Chengguang Xu <cgxu519@xxxxxxxxxxxx> wrote: > > > > > > ---- 在 星期四, 2021-04-08 19:29:55 Miklos Szeredi <miklos@xxxxxxxxxx> 撰写 ---- > > > > On Thu, Apr 8, 2021 at 1:28 PM Chengguang Xu <cgxu519@xxxxxxxxxxxx> wrote: > > > > > > > > > > ---- 在 星期四, 2021-04-08 19:20:42 Chengguang Xu <cgxu519@xxxxxxxxxxxx> 撰写 ---- > > > > > > In overlayfs copy-up, if open flag has O_TRUNC then upper > > > > > > file will truncate to zero size, in this case we should check > > > > > > VM_DENYWRITE mappings to keep compatibility with other filesystems. > > > > > > > > Can you provide a test case for the bug that this is fixing? > > > > > > > > > > Execute binary file(keep running until open) in overlayfs which only has lower && open the binary file with flag O_RDWR|O_TRUNC > > > > > > Expected result: open fail with -ETXTBSY > > > > > > Actual result: open success > > > > Worse, it's possible to get a "Bus error" with just execute and write > > on an overlayfs file, which i_writecount is supposed to protect. > > > > The reason is that the put_write_access() call in __vma_link_file() > > assumes an already negative writecount, but because of the vm_file > > shuffle in ovl_mmap() that's not guaranteed. There's even a comment > > about exactly this situation in mmap(): > > > > /* ->mmap() can change vma->vm_file, but must guarantee that > > * vma_link() below can deny write-access if VM_DENYWRITE is set > > * and map writably if VM_SHARED is set. This usually means the > > * new file must not have been exposed to user-space, yet. > > */ > > > > The attached patch fixes this, but not your original bug. > > > > That could be addressed by checking the writecount on *both* lower and > > upper for open for write/truncate. Note: this could be checked before > > copy-up, but that's not reliable alone, because the copy up could > > happen due to meta-data update, for example, and then the > > open/truncate wouldn't trigger the writecount check. > > > > Something like the second attached patch? > > > > Yeah, I noticed that too just after posted my previous patch. > However, rethink these two cases, in practice we share lower layers > in most use cases especially in container use case. So if we check > VM_DENYWRITE of lower file strictly, it may cause interferes between > container instances. Maybe only checking upper file will be better > option? Yes. My patch to fix the SIGBUS is also incomplete as there's still a race window between releasing the temporary writecount and the __vma_link() that acquires the final count. This requires major surgery to fix properly :( Thanks, Miklos