On Fri, Jun 13, 2014 at 8:27 AM, David Herrmann <dh.herrmann@xxxxxxxxx> wrote: > Hi > > On Fri, Jun 13, 2014 at 5:06 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote: >> On Fri, Jun 13, 2014 at 3:36 AM, David Herrmann <dh.herrmann@xxxxxxxxx> wrote: >>> When setting SEAL_WRITE, we must make sure nobody has a writable reference >>> to the pages (via GUP or similar). We currently check references and wait >>> some time for them to be dropped. This, however, might fail for several >>> reasons, including: >>> - the page is pinned for longer than we wait >>> - while we wait, someone takes an already pinned page for read-access >>> >>> Therefore, this patch introduces page-isolation. When sealing a file with >>> SEAL_WRITE, we copy all pages that have an elevated ref-count. The newpage >>> is put in place atomically, the old page is detached and left alone. It >>> will get reclaimed once the last external user dropped it. >>> >>> Signed-off-by: David Herrmann <dh.herrmann@xxxxxxxxx> >> >> Won't this have unexpected effects? >> >> Thread 1: start read into mapping backed by fd >> >> Thread 2: SEAL_WRITE >> >> Thread 1: read finishes. now the page doesn't match the sealed page > > Just to be clear: you're talking about read() calls that write into > the memfd? (like my FUSE example does) Your language might be > ambiguous to others as "read into" actually implies a write. > > No, this does not have unexpected effects. But yes, your conclusion is > right. To be clear, this behavior would be part of the API. Any > asynchronous write might be cut off by SEAL_WRITE _iff_ you unmap your > buffer before the write finishes. But you actually have to extend your > example: > > Thread 1: p = mmap(memfd, SIZE); > Thread 1: h = async_read(some_fd, p, SIZE); > Thread 1: munmap(p, SIZE); > Thread 2: SEAL_WRITE > Thread 1: async_wait(h); > > If you don't do the unmap(), then SEAL_WRITE will fail due to an > elevated i_mmap_writable. I think this is fine. In fact, I remember > reading that async-IO is not required to resolve user-space addresses > at the time of the syscall, but might delay it to the time of the > actual write. But you're right, it would be misleading that the AIO > operation returns success. This would be part of the memfd-API, > though. And if you mess with your address space while running an > async-IO operation on it, you're screwed anyway. Ok, I missed the part where you had to munmap to trigger the oddity. That seems fine to me. > > Btw., your sealing use-case is really odd. No-one guarantees that the > SEAL_WRITE happens _after_ you schedule your async-read. In case you > have some synchronization there, you just have to move it after > waiting for your async-io to finish. > > Does that clear things up? I think so. --Andy -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html