Re: [RFC v3 7/7] shm: isolate pinned pages when sealing files

Andy Lutomirski <luto@xxxxxxxxxxxxxx> · Fri, 13 Jun 2014 10:23:23 -0700

On Fri, Jun 13, 2014 at 8:27 AM, David Herrmann <dh.herrmann@xxxxxxxxx> wrote:
> Hi
>
> On Fri, Jun 13, 2014 at 5:06 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>> On Fri, Jun 13, 2014 at 3:36 AM, David Herrmann <dh.herrmann@xxxxxxxxx> wrote:
>>> When setting SEAL_WRITE, we must make sure nobody has a writable reference
>>> to the pages (via GUP or similar). We currently check references and wait
>>> some time for them to be dropped. This, however, might fail for several
>>> reasons, including:
>>>  - the page is pinned for longer than we wait
>>>  - while we wait, someone takes an already pinned page for read-access
>>>
>>> Therefore, this patch introduces page-isolation. When sealing a file with
>>> SEAL_WRITE, we copy all pages that have an elevated ref-count. The newpage
>>> is put in place atomically, the old page is detached and left alone. It
>>> will get reclaimed once the last external user dropped it.
>>>
>>> Signed-off-by: David Herrmann <dh.herrmann@xxxxxxxxx>
>>
>> Won't this have unexpected effects?
>>
>> Thread 1:  start read into mapping backed by fd
>>
>> Thread 2:  SEAL_WRITE
>>
>> Thread 1: read finishes.  now the page doesn't match the sealed page
>
> Just to be clear: you're talking about read() calls that write into
> the memfd? (like my FUSE example does) Your language might be
> ambiguous to others as "read into" actually implies a write.
>
> No, this does not have unexpected effects. But yes, your conclusion is
> right. To be clear, this behavior would be part of the API. Any
> asynchronous write might be cut off by SEAL_WRITE _iff_ you unmap your
> buffer before the write finishes. But you actually have to extend your
> example:
>
> Thread 1: p = mmap(memfd, SIZE);
> Thread 1: h = async_read(some_fd, p, SIZE);
> Thread 1: munmap(p, SIZE);
> Thread 2: SEAL_WRITE
> Thread 1: async_wait(h);
>
> If you don't do the unmap(), then SEAL_WRITE will fail due to an
> elevated i_mmap_writable. I think this is fine. In fact, I remember
> reading that async-IO is not required to resolve user-space addresses
> at the time of the syscall, but might delay it to the time of the
> actual write. But you're right, it would be misleading that the AIO
> operation returns success. This would be part of the memfd-API,
> though. And if you mess with your address space while running an
> async-IO operation on it, you're screwed anyway.

Ok, I missed the part where you had to munmap to trigger the oddity.
That seems fine to me.

>
> Btw., your sealing use-case is really odd. No-one guarantees that the
> SEAL_WRITE happens _after_ you schedule your async-read. In case you
> have some synchronization there, you just have to move it after
> waiting for your async-io to finish.
>
> Does that clear things up?

I think so.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html