On Saturday, May 29th, 2021 at 10:15 PM, Hugh Dickins <hughd@xxxxxxxxxx> wrote:

> And IIUC it would have to be the recipient (Wayland compositor) doing
> the NOFAULT business, because (going back to the original mail) we are
> only considering this so that Wayland might satisfy clients who predate
> or refuse Linux-only APIs. So, an ioctl (or fcntl, as sealing chose)
> at the client end cannot be expected; and could not be relied on anyway.

Yes, that is correct.

> NOFAULT? Does BSD use "fault" differently, and in Linux terms we
> would say NOSIGBUS to mean the same?
>
> Can someone point to a specification of BSD's __MAP_NOFAULT?
> Searching just found me references to bugs.

__MAP_NOFAULT isn't documented, sadly. The commit that introduces the
flag [1] is the best we're going to get, I think.

> What mainly worries me about the suggestion is: what happens to the
> zero page inserted into NOFAULT mappings, when later a page for that
> offset is created and added to page cache?

I'm not 100% sure this means what I think it means, but from my PoV,
it's fine if the contents of an expanded shm file aren't visible from the
process that has mapped it with MAP_NOFAULT/MAP_NOSIGBUS.

In other words, it's fine if:

- The client sets up a 1KiB shm file and sends it to the compositor.
- The compositor maps it with MAP_NOFAULT/MAP_NOSIGBUS.
- The client expands the file to 2KiB and writes interesting data in it.
- The compositor still sees zeros past the 1KiB mark. The compositor needs
  to unmap and re-map the file to see the data past the 1KiB mark.

If the MAP_NOFAULT/MAP_NOSIGBUS flag only affects the mapping itself and
nothing else, this should be fine?

> Treating it as an opaque blob of zeroes, that stays there ever after,
> hiding the subsequent data: easy to implement, but a hack that we would
> probably regret. (And I notice that even the quote from David Herrmann
> in the original post allows for the possibility that client may want to
> expand the object.)
>
> I believe the correct behaviour would be to unmap the nofault page
> then, allowing the proper page to be faulted in after. That is
> certainly doable (the old mm/filemap_xip.c used to do so), but might
> get into some awkward race territory, with filesystem dependence
> (reminiscent of hole punch, in reverse). shmem could operate that
> way, and be the better for it: but I wouldn't want to add that,
> without also cleaning away all the shmem_recalc_inode() stuff.

[1]: https://github.com/openbsd/src/commit/37f480c7e4870332b7ffb802fa6578f547c8a19f
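
For reference, here is a rough sketch of how a plain shared mapping behaves on Linux today, without any new flag. It uses neither MAP_NOFAULT nor MAP_NOSIGBUS, since those are only proposals in this thread. The function names and the "client-buffer" label are made up for illustration, memfd_create() stands in for whatever shm file the client would really send, and sizes are in whole pages rather than 1KiB/2KiB because faults happen at page granularity.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf sigbus_env;

static void on_sigbus(int sig)
{
    (void)sig;
    siglongjmp(sigbus_env, 1);   /* unwind out of the faulting read */
}

/* Hypothetical helper: 1 if reading *p raised SIGBUS, 0 if the read
 * succeeded, -1 if the handler could not be installed. */
static int read_raises_sigbus(const volatile unsigned char *p)
{
    struct sigaction sa, old;
    int raised;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGBUS, &sa, &old) < 0)
        return -1;

    if (sigsetjmp(sigbus_env, 1) == 0) {
        unsigned char c = *p;    /* volatile read: may fault */
        (void)c;
        raised = 0;
    } else {
        raised = 1;              /* arrived here via siglongjmp */
    }

    sigaction(SIGBUS, &old, NULL);
    return raised;
}

/* Hypothetical demo: the "client" creates a one-page file, the
 * "compositor" maps two pages of it and touches the second page, first
 * while the file is still one page long and again after the client has
 * grown it to two pages. Returns 0 on success, -1 on setup failure. */
int probe_mapping_past_eof(int *before_grow, int *after_grow)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    int fd = memfd_create("client-buffer", 0);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, page) < 0) {
        close(fd);
        return -1;
    }

    /* Mapping more than the file currently holds is allowed. */
    void *addr = mmap(NULL, 2 * (size_t)page, PROT_READ, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) {
        close(fd);
        return -1;
    }
    const volatile unsigned char *second_page =
        (const volatile unsigned char *)addr + page;

    *before_grow = read_raises_sigbus(second_page);

    /* The client expands the file; the existing mapping is untouched. */
    int rc = ftruncate(fd, 2 * page);
    if (rc == 0)
        *after_grow = read_raises_sigbus(second_page);

    munmap(addr, 2 * (size_t)page);
    close(fd);
    return rc == 0 ? 0 : -1;
}
```

With the current behaviour the first read faults with SIGBUS, and once the file has grown the same address reads fine through the existing mapping. The semantics I described above would differ on both counts: zeros instead of the signal, and still zeros after the grow until the compositor re-maps.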