On Tue, 17 Jun 2014, Andy Lutomirski wrote:
> On Tue, Jun 17, 2014 at 9:51 AM, David Herrmann <dh.herrmann@xxxxxxxxx> wrote:
> > On Tue, Jun 17, 2014 at 6:41 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> >> On Tue, Jun 17, 2014 at 9:36 AM, David Herrmann <dh.herrmann@xxxxxxxxx> wrote:
> >>> On Tue, Jun 17, 2014 at 6:20 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> >>>> Can you summarize why holes can't be reliably backed by the zero page?
> >>>
> >>> To answer this, I will quote Hugh from "PATCH v2 1/3":
> >>>
> >>>> We do already use the ZERO_PAGE instead of allocating when it's a
> >>>> simple read; and on the face of it, we could extend that to mmap
> >>>> once the file is sealed. But I am rather afraid to do so - for
> >>>> many years there was an mmap /dev/zero case which did that, but
> >>>> it was an easily forgotten case which caught us out at least
> >>>> once, so I'm reluctant to reintroduce it now for sealing.
> >>>>
> >>>> Anyway, I don't expect you to resolve the issue of sealed holes:
> >>>> that's very much my territory, to give you support on.
> >>>
> >>> Holes can be avoided with a simple fallocate(). I don't understand why
> >>> I should make SEAL_WRITE do the fallocate for the caller. During the
> >>> discussion of memfd_create() I was told to drop the "size" parameter,
> >>> because it is redundant. I don't see how this implicit fallocate()
> >>> does not fall into the same category?
> >>>
> >>
> >> I'm really confused now.
> >>
> >> If I SEAL_WRITE a file, and then I mmap it PROT_READ, and then I read
> >> it, is that a "simple read"? If so, doesn't that mean that there's no
> >> problem?
> >
> > I assumed Hugh was talking about read(). So no, this is not about
> > memory-reads on mmap()ed regions.
> >
> > Looking at shmem_file_read_iter() I can see a ZERO_PAGE(0) call in
> > case shmem_getpage_gfp(SGP_READ) tells us there's a hole. I cannot see
> > anything like that in the mmap_region() and shmem_fault() paths.
>
> Would it be easy to fix this just for SEAL_WRITE files? Hugh?
>
> This would make the interface much nicer, IMO.

I do agree with you, Andy.

I agree with David that a fallocate (of the fill-in-holes variety)
does not have to be prohibited on a sealed file, that detection of
holes is not an issue with respect to sealing, and that fallocate
by the recipient could be used to "post-seal" the object to safety.

But it doesn't feel right, and we shall be re-explaining and
apologizing for it for months to come, until we just fix it.

I suspect David didn't want to add a dependency upon me to fix it,
and I didn't want to be rushed into fixing it (nor is it a job I'd
be comfortable to delegate).

I'll give it more thought. The problem is that there may be a variety
of codepaths, in mm/shmem.c but more seriously outside it, which
expect an appropriate page->mapping and page->index on any page of
a shared mapping, and will be buggily surprised to find a ZERO_PAGE
instead. I'll have to go through carefully. Splice may be more
difficult to audit than fault, I don't very often have to think
about it.

And though I'd prefer to do the same for non-sealed as for sealed,
it may make more sense in the short term just to address the sealed
case, as you suggest. In the unsealed case, first write to a page
entails locating all the places where the ZERO_PAGE had previously
been mapped, and replacing it there by the newly allocated page;
might depend on VM_NONLINEAR removal, and might entail page_mkwrite().
Doing just the sealed is easier, though the half-complete job will
annoy me.

I did refresh my memory of the /dev/zero case that had particularly
worried me: it was stranger than I'd thought, that reading from
/dev/zero could insert ZERO_PAGEs into mappings of other files.
Nick put an end to that in 2.6.24, but perhaps its prior existence
helps give assurance that ZERO_PAGE in surprising places is less
trouble than I fear (it did force XIP into having its own zero_page,
but I don't remember other complications).

Hugh
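
For readers following along, the userspace workaround discussed in this
thread (fill the holes with fallocate() so a sealed file never exposes
them) might look roughly like the sketch below. It is only an
illustration under stated assumptions: the helper names
create_sealed_memfd() and sealed_memfd_reads_zero() are hypothetical,
the flag and seal names (MFD_ALLOW_SEALING, F_ADD_SEALS, F_SEAL_*) are
those of the interface as later exposed through glibc rather than
anything quoted above, and the choice of MAP_PRIVATE for the read-only
mapping is an assumption, since the thread only says "mmap it
PROT_READ".

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a memfd of `size` bytes, fill every hole
 * with fallocate(), then seal it. Returns the fd, or -1 on error.
 */
int create_sealed_memfd(const char *name, size_t size)
{
    /* fallocate() and mmap() both reject a zero length. */
    assert(size > 0);

    int fd = memfd_create(name, MFD_CLOEXEC | MFD_ALLOW_SEALING);
    if (fd < 0)
        return -1;

    /* Set the size; at this point the whole file is one hole. */
    if (ftruncate(fd, (off_t)size) < 0)
        goto fail;

    /* Mode 0 allocates backing pages for the range, so no holes remain. */
    if (fallocate(fd, 0, 0, (off_t)size) < 0)
        goto fail;

    /* Forbid resizing and writing from here on. */
    if (fcntl(fd, F_ADD_SEALS,
              F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE) < 0)
        goto fail;

    return fd;

fail:
    close(fd);
    return -1;
}

/*
 * Hypothetical helper: map the sealed file read-only and check that it
 * reads back as zeroes. Returns 1 if all zero, 0 if not, -1 on error.
 * MAP_PRIVATE is an assumption made for this sketch.
 */
int sealed_memfd_reads_zero(int fd, size_t size)
{
    const unsigned char *p =
        mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED)
        return -1;

    int all_zero = 1;
    for (size_t i = 0; i < size; i++) {
        if (p[i] != 0) {
            all_zero = 0;
            break;
        }
    }

    munmap((void *)p, size);
    return all_zero;
}
```

A caller would create the object with create_sealed_memfd("demo", len),
pass the fd to the recipient, and the recipient could then map it
read-only. Whether the kernel should make the explicit fallocate() step
unnecessary for sealed files is exactly the open question in this
thread.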