PAGE_SIZE isn't accurate on architectures which support multiple page
sizes, like 8k, 64k, 512k, 4M, 32M, 256M on SPARC64, and the same on
PPC64/Power.

Ced

On 16 September 2016 at 00:29, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> On Thu, Sep 15, 2016 at 06:23:24AM -0400, Mike Marshall wrote:
>> If you squeeze out every byte won't you still have a short
>> write? And the written data wouldn't be cut at the bad
>> place, but it would have a weird hole or discontinuity there.
>
> ???
>
> What I mean is that if we have an invalid address in the middle of a
> buffer (unmapped, for example), we do not attempt to write every byte
> prior to that invalid address. Of course what we write is going to be
> contiguous.
>
> Suppose we have a buffer spanning 10 pages (amd64, so these are 4K
> ones) - 7 valid, 3 invalid:
>         VVVVIIIVVV
> and it starts 100 bytes into the first page. And the write goes into a
> regular file on e.g. tmpfs, starting at offset 31. We _can't_ write
> more than 4*4096-100 bytes, no matter what. It will be a short write.
> As a matter of fact, it will be even shorter than that - it will be
> 3*4096-31 bytes, up to the last pagecache boundary we can cover
> completely. That obviously depends upon the filesystem - not
> everything uses pagecache, for starters. However, the caller is *not*
> guaranteed that write() with an invalid page in the middle of a buffer
> would write everything up to the very beginning of the invalid page. A
> short write will happen, but the amount written might be up to a page
> size less than the actual length of the valid part at the beginning of
> the buffer.
>
> Now, for writev() we could have invalid pages in any iovec; again, we
> obviously can't write anything past the first invalid page - we'll get
> either a short write or -EFAULT (if nothing got written). That's fine;
> the question is what the caller can count upon wrt shortening.
>
> Again, we are *not* guaranteed writing up to the exact boundary.
> However, the current implementation will end up shortening no more
> than to the iovec boundary. I.e. if the first iovec contains only
> valid pages and there's an invalid one in the second iovec, the
> current implementation will write at least everything in the first
> iovec. That's _not_ promised by POSIX or our manpages; moreover, I'm
> not sure it's even true for every filesystem. And keeping that
> property is actually inconvenient - if we could discard it, we could
> make partial-copy ->write_end() calls a lot less frequent.
>
> Unfortunately, some of the LTP writev tests end up checking that
> writev() does behave that way - they feed it a three-element iovec
> with shorter-than-page segments, the second of which is all invalid.
> And they check that the entire first segment has been written.
>
> I would really like to drop that property, making it "if some
> addresses in the buffer(s) we are asked to write are invalid, the
> write will be shortened by up to PAGE_SIZE from the first such invalid
> address", making writev() rules exactly the same as write() ones. Does
> anybody have objections to it?
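For anyone who wants to poke at this, here's a quick userspace sketch
illustrating the shortening behaviour described above (assumptions:
4K pages, /tmp is tmpfs, and the file name is arbitrary). It builds
the VVVVIIIVVV layout by mprotect(PROT_NONE)-ing pages 4-6, then
write()s from 100 bytes into the buffer at file offset 31 and prints
how much the kernel actually accepted:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>

int main(void)
{
	long psz = sysconf(_SC_PAGESIZE);
	size_t len = 10 * psz;

	char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return 1;
	memset(buf, 'x', len);

	/* make pages 4..6 inaccessible: VVVVIIIVVV */
	if (mprotect(buf + 4 * psz, 3 * psz, PROT_NONE))
		return 1;

	int fd = open("/tmp/shortwrite", O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return 1;
	if (lseek(fd, 31, SEEK_SET) == (off_t)-1)
		return 1;

	/* valid data runs from buf+100 up to page 4, i.e. 4*psz-100
	 * bytes; ask for the whole rest of the buffer anyway */
	ssize_t n = write(fd, buf + 100, len - 100);
	printf("asked for %zu, wrote %zd (valid prefix %ld, "
	       "last full pagecache page %ld)\n",
	       len - 100, n, 4 * psz - 100, 3 * psz - 31);
	return 0;
}

On a pagecache-backed filesystem you should see it stop at or below
the valid prefix - per the arithmetic above, 3*4096-31 on tmpfs -
with the exact shortfall (anywhere up to a page below the valid
prefix) being filesystem-dependent.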
--
Cedric Blancher <cedric.blancher@xxxxxxxxx>
[https://plus.google.com/u/0/+CedricBlancher/]
Institute Pasteur