On Thu, Sep 15, 2016 at 06:23:24AM -0400, Mike Marshall wrote:
> If you squeeze out every byte won't you still have a short
> write? And the written data wouldn't be cut at the bad
> place, but it would have a weird hole or discontinuity there.

???  What I mean is that if we have an invalid address in the middle of
a buffer (unmapped, for example), we do not attempt to write every byte
prior to that invalid address.  Of course what we write is going to be
contiguous.

Suppose we have a buffer spanning 10 pages (amd64, so these are 4K ones)
- 7 valid, 3 invalid: VVVVIIIVVV - and it starts 100 bytes into the
first page.  And the write goes into a regular file on e.g. tmpfs,
starting at offset 31.  We _can't_ write more than 4*4096-100 bytes, no
matter what.  It will be a short write.  As a matter of fact, it will be
even shorter than that - it will be 3*4096-31 bytes, up to the last
pagecache boundary we can cover completely.

That obviously depends upon the filesystem - not everything uses the
pagecache, for starters.  However, the caller is *not* guaranteed that
write() with an invalid page in the middle of a buffer will write
everything up to the very beginning of the invalid page.  A short write
will happen, but the amount written might be up to a page size less than
the actual length of the valid part at the beginning of the buffer.

Now, for writev() we could have invalid pages in any iovec; again, we
obviously can't write anything past the first invalid page - we'll get
either a short write or -EFAULT (if nothing got written).  That's fine;
the question is what the caller can count upon wrt shortening.  Again,
we are *not* guaranteed writing up to the exact boundary.  However, the
current implementation will end up shortening no more than to the iovec
boundary.  I.e. if the first iovec contains only valid pages and there's
an invalid one in the second iovec, the current implementation will
write at least everything in the first iovec.

That's _not_ promised by POSIX or our manpages; moreover, I'm not sure
it's even true for every filesystem.  And keeping that property is
actually inconvenient - if we could discard it, we could make
partial-copy ->write_end() calls a lot more infrequent.  Unfortunately,
some of the LTP writev tests end up checking that writev() does behave
that way - they feed it a three-element iovec with shorter-than-page
segments, the second of which is entirely invalid, and they check that
the entire first segment has been written.

I would really like to drop that property, making it "if some addresses
in the buffer(s) we are asked to write are invalid, the write will be
shortened by up to PAGE_SIZE from the first such invalid address",
making the writev() rules exactly the same as the write() ones.  Does
anybody have objections to that?
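
FWIW, a minimal userspace sketch of the write() case above - it assumes
/tmp is on tmpfs and a 4K page size, and the count it prints is
filesystem- and kernel-dependent; the only promise being discussed is a
short write somewhere within a page of the first invalid address:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>

int main(void)
{
	long pg = sysconf(_SC_PAGESIZE);	/* 4096 on amd64 */
	char *buf;
	ssize_t n;
	int fd;

	/* 10 pages, all mapped read-write to start with */
	buf = mmap(NULL, 10 * pg, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED) { perror("mmap"); exit(1); }
	memset(buf, 'x', 10 * pg);

	/* revoke access to pages 4..6 - the VVVVIIIVVV layout */
	if (mprotect(buf + 4 * pg, 3 * pg, PROT_NONE)) {
		perror("mprotect");
		exit(1);
	}

	/* path is arbitrary; pick anything on tmpfs */
	fd = open("/tmp/shortwrite-demo", O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0) { perror("open"); exit(1); }
	if (lseek(fd, 31, SEEK_SET) < 0) { perror("lseek"); exit(1); }

	/* buffer starts 100 bytes into the first page */
	n = write(fd, buf + 100, 10 * pg - 100);
	printf("asked for %ld, wrote %zd (valid prefix: %ld, "
	       "last full page boundary: %ld)\n",
	       10 * pg - 100, n, 4 * pg - 100, 3 * pg - 31);
	return 0;
}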
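
And the writev() shape the LTP tests poke at - three shorter-than-page
segments with the middle one entirely unmapped.  With the current
implementation this returns the full length of the first segment; under
the proposed rule the caller would only be promised a short write (or
-EFAULT) within PAGE_SIZE of the first invalid address:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/uio.h>

int main(void)
{
	long pg = sysconf(_SC_PAGESIZE);
	char good1[64], good2[64];
	struct iovec iov[3];
	char *bad;
	ssize_t n;
	int fd;

	memset(good1, 'a', sizeof(good1));
	memset(good2, 'c', sizeof(good2));

	/* an entirely invalid middle segment: one PROT_NONE page */
	bad = mmap(NULL, pg, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (bad == MAP_FAILED) { perror("mmap"); exit(1); }

	iov[0].iov_base = good1; iov[0].iov_len = sizeof(good1);
	iov[1].iov_base = bad;   iov[1].iov_len = 64;	/* < page size */
	iov[2].iov_base = good2; iov[2].iov_len = sizeof(good2);

	fd = open("/tmp/writev-demo", O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0) { perror("open"); exit(1); }

	n = writev(fd, iov, 3);
	/* the LTP tests described above expect the whole first segment,
	 * i.e. n == 64 here; the proposed rule would allow less */
	printf("writev returned %zd of %zu\n",
	       n, sizeof(good1) + 64 + sizeof(good2));
	return 0;
}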