On Tue 26-08-14 14:22:24, Anton Altaparmakov wrote: > Hi Al, > > I have been looking at __generic_file_write_iter() and think I may have found a bug when O_DIRECT is enabled and the write falls through to generic_perform_write() after the generic_file_direct_write() and then the filemap_write_and_wait_range() fails for some reason. > > The relevant code is: > > status = generic_perform_write(file, from, pos); > [...] > iocb->ki_pos = pos + status; > [...] > err = filemap_write_and_wait_range(file->f_mapping, pos, endbyte); > if (err == 0) { > written += status; > > Now consider the case where filemap_write_and_wait_range() returned an error. > > In that case we return with iocb->ki_pos already advanced to "pos + status" when in fact (some of) the write to disk actually failed. > > We then end up returning "written" which does not include "status" in its value but we set file->f_pos to iocb->ki_pos in the write() system call (via file_pos_write(f.file, pos); where "pos" was set to iocb->ki_pos). > > Now imagine a userspace application doing a simple write loop: > > do { > written = write(f, buf, bytes); > buf += written; > [...] > } while (...); > > In the above example we will advance "buf" by "written" bytes and then write "buf + written" at the incorrect file offset as the previous write() system call advanced file->f_pos by "written + status" instead of just "written". > > If this happens during a FTP upload for example you end up with a destination file which is larger than the source file and the data is shifted/duplicated wherever this error occurs by a shift of "status" bytes or a duplication of "status" bytes. > > Assuming you agree with what I am saying above I would propose that the iocb->ki_pos be moved down to the written += status, i.e. this: > > Signed-off-by: Anton Altaparmakov <aia21@xxxxxxxxxx> Looks good to me. You can add: Reviewed-by: Jan Kara <jack@xxxxxxx> Honza > > diff --git a/mm/filemap.c b/mm/filemap.c > index 90effcd..65e5df4 100644 > --- a/mm/filemap.c > +++ b/mm/filemap.c > @@ -2606,7 +2606,6 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from) > err = status; > goto out; > } > - iocb->ki_pos = pos + status; > /* > * We need to ensure that the page cache pages are written to > * disk and invalidated to preserve the expected O_DIRECT > @@ -2615,6 +2614,7 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from) > endbyte = pos + status - 1; > err = filemap_write_and_wait_range(file->f_mapping, pos, endbyte); > if (err == 0) { > + iocb->ki_pos = pos + status; > written += status; > invalidate_mapping_pages(mapping, > pos >> PAGE_CACHE_SHIFT, > > > Best regards, > > Anton > -- > Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @) > University of Cambridge Information Services, Roger Needham Building > 7 JJ Thomson Avenue, Cambridge, CB3 0RB, UK > > -- > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Jan Kara <jack@xxxxxxx> SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html