Long ago vfs read/write operations were passing ppos=&file->f_pos directly to .read / .write file_operations methods. That changed in 2004 in 55f09ec0087c ("read/write: pass down a copy of f_pos, not f_pos itself.") which started to pass ppos=&local_var trying to avoid simultaneous read/write/lseek stepping onto each other toes and overwriting file->f_pos racily. That measure was not complete and in 2014 commit 9c225f2655e36 ("vfs: atomic f_pos accesses as per POSIX") added file->f_pos_lock to completely disable simultaneous read/write/lseek runs. After f_pos_lock was introduced the reason to avoid passing ppos=&file->f_pos directly due to concurrency vanished. Linus explains[1]: In fact, we *used* to (long ago) pass in the address of "file->f_pos" itself to the low-level read/write routines. We then changed it to do that indirection through a local copy of pos (and file_pos_read/file_pos_write) because we didn't do the proper locking, so different read/write versions could mess with each other (and with lseek). But one of the things that commit 9c225f2655e36 ("vfs: atomic f_pos accesses as per POSIX") did was to add the proper locking at least for the cases that we care about deeply, so we *could* say that we have three cases: - FMODE_ATOMIC_POS: properly locked, - FMODE_STREAM: no pos at all - otherwise a "mostly don't care - don't mix!" and so we could go back to not copying the pos at all, and instead do something like loff_t *ppos = f.file->f_mode & FMODE_STREAM ? NULL : &file->f_pos; ret = vfs_write(f.file, buf, count, ppos); and perhaps have a long-term plan to try to get rid of the "don't mix" case entirely (ie "if you use f_pos, then we'll do the proper locking") (The above is obviously surrounded by the fdget_pos()/fdput_pos() that implements the locking decision). Currently for regular files we always set FMODE_ATOMIC_POS and change that to FMODE_STREAM if stream_open is used explicitly on open. That leaves other files, like e.g. sockets and pipes, for "mostly don't care - don't mix!" case. Sockets, for example, always check that on read/write the initial pos they receive is 0 and don't update it. And if it is !0 they return -ESPIPE. That suggests that we can do the switch into passing &file->f_pos directly now and incrementally convert to FMODE_STREAM files that were doing the stream-like checking manually in their low-level .read/.write handlers. Note: it is theoretically possible that a driver updates *ppos inside even if read/write returns error. For such cases the conversion will change IO semantic a bit. The semantic that is changing here was introduced in 2013 in commit 5faf153ebf61 "don't call file_pos_write() if vfs_{read,write}{,v}() fails". [1] https://lore.kernel.org/linux-fsdevel/CAHk-=whJtZt52SnhBGrNMnuxFn3GE9X_e02x8BPxtkqrfyZukw@xxxxxxxxxxxxxx/ Suggested-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> Signed-off-by: Kirill Smelkov <kirr@xxxxxxxxxx> --- fs/read_write.c | 36 ++++-------------------------------- 1 file changed, 4 insertions(+), 32 deletions(-) diff --git a/fs/read_write.c b/fs/read_write.c index d62556be6848..13550b65cb2c 100644 --- a/fs/read_write.c +++ b/fs/read_write.c @@ -571,14 +571,7 @@ ssize_t ksys_read(unsigned int fd, char __user *buf, size_t count) ssize_t ret = -EBADF; if (f.file) { - loff_t pos, *ppos = file_ppos(f.file); - if (ppos) { - pos = *ppos; - ppos = &pos; - } - ret = vfs_read(f.file, buf, count, ppos); - if (ret >= 0 && ppos) - f.file->f_pos = pos; + ret = vfs_read(f.file, buf, count, file_ppos(f.file)); fdput_pos(f); } return ret; @@ -595,14 +588,7 @@ ssize_t ksys_write(unsigned int fd, const char __user *buf, size_t count) ssize_t ret = -EBADF; if (f.file) { - loff_t pos, *ppos = file_ppos(f.file); - if (ppos) { - pos = *ppos; - ppos = &pos; - } - ret = vfs_write(f.file, buf, count, ppos); - if (ret >= 0 && ppos) - f.file->f_pos = pos; + ret = vfs_write(f.file, buf, count, file_ppos(f.file)); fdput_pos(f); } @@ -1018,14 +1004,7 @@ static ssize_t do_readv(unsigned long fd, const struct iovec __user *vec, ssize_t ret = -EBADF; if (f.file) { - loff_t pos, *ppos = file_ppos(f.file); - if (ppos) { - pos = *ppos; - ppos = &pos; - } - ret = vfs_readv(f.file, vec, vlen, ppos, flags); - if (ret >= 0 && ppos) - f.file->f_pos = pos; + ret = vfs_readv(f.file, vec, vlen, file_ppos(f.file), flags); fdput_pos(f); } @@ -1042,14 +1021,7 @@ static ssize_t do_writev(unsigned long fd, const struct iovec __user *vec, ssize_t ret = -EBADF; if (f.file) { - loff_t pos, *ppos = file_ppos(f.file); - if (ppos) { - pos = *ppos; - ppos = &pos; - } - ret = vfs_writev(f.file, vec, vlen, ppos, flags); - if (ret >= 0 && ppos) - f.file->f_pos = pos; + ret = vfs_writev(f.file, vec, vlen, file_ppos(f.file), flags); fdput_pos(f); } -- 2.21.0.593.g511ec345e1