On Sun, Aug 27, 2023 at 08:41:22PM +0100, Al Viro wrote: > That part is somewhat fishy - there's a case where you return a positive value > and advance ->ki_pos by more than that amount. I really wonder if all callers > of ->write_iter() are OK with that. Speaking of which, in case of negative return value we'd better *not* use ->ki_pos; consider e.g. generic_file_write_iter() with O_DSYNC and vfs_fsync_range() failure. An error gets returned, but ->ki_pos is left advanced. Normal write(2) is fine - it will only update file->f_pos if ->write_iter() has returned a non-negative. However, io_uring kiocb_done() starts with if (req->flags & REQ_F_CUR_POS) req->file->f_pos = rw->kiocb.ki_pos; if (ret >= 0 && (rw->kiocb.ki_complete == io_complete_rw)) { if (!__io_complete_rw_common(req, ret)) { /* * Safe to call io_end from here as we're inline * from the submission path. */ io_req_io_end(req); io_req_set_res(req, final_ret, io_put_kbuf(req, issue_flags)); return IOU_OK; } } else { io_rw_done(&rw->kiocb, ret); } Note that ->f_pos update is *NOT* conditional upon ret >= 0 - it happens no matter what, provided that original request had ->kiocb.ki_pos equal to -1 (on a non-FMODE_STREAM file). Jens, is there any reason for doing that unconditionally? IMO it's a bad idea - there's a wide scope for fuckups that way, especially since write(2) is not sensitive to that and this use of -1 ki_pos is not particularly encouraged on io_uring side either, AFAICT. Worse, it's handling of failure exits in the first place, which already gets little testing...