On Tue, Mar 19, 2013 at 10:10:32PM +0000, Al Viro wrote:
> OK, it's going to be an interesting series - aforementioned tentative patch
> was badly incomplete ;-/

The interesting question is how far we want to lift that. The ->aio_write() part is trivial - see vfs.git#experimental; the trouble begins with ->splice_write(). For *everything* except default_file_splice_write(), lifting into the caller (do_splice_from()) is the right thing to do. default_file_splice_write(), however, is trickier; there we end up calling vfs_write() (via an ugly callchain). And _that_ is a real bitch. Granted, vfs_write() is somewhat of an overkill there (we'd already done rw_verify_area(), and access_ok() is pointless due to the set_fs() we do around vfs_write() there), and we'd already lifted it up to do_sync_write(). But if we lift it any further, we'll need to deal with the ->write() callers in the tree. Current situation:

fs/coredump.c:662: return access_ok(VERIFY_READ, addr, nr) && file->f_op->write(file, addr, nr, &file->f_pos) == nr;
arch/powerpc/platforms/cell/spufs/coredump.c:63: written = file->f_op->write(file, addr, nr, &file->f_pos);
    For these guys we might actually want to lift all the way up to do_coredump().

drivers/staging/comedi/drivers/serial2002.c:91: result = f->f_op->write(f, buf, count, &f->f_pos);
fs/autofs4/waitq.c:73: (wr = file->f_op->write(file,data,bytes,&file->f_pos)) > 0) {
    Not regular files, unless I'm seriously misreading the code.

kernel/acct.c:553: file->f_op->write(file, (char *)&ac,
    BTW, this is probably where we want to deal with your acct deadlock.

fs/compat.c:1103: fn = (io_fn_t)file->f_op->write;
fs/read_write.c:435: ret = file->f_op->write(file, buf, count, pos);
fs/read_write.c:732: fn = (io_fn_t)file->f_op->write;
    Syscalls - the question here is whether we lift it up to vfs_write/vfs_writev/compat_writev, or actually take it further.

fs/cachefiles/rdwr.c:967: ret = file->f_op->write(
    cachefiles_write_page(); no fucking idea what locks might be held by the caller, and potentially that's a rather nasty source of PITA.

fs/coda/file.c:84: ret = host_file->f_op->write(host_file, buf, count, ppos);
    coda writing to a file in its cache on the local fs. Potentially a nasty bugger, since it's hard to lift any further - the caller has no idea that the thing is on CODA, let alone which filesystem happens to hold the local cache.

drivers/block/loop.c:234: bw = file->f_op->write(file, buf, len, &pos);
    do_bio_filebacked(), with some ugliness between that and the callsite. Note, BTW, that we have a pair of possible vfs_fsync() calls in there; how do those interact with freeze?

This does *not* touch the current callers of vfs_write()/vfs_writev(); any of those called while holding ->i_mutex on a directory (or mnt_want_write(), for that matter) is a deadlock right now. And we'd better start thinking about how we'll backport that crap - the deadlock in e.g. xfs ->splice_write() has been there since last summer ;-/