On Wed, Dec 17, 2014 at 10:03:13PM +0000, Al Viro wrote:
> On Wed, Dec 17, 2014 at 10:52:56AM -0800, Christoph Hellwig wrote:
> > On Wed, Dec 17, 2014 at 06:58:32AM -0800, Omar Sandoval wrote:
> > > See my previous message. If we use O_DIRECT on the original open, then
> > > filesystems that implement bmap but not direct_IO will no longer work.
> > > These are the ones that I found in my tree:
> >
> > In the long run I don't think they are worth keeping. But to keep you
> > out of that discussion you can just try an open without O_DIRECT if the
> > open with the flag failed.
>
> Umm... That's one possibility, of course (and if swapon(2) is on someone's
> hotpath, I really would like to see what the hell they are doing - it has
> to be interesting in a sick way).

If this is the approach you'd prefer, I'll go ahead and do that for v2. I
personally think it looks pretty kludgey, but I'm fine either way:

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 63f55cc..c1b3073 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2379,7 +2379,16 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 		name = NULL;
 		goto bad_swap;
 	}
-	swap_file = file_open_name(name, O_RDWR|O_LARGEFILE, 0);
+	swap_file = file_open_name(name, O_RDWR | O_LARGEFILE | O_DIRECT, 0);
+	if (IS_ERR(swap_file) && PTR_ERR(swap_file) == -EINVAL)
+		swap_file = file_open_name(name, O_RDWR | O_LARGEFILE, 0);
 	if (IS_ERR(swap_file)) {
 		error = PTR_ERR(swap_file);
 		swap_file = NULL;

> BTW, speaking of read/write vs. swap - what's the story with e.g. AFS
> write() checking IS_SWAPFILE() and failing with -EBUSY? Note that
> 	* it's done before acquiring i_mutex, so it isn't race-free
> 	* it's dubious from the POSIX POV - EBUSY isn't in the error
> list for write(2).
> 	* other filesystems generally don't have anything of that sort.
> NFS does, but local ones do not...
> Besides, do we even allow swapfiles on AFS?

AFS doesn't implement ->bmap or ->swap_activate, so that code is dead,
probably cargo-culted from the NFS code. It seems pretty pointless, not only
because it's inconsistent with the local filesystems like you mentioned, but
also because it's trivial to bypass with O_DIRECT on NFS:

ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from)
{
	struct file *file = iocb->ki_filp;
	struct inode *inode = file_inode(file);
	unsigned long written = 0;
	ssize_t result;
	size_t count = iov_iter_count(from);
	loff_t pos = iocb->ki_pos;

	result = nfs_key_timeout_notify(file, inode);
	if (result)
		return result;

	if (file->f_flags & O_DIRECT)
		return nfs_file_direct_write(iocb, from, pos);

	dprintk("NFS: write(%pD2, %zu@%Ld)\n",
		file, count, (long long) pos);

	result = -EBUSY;
	if (IS_SWAPFILE(inode))
		goto out_swapfile;

I think it's safe to scrap that code. However, this also led me to find that
NFS doesn't prevent truncates on an active swapfile. I'm submitting a patch
for that now.

--
Omar