From: Jeff Layton <jlayton@xxxxxxxxxx>

Usually we suggest that applications call fsync when they want to
ensure that all data written to the file has made it to the backing
store, but that can be inefficient when there are a lot of open files.
Calling syncfs on the filesystem is more efficient, but the error
reporting doesn't currently work the way most people expect.

If a single inode on a filesystem reports a writeback error, syncfs
won't return an error. syncfs only returns an error if __sync_blockdev
fails. It would be better if it reported an error if there were any
writeback failures. Then applications could call syncfs to see if there
are any errors on any open files, and could then call fsync on all of
the other descriptors to figure out which one failed.

This patch implements a suggestion from Willy to remedy this. It adds a
new errseq_t to struct super_block, and has mapping_set_error also
record writeback errors there.

For reporting, we also need to keep an errseq_t for every struct file,
but growing struct file for this purpose is undesirable. We could just
reuse f_wb_err, but someone could mix calls to fsync and syncfs and
that would break things.

As an alternative, this patch only has syncfs report errors recorded in
s_wb_err when the file has been opened with O_PATH. Any file opened
with O_PATH will not have its fsync field defined in its
file_operations so we can be sure that nothing else will be using its
f_wb_err field.

Note that calling syncfs on an O_PATH descriptor today will return
-EBADF, so this scheme gives userland a way to tell whether this
mechanism will work at runtime.

Cc: Andres Freund <andres@xxxxxxxxxxx>
Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx>
Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
---
 fs/open.c               |  6 +++---
 fs/sync.c               | 13 ++++++++++---
 include/linux/fs.h      |  3 +++
 include/linux/pagemap.h |  5 ++++-
 4 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index c5ee7cd60424..3e8c7b16abb8 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -739,15 +739,15 @@ static int do_dentry_open(struct file *f,
 	f->f_inode = inode;
 	f->f_mapping = inode->i_mapping;
 
-	/* Ensure that we skip any errors that predate opening of the file */
-	f->f_wb_err = filemap_sample_wb_err(f->f_mapping);
-
 	if (unlikely(f->f_flags & O_PATH)) {
 		f->f_mode = FMODE_PATH;
 		f->f_op = &empty_fops;
+		f->f_wb_err = errseq_sample(&f->f_path.dentry->d_sb->s_wb_err);
 		goto done;
 	}
 
+	f->f_wb_err = filemap_sample_wb_err(f->f_mapping);
+
 	if (f->f_mode & FMODE_WRITE && !special_file(inode->i_mode)) {
 		error = get_write_access(inode);
 		if (unlikely(error))
diff --git a/fs/sync.c b/fs/sync.c
index b54e0541ad89..f092fa458f6a 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -159,18 +159,25 @@ void emergency_sync(void)
  */
 SYSCALL_DEFINE1(syncfs, int, fd)
 {
-	struct fd f = fdget(fd);
+	struct fd f = fdget_raw(fd);
 	struct super_block *sb;
-	int ret;
+	int ret, wberr;
 
-	if (!f.file)
+	if (!f.file) {
+		printk("fd %d is NULL!\n", fd);
 		return -EBADF;
+	}
 	sb = f.file->f_path.dentry->d_sb;
 
 	down_read(&sb->s_umount);
 	ret = sync_filesystem(sb);
 	up_read(&sb->s_umount);
 
+	if (f.file->f_flags & O_PATH) {
+		wberr = errseq_check_and_advance(&sb->s_wb_err, &f.file->f_wb_err);
+		if (!ret)
+			ret = wberr;
+	}
 	fdput(f);
 	return ret;
 }
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 760d8da1b6c7..77c388868a82 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1416,6 +1416,9 @@ struct super_block {
 	/* Being remounted read-only */
 	int s_readonly_remount;
 
+	/* per-sb errseq_t for reporting writeback errors via syncfs */
+	errseq_t s_wb_err;
+
 	/* AIO completions deferred from interrupt context */
 	struct workqueue_struct *s_dio_done_wq;
 	struct hlist_head s_pins;
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index b1bd2186e6d2..2de87c5a2718 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -51,7 +51,10 @@ static inline void mapping_set_error(struct address_space *mapping, int error)
 		return;
 
 	/* Record in wb_err for checkers using errseq_t based tracking */
-	filemap_set_wb_err(mapping, error);
+	__filemap_set_wb_err(mapping, error);
+
+	/* Record it in superblock */
+	errseq_set(&mapping->host->i_sb->s_wb_err, error);
 
 	/* Record it in flags for now, for legacy callers */
 	if (error == -ENOSPC)
-- 
2.14.3
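
For illustration, here is a rough userland sketch of how an application might use the scheme described in the commit message: open an O_PATH descriptor on the filesystem, call syncfs on it, treat -EBADF as "this kernel does not support the mechanism", and fall back to fsync on the individual descriptors to find the one that failed. The helper names (classify_syncfs, open_fs_handle, check_fs, find_failed_fd) and the status enum are hypothetical and not part of the patch. The sketch also assumes the descriptor passed to check_fs is a valid open descriptor, since an invalid one would produce -EBADF as well.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for O_PATH and syncfs() */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical status codes for this sketch; not defined by the patch. */
enum syncfs_status {
	SYNCFS_CLEAN,       /* syncfs returned 0: nothing new to report */
	SYNCFS_WB_ERROR,    /* syncfs failed with something other than EBADF */
	SYNCFS_UNSUPPORTED, /* EBADF: kernel rejects O_PATH descriptors here */
};

/* Map a syncfs() return value and the errno it left behind to a status. */
enum syncfs_status classify_syncfs(int rc, int err)
{
	if (rc == 0)
		return SYNCFS_CLEAN;
	if (err == EBADF)
		return SYNCFS_UNSUPPORTED;
	return SYNCFS_WB_ERROR;
}

/*
 * Open an O_PATH descriptor on any path inside the filesystem of interest.
 * With the patch applied, the superblock error state is sampled at this
 * point, so only later writeback errors are reported through it.
 */
int open_fs_handle(const char *path)
{
	assert(path != NULL);
	return open(path, O_PATH | O_CLOEXEC);
}

/* Flush the filesystem and classify the outcome; errno goes to *err_out. */
enum syncfs_status check_fs(int pathfd, int *err_out)
{
	int rc = syncfs(pathfd);
	int err = rc ? errno : 0;

	if (err_out)
		*err_out = err;
	return classify_syncfs(rc, err);
}

/*
 * Fallback and follow-up step: fsync each regular descriptor in turn and
 * return the index of the first one that reports an error, or -1 if none.
 */
int find_failed_fd(const int *fds, int nfds)
{
	for (int i = 0; i < nfds; i++)
		if (fsync(fds[i]) != 0)
			return i;
	return -1;
}
```

A caller would keep the O_PATH descriptor open for the lifetime of the application, call check_fs periodically, and only walk its descriptors with find_failed_fd when the result is SYNCFS_WB_ERROR or SYNCFS_UNSUPPORTED.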