On Mon, Sep 23, 2019 at 06:21:15PM +0200, Jan Kara wrote: > On Thu 12-09-19 21:04:00, Matthew Bobrowski wrote: > > @@ -233,12 +234,90 @@ static ssize_t ext4_write_checks(struct kiocb *iocb, struct iov_iter *from) > > return iov_iter_count(from); > > } > > > > +static int ext4_handle_inode_extension(struct inode *inode, loff_t offset, > > + ssize_t len, size_t count) > > Traditionally, we call one of the length arguments 'copied' or 'written' to > denote actual amount of data processed and the original length is called > 'len' or 'length' in iomap code. Can you please rename the arguments to > follow this convention? Sure, I will go with 'written'. > > +/* > > + * The inode may have been placed onto the orphan list or has had > > + * blocks allocated beyond EOF as a result of an extension. We need to > > + * ensure that any necessary cleanup routines are performed if the > > + * error path has been taken for a write. > > + */ > > +static int ext4_handle_failed_inode_extension(struct inode *inode, loff_t size) > > +{ > > + handle_t *handle; > > + > > + if (size > i_size_read(inode)) > > + ext4_truncate_failed_write(inode); > > + > > + if (!list_empty(&EXT4_I(inode)->i_orphan)) { > > + handle = ext4_journal_start(inode, EXT4_HT_INODE, 2); > > + if (IS_ERR(handle)) { > > + if (inode->i_nlink) > > + ext4_orphan_del(NULL, inode); > > + return PTR_ERR(handle); > > + } > > + if (inode->i_nlink) > > + ext4_orphan_del(handle, inode); > > + ext4_journal_stop(handle); > > + } > > + return 0; > > +} > > + > > After some thought, I'd just drop this function and fold the functionality > into ext4_handle_inode_extension() by making it accept negative 'len' > argument indicating error. The code just happens to be the simplest in that > case (see below). > > > @@ -255,7 +334,18 @@ ext4_dax_write_iter(struct kiocb *iocb, struct iov_iter *from) > > if (ret) > > goto out; > > > > + offset = iocb->ki_pos; > > ret = dax_iomap_rw(iocb, from, &ext4_iomap_ops); > > + if (ret > 0 && iocb->ki_pos > i_size_read(inode)) > > + error = ext4_handle_inode_extension(inode, offset, ret, > > + iov_iter_count(from)); > > You need to sample iov_iter_count(from) before calling dax_iomap_rw(). At > this point iov_iter_count(from) is just what's left in the iter after > writing what we could. > > Also I don't think the condition iocb->ki_pos > i_size_read(inode) is > correct here. Because it may happen that offset + count > i_size so we > allocate some blocks beyond i_size but then we manage to copy only less so > offset + ret == iocb->ki_pos <= i_size and you will not call > ext4_handle_inode_extension() to truncate allocated blocks beyond i_size. Yeah, this makes sense. I will implement it this way. Once again, appreciate your suggestions. > So I'd just call ext4_handle_inode_extension() unconditionally like: > > error = ext4_handle_inode_extension(inode, offset, ret, len); > > and have a quick check at the beginning of that function to avoid starting > transaction when there isn't anything to do. Something like: > > /* > * DIO and DAX writes get exclusion from truncate (i_rwsem) and > * page writeback (i_rwsem and flushing all dirty pages). > */ > WARN_ON_ONCE(i_size_read(inode) != EXT4_I(inode)->i_disksize); > if (offset + count <= i_size_read(inode)) > return 0; > if (len < 0) > goto truncate; > > ... do the heavylifting with transaction start, inode size update, > and orphan handling... > > if (truncate) { > truncate: > ext4_truncate_failed_write(inode); > orphan_del: > /* > * If the truncate operation failed early the inode > * may still be on the orphan list. In that case, we > * need try remove the inode from the linked list in > * memory. > */ > if (inode->i_nlink) > ext4_orphan_del(NULL, inode); > } --<M>--