Re: [PATCH 4/5] ext4: introduce direct IO write code path using iomap infrastructure

Matthew Bobrowski <mbobrowski@xxxxxxxxxxxxxx> · Thu, 29 Aug 2019 21:45:17 +1000

On Wed, Aug 28, 2019 at 10:26:19PM +0200, Jan Kara wrote:
> On Mon 12-08-19 22:53:26, Matthew Bobrowski wrote:
> Overall this is very nice. Some smaller comments below.

Awesome, thanks for the review Jan!

> > @@ -235,6 +244,34 @@ static ssize_t ext4_write_checks(struct kiocb *iocb, struct iov_iter *from)
> >  	return iov_iter_count(from);
> >  }
> >  
> > +static ssize_t ext4_buffered_write_iter(struct kiocb *iocb,
> > +					struct iov_iter *from)
> > +{
> > +	ssize_t ret;
> > +	struct inode *inode = file_inode(iocb->ki_filp);
> > +
> > +	if (!inode_trylock(inode)) {
> > +		if (iocb->ki_flags & IOCB_NOWAIT)
> > +			return -EOPNOTSUPP;
> > +		inode_lock(inode);
> > +	}
> 
> Currently there's no support for IOCB_NOWAIT for buffered IO so you can
> replace this with "inode_lock(inode)".

Noted. I've also taken into consideration what Dave mentioned in the
other thread around explicitly checking for IOCB_NOWAIT and returning
EOPTNOTSUPP irrespective whether we can acquire the lock or not.

> > @@ -284,6 +321,128 @@ static int ext4_handle_inode_extension(struct inode *inode, loff_t size,
> >  	return ret;
> >  }
> >  
> 
> I'd mention here that for cases where inode size is extended,
> ext4_dio_write_iter() waits for DIO to complete and thus we are protected
> by inode_lock in that case.

Easy.

> > +static int ext4_dio_write_end_io(struct kiocb *iocb, ssize_t size,
> > +				 ssize_t error, unsigned int flags)
> 
> Here I'd expand the comment to explain that we wait in case inode is
> extended so that inode extension in ext4_dio_write_end_io() is properly
> covered by inode_lock.
>

Easy.

> > +	if (ret == -EIOCBQUEUED && (unaligned_aio || extend))
> > +		inode_dio_wait(inode);
> > +
> > +	if (ret >= 0 && iov_iter_count(from)) {
> > +		overwrite ? inode_unlock_shared(inode) : inode_unlock(inode);
> > +		return ext4_buffered_write_iter(iocb, from);
> > +	}
> > +out:
> > +	overwrite ? inode_unlock_shared(inode) : inode_unlock(inode);
> > +	return ret;
> > +}
> > +
> >  #ifdef CONFIG_FS_DAX
> >  static ssize_t
> >  ext4_dax_write_iter(struct kiocb *iocb, struct iov_iter *from)
> 
> ...
> 
> > @@ -3581,10 +3611,10 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
> >  		iomap->type = delalloc ? IOMAP_DELALLOC : IOMAP_HOLE;
> >  		iomap->addr = IOMAP_NULL_ADDR;
> >  	} else {
> > -		if (map.m_flags & EXT4_MAP_MAPPED) {
> > -			iomap->type = IOMAP_MAPPED;
> > -		} else if (map.m_flags & EXT4_MAP_UNWRITTEN) {
> > +		if (map.m_flags & EXT4_MAP_UNWRITTEN) {
> >  			iomap->type = IOMAP_UNWRITTEN;
> > +		} else if (map.m_flags & EXT4_MAP_MAPPED) {
> > +			iomap->type = IOMAP_MAPPED;
> >  		} else {
> >  			WARN_ON_ONCE(1);
> >  			return -EIO;
> 
> Possibly this hunk should go into a separate patch (since this is not
> directly related with iomap conversion) with a changelog / comment
> explaining why we need to check EXT4_MAP_UNWRITTEN first.

But wouldn't doing so break bisection? Seeing as though we needed to
change this statement specifically to accommodate for the weirdness
being returned from ext4_map_blocks()? i.e. map.m_flags being set to
either of the following:

	- (EXT4_MAP_NEW | EXT4_MAP_MAPPED)
        or
        - (EXT4_MAP_NEW | EXT4_MAP_MAPPED | EXT4_MAP_UNWRITTEN)

So, if we left the statement in its original form, we'd allocate
unwritten extents but never actually get around to converting them in
ext4_dio_write_end_io() as IOMAP_DIO_UNWRITTEN would never be set?

--M