On Thu 05-12-19 12:16:23, Ritesh Harjani wrote:
> Earlier there was no shared lock in DIO read path. But this patch
> (16c54688592ce: ext4: Allow parallel DIO reads)
> simplified some of the locking mechanism while still allowing for parallel DIO
> reads by adding shared lock in inode DIO read path.
>
> But this created problem with mixed read/write workload. It is due to the fact
> that in DIO path, we first start with exclusive lock and only when we determine
> that it is an overwrite IO, we downgrade the lock. This causes the problem, since
> we still have shared locking in DIO reads.
>
> So, this patch tries to fix this issue by starting with shared lock and then
> switching to exclusive lock only when required based on ext4_dio_write_checks().
>
> Other than that, it also simplifies below cases:-
>
> 1. Simplified ext4_unaligned_aio API to ext4_unaligned_io. Previous API was
> abused in the sense that it was not really checking for AIO anywhere also it
> used to check for extending writes. So this API was renamed and simplified to
> ext4_unaligned_io() which actually only checks if the IO is really unaligned.
>
> Now, in case of unaligned direct IO, iomap_dio_rw needs to do zeroing of partial
> block and that will require serialization against other direct IOs in the same
> block. So we take an exclusive inode lock for any unaligned DIO. In case of AIO
> we also need to wait for any outstanding IOs to complete so that conversion from
> unwritten to written is completed before anyone tries to map the overlapping block.
> Hence we take exclusive inode lock and also wait for inode_dio_wait() for
> unaligned DIO case. Please note since we are anyway taking an exclusive lock in
> unaligned IO, inode_dio_wait() becomes a no-op in case of non-AIO DIO.
>
> 2. Added ext4_extending_io(). This checks if the IO is extending the file.
>
> 3. Added ext4_dio_write_checks(). In this we start with shared inode lock and
> only switch to exclusive lock if required. So in most cases with aligned,
> non-extending, dioread_nolock & overwrites, it tries to write with a shared
> lock. If not, then we restart the operation in ext4_dio_write_checks(), after
> acquiring exclusive lock.
>
> Signed-off-by: Ritesh Harjani <riteshh@xxxxxxxxxxxxx>

Cool, the patch looks good to me. You can add:

Reviewed-by: Jan Kara <jack@xxxxxxx>

Two small nits below:

> -static ssize_t ext4_write_checks(struct kiocb *iocb, struct iov_iter *from)
> +static ssize_t ext4_generic_write_checks(struct kiocb *iocb,
> +					 struct iov_iter *from)
>  {
>  	struct inode *inode = file_inode(iocb->ki_filp);
>  	ssize_t ret;
> @@ -228,11 +235,21 @@ static ssize_t ext4_write_checks(struct kiocb *iocb, struct iov_iter *from)
>  		iov_iter_truncate(from, sbi->s_bitmap_maxbytes - iocb->ki_pos);
>  	}
>
> +	return iov_iter_count(from);
> +}

You return iov_iter_count() from ext4_generic_write_checks()...

> +static ssize_t ext4_dio_write_checks(struct kiocb *iocb, struct iov_iter *from,
> +				     bool *ilock_shared, bool *extend)
> +{
> +	struct file *file = iocb->ki_filp;
> +	struct inode *inode = file_inode(file);
> +	loff_t offset;
> +	size_t count;
> +	ssize_t ret;
> +
> +restart:
> +	ret = ext4_generic_write_checks(iocb, from);
> +	if (ret <= 0)
> +		goto out;
> +
> +	offset = iocb->ki_pos;
> +	count = iov_iter_count(from);

But you don't use the returned count here and just call iov_iter_count()
again (which is cheap anyway but still it's strange).

> +	if (ext4_extending_io(inode, offset, count))
> +		*extend = true;
> +	/*
> +	 * Determine whether the IO operation will overwrite allocated
> +	 * and initialized blocks. If so, check to see whether it is
> +	 * possible to take the dioread_nolock path.
> +	 *
> +	 * We need exclusive i_rwsem for changing security info
> +	 * in file_modified().
> +	 */
> +	if (*ilock_shared && (!IS_NOSEC(inode) || *extend ||
> +	     !ext4_should_dioread_nolock(inode) ||
> +	     !ext4_overwrite_io(inode, offset, count))) {
> +		inode_unlock_shared(inode);
> +		*ilock_shared = false;
> +		inode_lock(inode);
> +		goto restart;
> +	}
> +
> +	ret = file_modified(file);
> +	if (ret < 0)
> +		goto out;
> +
> +	return count;

And then you return count from ext4_dio_write_checks() here...

> -	ret = ext4_write_checks(iocb, from);
> -	if (ret <= 0) {
> -		inode_unlock(inode);
> +	ret = ext4_dio_write_checks(iocb, from, &ilock_shared, &extend);
> +	if (ret <= 0)
>  		return ret;
> -	}
>
> -	/*
> -	 * Unaligned asynchronous direct I/O must be serialized among each
> -	 * other as the zeroing of partial blocks of two competing unaligned
> -	 * asynchronous direct I/O writes can result in data corruption.
> -	 */
>  	offset = iocb->ki_pos;
>  	count = iov_iter_count(from);

And then again just don't use the value here...

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR