Hi Ted, On Fri, Aug 19, 2011 at 1:57 PM, Ted Ts'o <tytso@xxxxxxx> wrote: > On Thu, Aug 18, 2011 at 06:28:45PM -0700, Jiaying Zhang wrote: >> @@ -800,12 +800,17 @@ ssize_t ext4_ind_direct_IO(int rw, struct kiocb *iocb, >> } >> >> retry: >> - if (rw == READ && ext4_should_dioread_nolock(inode)) >> + if (rw == READ && ext4_should_dioread_nolock(inode)) { >> + if (unlikely(!list_empty(&ei->i_completed_io_list))) { >> + mutex_lock(&inode->i_mutex); >> + ext4_flush_completed_IO(inode); >> + mutex_unlock(&inode->i_mutex); >> + } > > Doesn't this largely invalidate the reasons for using dioread_nolock > in the first place, which was to avoid taking the i_mutex for > performance reasons? If we are willing to solve the problem this way, > I wonder if we be better off just simply telling users to disable > dioread_nolock if they care about cache consistency between DIO reads > and buffered writes? > > (Yes, I do understand that in the hopefully common case where a user > is not trying to do buffered writes and DIO reads at the same time, > we'll just take and release the mutex very quickly, but still, it's > got to have a performance impact.) My hope is that in the hopefully common case, applications just do a lot of DIO reads, so the check on unlikely(!list_empty(&ei->i_completed_io_list) would fail and we don't need to take the i_mutex lock. It is certainly not an optimal solution because we only care about data consistency for the blocks to be read instead of all of uncompleted writes. But again, the more general solution would require some kind of extent tree that requires some development effort. Jiaying > > I seem to recall a conversation I had with Stephen Tweedie over a > decade ago, where he noted that many other Unix systems made > absolutely no cache consistency guarantees with respect to DIO and the > page cache, but he wanted to set a higher standard for Linux. Which > is fair enough, but I wonder if for the case of dioread_nolock, since > its raison d'etre is to avoid the i_mutex lock, to simply just say > that one of the side effects of dioread_nolock is that (for now) the > cache consistency guarantees are repealed if this mount option is > chosen. > > What do folks think? > > - Ted > -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html