Hi Joseph! On Wed 18-09-19 14:35:15, Joseph Qi wrote: > On 19/9/17 18:32, Ritesh Harjani wrote: > > Hello, > > > > This patch series is based on the upstream discussion with Jan > > & Joseph @ [1]. > > It is based on top of Matthew's v3 ext4 iomap patch series [2] > > > > Patch-1: Adds the ext4_ilock/unlock APIs and also replaces all > > inode_lock/unlock instances from fs/ext4/* > > > > For now I already accounted for trylock/lock issue symantics > > (which was discussed here [3]) in the same patch, > > since the this whole patch was around inode_lock/unlock API, > > so I thought it will be best to address that issue in the same patch. > > However, kindly let me know if otherwise. > > > > Patch-2: Commit msg of this patch describes in detail about > > what it is doing. > > In brief - we try to first take the shared lock (instead of exclusive > > lock), unless it is a unaligned_io or extend_io. Then in > > ext4_dio_write_checks(), if we start with shared lock, we see > > if we can really continue with shared lock or not. If not, then > > we release the shared lock then acquire exclusive lock > > and restart ext4_dio_write_checks(). > > > > > > Tested against few xfstests (with dioread_nolock mount option), > > those ran fine (ext4 & generic). > > > > I tried testing performance numbers on my VM (since I could not get > > hold of any real h/w based test device). I could test the fact > > that earlier we were trying to do downgrade_write() lock, but with > > this patch, that path is now avoided for fio test case > > (as reported by Joseph in [4]). > > But for the actual results, I am not sure if VM machine testing could > > really give the reliable perf numbers which we want to take a look at. > > Though I do observe some form of perf improvements, but I could not > > get any reliable numbers (not even with the same list of with/without > > patches with which Joseph posted his numbers [1]). > > > > > > @Joseph, > > Would it be possible for you to give your test case a run with this > > patches? That will be really helpful. > > > > Branch for this is hosted at below tree. > > > > https://github.com/riteshharjani/linux/tree/ext4-ilock-RFC > > > I've tested your branch, the result is: > mounting with dioread_nolock, it behaves the same like reverting > parallel dio reads + dioread_nolock; > while mounting without dioread_nolock, no improvement, or even worse. > Please refer the test data below. > > fio -name=parallel_dio_reads_test -filename=/mnt/nvme0n1/testfile > -direct=1 -iodepth=1 -thread -rw=randrw -ioengine=psync -bs=$bs > -size=20G -numjobs=8 -runtime=600 -group_reporting > > w/ = with parallel dio reads > w/o = reverting parallel dio reads This is with 16c54688592ce8 "ext4: Allow parallel DIO reads" reverted, right? > w/o+ = reverting parallel dio reads + dioread_nolock > ilock = ext4-ilock-RFC > ilock+ = ext4-ilock-RFC + dioread_nolock > > bs=4k: > -------------------------------------------------------------- > | READ | WRITE | > -------------------------------------------------------------- > w/ | 30898KB/s,7724,555.00us | 30875KB/s,7718,479.70us | > -------------------------------------------------------------- > w/o | 117915KB/s,29478,248.18us | 117854KB/s,29463,21.91us | > -------------------------------------------------------------- I'm really surprised by the numbers here. They would mean that when DIO read takes i_rwsem exclusive lock instead of shared, it is a win for your workload... Argh, now checking code in fs/direct-io.c I think I can see the difference. The trick in do_blockdev_direct_IO() is: if (iov_iter_rw(iter) == READ && (dio->flags & DIO_LOCKING)) inode_unlock(dio->inode); if (dio->is_async && retval == 0 && dio->result && (iov_iter_rw(iter) == READ || dio->result == count)) retval = -EIOCBQUEUED; else dio_await_completion(dio); So actually only direct IO read submission is protected by i_rwsem with DIO_LOCKING. Actual waiting for sync DIO read happens with i_rwsem dropped. After some thought I think the best solution for this is to just finally finish the conversion of ext4 so that dioread_nolock is the only DIO path. With i_rwsem held in shared mode even for "unlocked" DIO, it should be actually relatively simple and most of the dances with unwritten extents shouldn't be needed anymore. Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR