Re: [QUESTION] Long read latencies on mixed rw buffered IO

On Mon, Mar 25, 2019 at 8:22 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> On Mon, Mar 25, 2019 at 07:30:39PM +0200, Amir Goldstein wrote:
> > On Mon, Mar 25, 2019 at 6:41 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> > >
> > > On Mon, Mar 25, 2019 at 08:47:31AM -0700, Darrick J. Wong wrote:
> > > > Hmmm.... so it looks like the rw_semaphore behavior has shifted over
> > > > time, then?
> > >
> > > Yes.
> > >
> > > > I thought rwsem was supposed to queue read and write waiters in order,
> > > > at least on x86?  Though I suppose that might not matter much since we
> > > > can only run one writer at a time vs. waking up all the readers at once.
> > > > Now I'm wondering if there ever was a time when the readers all got
> > > > batched to the front and starved the writers, but eh I haven't drunk
> > > > enough coffee to remember things like that. :P
> > > >
> > > > (I wonder what would happen if rw_semaphore decided to wake up some
> > > > number of the readers in the rwsem wait_list, not just the ones at the
> > > > front...)
> > >
> > > rwsems currently allow a limited amount of queue-jumping; if a semaphore
> > > is currently not acquired (it's in transition between two owners), a
> > > running process can acquire it.
> > >
> > > I think it is a bug that we only wake readers at the front of the queue;
> > > I think we would get better performance if we wake all readers.  ie here:
> > >
> > >         /*
> > > -        * Grant an infinite number of read locks to the readers at the front
> > > -        * of the queue. We know that woken will be at least 1 as we accounted
> > > +        * Grant an infinite number of read locks.  We know that woken will
> > > +        * be at least 1 as we accounted
> > >          * for above. Note we increment the 'active part' of the count by the
> > >          * number of readers before waking any processes up.
> > >          */
> > >         list_for_each_entry_safe(waiter, tmp, &sem->wait_list, list) {
> > >                 struct task_struct *tsk;
> > >
> > > -               if (waiter->type == RWSEM_WAITING_FOR_WRITE)
> > > -                       break;
> > >
> > > Amir, it seems like you have a good test-case for trying this out ...
> >
> > Sure, but please explain. Why are you waking up the writers?
>
> Oh ... duh.  I didn't mean to wake up the writers.  I meant ...
>
>                if (waiter->type == RWSEM_WAITING_FOR_WRITE)
> -                       break;
> +                       continue;
>
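
To make sure we are talking about the same change, this is what I tested,
i.e. both hunks applied. A sketch of the wake loop in __rwsem_mark_wake()
(kernel/locking/rwsem-xadd.c, v5.1-rc1) with the count accounting and
wakeup details elided:

	list_for_each_entry_safe(waiter, tmp, &sem->wait_list, list) {
		if (waiter->type == RWSEM_WAITING_FOR_WRITE)
			continue;	/* was: break; leave writers queued, keep scanning */

		woken++;
		/* ... dequeue the reader and add its task to the wake queue ... */
	}

So a wakeup pass now grants read locks to every queued reader, not just
the batch ahead of the first queued writer.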

I don't have access to the test machine from my earlier runs right now,
but running the same filebench randomrw workload (8 writers, 8 readers)
on a VM with 2 CPUs and an SSD drive, the results do not look good for
this patch:

--- v5.1-rc1 / xfs ---
rand-write1   852404ops  14202ops/s 110.9mb/s   0.6ms/op [0.01ms - 553.45ms]
rand-read1     26117ops    435ops/s   3.4mb/s  18.4ms/op [0.04ms - 632.29ms]
61.088: IO Summary: 878521 ops 14636.774 ops/s 435/14202 rd/wr 114.3mb/s 1.1ms/op

--- v5.1-rc1 / xfs + patch above ---
rand-write1  1117998ops  18621ops/s 145.5mb/s   0.4ms/op [0.01ms - 788.19ms]
rand-read1      7089ops    118ops/s   0.9mb/s  67.4ms/op [0.03ms - 792.67ms]
61.091: IO Summary: 1125087 ops 18738.961 ops/s 118/18621 rd/wr 146.4mb/s 0.8ms/op

--- v5.1-rc1 / xfs + XFS_IOLOCK_SHARED removed from xfs_file_buffered_aio_read ---
rand-write1  1025826ops  17091ops/s 133.5mb/s   0.5ms/op [0.01ms - 909.20ms]
rand-read1    115162ops   1919ops/s  15.0mb/s   4.2ms/op [0.00ms - 157.46ms]
61.084: IO Summary: 1140988 ops 19009.369 ops/s 1919/17091 rd/wr 148.5mb/s 0.8ms/op

--- v5.1-rc1 / ext4 ---
rand-write1   867926ops  14459ops/s 113.0mb/s   0.6ms/op [0.01ms - 886.89ms]
rand-read1    121893ops   2031ops/s  15.9mb/s   3.9ms/op [0.00ms - 149.24ms]
61.102: IO Summary: 989819 ops 16489.132 ops/s 2031/14459 rd/wr 128.8mb/s 1.0ms/op
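
For clarity, the "remove XFS_IOLOCK_SHARED" variant above is just the
obvious local hack, sketched here against the v5.1-rc1 shape of
xfs_file_buffered_aio_read() in fs/xfs/xfs_file.c. It is a diagnostic,
not a proposal, since it gives up the read-vs-write atomicity the
iolock provides:

	STATIC ssize_t
	xfs_file_buffered_aio_read(
		struct kiocb		*iocb,
		struct iov_iter		*to)
	{
		struct xfs_inode	*ip = XFS_I(file_inode(iocb->ki_filp));

		trace_xfs_file_buffered_read(ip, iov_iter_count(to), iocb->ki_pos);

		/*
		 * Experiment only: no XFS_IOLOCK_SHARED around the page
		 * cache read, which is roughly what ext4 does today.
		 */
		return generic_file_read_iter(iocb, to);
	}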

So the rw_semaphore fix is not in the ballpark; it's not even looking in
the right direction...

Any other ideas to try?

Thanks,
Amir.


