Re: [git pull] iov_iter fixes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 9/10/21 7:57 AM, Jens Axboe wrote:
> On 9/9/21 9:36 PM, Al Viro wrote:
>> On Thu, Sep 09, 2021 at 09:30:03PM -0600, Jens Axboe wrote:
>>
>>>> Again, we should never, ever modify the iovec (or bvec, etc.) array in
>>>> ->read_iter()/->write_iter()/->sendmsg()/etc. instances.  If you see
>>>> such behaviour anywhere, report it immediately.  Any such is a blatant
>>>> bug.
>>>
>>> Yes that was wrong, the iovec is obviously const. But that really
>>> doesn't change the original point, which was that copying the iov_iter
>>> itself unconditionally would be miserable.
>>
>> Might very well be true, but... won't your patch hit the reimport on
>> every short read?  And the cost of uaccess in there is *much* higher
>> than copying of 48 bytes into local variable...
>>
>> Or am I misreading your patch?  Note that short reads on reaching
>> EOF are obviously normal - it's not a rare case at all.
> 
> It was just a quick hack, might very well be too eager to go through
> those motions. But pondering this instead of sleeping, we don't need to
> copy all of iov_iter in order to restore the state, and we can use the
> same advance after restoring. So something like this may be more
> palatable. Caveat - again untested, and I haven't tested the performance
> impact of this at all.

Passes basic testing for me. I added a sysctl switch for this so I can
compare performance, running my usual peak-perf-single-core benchmark.
That one does ~3.5M IOPS, using polled IO. There's always a slight
variability between boots and builds, hence the sysctl so I could
toggle this behavior on the fly.

Did a few runs, and the differences are very stable. With this enabled,
we spend about 0.15% more time in io_read(). That's only worth about
5K IOPS at 3.5M, not enough to notice as the variation for the 1 second
reporting window usually swings more than that:

Old behavior:
IOPS=3536512, IOS/call=32/31, inflight=(75)
IOPS=3541888, IOS/call=32/32, inflight=(64)
IOPS=3529056, IOS/call=32/31, inflight=(119)
IOPS=3521184, IOS/call=32/32, inflight=(96)
IOPS=3527456, IOS/call=32/31, inflight=(128)
IOPS=3525504, IOS/call=32/32, inflight=(128)
IOPS=3524288, IOS/call=32/32, inflight=(128)
IOPS=3536192, IOS/call=32/32, inflight=(96)
IOPS=3535840, IOS/call=32/32, inflight=(96)
IOPS=3533728, IOS/call=32/31, inflight=(128)
IOPS=3528384, IOS/call=32/32, inflight=(128)
IOPS=3518400, IOS/call=32/32, inflight=(64)

Turning it on:
IOPS=3533824, IOS/call=32/31, inflight=(64)
IOPS=3541408, IOS/call=32/32, inflight=(32)
IOPS=3533024, IOS/call=32/31, inflight=(64)
IOPS=3528672, IOS/call=32/32, inflight=(35)
IOPS=3522272, IOS/call=32/31, inflight=(107)
IOPS=3517632, IOS/call=32/32, inflight=(57)
IOPS=3516000, IOS/call=32/31, inflight=(96)
IOPS=3513568, IOS/call=32/32, inflight=(34)
IOPS=3525600, IOS/call=32/31, inflight=(96)
IOPS=3527136, IOS/call=32/31, inflight=(101)

I think that's tolerable, it was never going to be absolutely free.

What do you think of this approach? Parts of iov_iter are going to
remain constant, like iter_type and data_source. io_uring already copies
iter->count, so that just leaves restoring iov (and unionized friends),
nr_segs union, and iov_offset;

We could pretty this up and have the state part be explicit in iov_iter,
and have the store/restore parts end up in uio.h. That'd tie them closer
together, though I don't expect iov_iter changes to be an issue. It
would make it more maintainable, though. I'll try and hack up this
generic solution, see if that looks any better.

-- 
Jens Axboe




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux