Re: [PATCH AUTOSEL 5.0 59/66] fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock

Kirill Smelkov <kirr@xxxxxxxxxx> · Fri, 26 Apr 2019 07:45:33 +0000

On Thu, Apr 25, 2019 at 10:04:34AM +0000, David Laight wrote:
> From: Kirill Smelkov
> > Sent: 24 April 2019 19:30
> > 
> > On Wed, Apr 24, 2019 at 10:26:55AM -0700, Linus Torvalds wrote:
> > > On Wed, Apr 24, 2019 at 10:19 AM Sasha Levin <sashal@xxxxxxxxxx> wrote:
> > > >
> > > > Hm, I might be confusing something here but I see a bunch of patches
> > > > that convert existing callers mentioned in this patch to use
> > > > stream_open() which was introduced here.
> > >
> > > The only use of stream_open() upstream right now is the xenbus
> > > conversion, and that isn't actually a bugfix, because xenbus used to
> > > manually do that
> > >
> > >         filp->f_mode &= ~FMODE_ATOMIC_POS; /* cdev-style semantics */
> > >
> > > that stream_open() does.
> > >
> > > So no, there isn't "a bunch of patches" anywhere.
> > >
> > > There are *future* cleanups for 5.2 that will happen, and that might
> > > have hit linux-next. And there is at least one FUSE patch (again -
> > > pending, not upstream) that may get marked for stable.
> > >
> > > But I see nothing right now that makes it stable material yet.
> > 
> > Linus, thanks for explaining. Sasha, Greg, there is a FUSE patch that
> > should be stable material that will need this stream_open() thing. That
> > patch has just entered fuse.git#for-next today:
> > 
> > 	https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git/commit/?id=bbd84f33652f
> > 
> > and will hopefully enter 5.2 when merge window opens. I agree we should
> > not blindly backport bulk stream_open conversions as performed by
> > stream_open.cocci, at least unless there is a bug report indicating that
> > it is actually required for a particular driver. On the other hand both
> > Xen and FUSE deadlocks were hit for real which justifies stable
> > propagation for their fixes.
> > 
> > You can read about the deadlock regression and the plan to fix it in
> > original "fs: stream_open - opener for stream-like files so that read
> > and write can run simultaneously without deadlock" patch (the 59/66
> > patch that was send in this thread), or here:
> > 
> > 	https://git.kernel.org/linus/10dce8af3422
> > 
> > 
> > Hope it clarifies things a bit,
> 
> I can also imagine drivers that expect accesses to be done using
> pread() and pwrite() - maybe only if the fd is shared.
> Provided accesses get the correct offset they can be concurrent.
> In fact they only need to update the offset in the file structure
> when they complete - they may do this already.
> 
> I know (I think) uclibc implementing pread() as lseek() + read()
> caused me grief - but that might just have been the extra system
> call overhead rather than any problems with the offset.

I'm not sure I understand your comment completely, but we convert to
stream_open only drivers that actually do _not_ use position at all, and
that were already using nonseekable_open, thus pread and pwrite were
already returning -ESPIPE for them (nonseekable_open clears
FMODE_{PREAD,PWRITE} and ksys_{pread,pwrite}64 check for that flag). We
also convert only drivers that use no_llseek for .llseek, so lseek
on those files is/was always returning -ESPIPE as well.

If a driver uses position in its read and write and has support for
pread/pwrite (FMODE_PREAD and FMODE_PWRITE), pread and pwrite are
already working _without_ file->f_pos locking - because those system
calls do not semantically update file->f_pos at all and thus do not take
file->f_pos_lock - i.e. pread/pwrite can be run simultaneously already.

If libc implements pread as lseek+read it will work for a single
user case  (single thread, or fd not shared between processes), but it
will break because of lseek+read non-atomicity if multiple preads are
simultaneously used from several threads. And also for such emulation
for multiple users case there is a chance for pread vs pwrite deadlock,
since those system calls are using read and write and read and write
take file->f_pos_lock.

Kirill