Re: [RFC 0/5] fix races in CDC-WDM

Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> · Wed, 16 Sep 2020 20:14:47 +0900

On 2020/09/16 19:18, Oliver Neukum wrote:
> Am Dienstag, den 15.09.2020, 19:30 +0900 schrieb Tetsuo Handa:
>> On 2020/09/15 18:14, Oliver Neukum wrote
> 
>>> Is there something we can do in flush()?
>>
>> I consider that wdm_flush() is a wrong place to return an error. It is possible that
> 
> I am afraid that is a basic problem we need to resolve. As I understand
>  it, flush() as a method exists precisely to report errors. Otherwise
> you could implement it in release(). But this is not called for every
> close().

I think fsync() or ioctl() is a better method for reporting errors.

But anyway I consider that any error from N'th write() request should be
returned by N'th write() request.

> 
> Hence a driver is supposed to start IO upon write() and report the
> result to the next call, which can be either write() or close(), the
> latter corresponding to flush().

Whether N'th write() request succeeded remains unknown until (N+1)'th
write() request or close() request is issued? That sounds a strange design.

If there is nothing more to write(), how userspace process knows whether
N'th write() request succeeded? Wait for writability using poll() ?

How do we guarantee that N'th write() request already set desc->werr before
(N+1)'th next write() request is issued? If (N+1)'th write() request reached
memdup_user() before desc->werr is set by callback of N'th write() request,
(N+1)'th write() request will fail to report the error from N'th write() request.
Yes, that error would be reported by (N+2)'th write() request, but the userspace
process might have already discarded data needed for taking some actions (e.g.
print error messages, retry the write() request with same argument).

> 
>> a userspace process reaches wdm_flush() due to being killed by SIGKILL (rather than
>> via calling close() syscall). Then, that userspace process will never receive the error
> 
> If you are killed by a signal you are in a race condition
> anyway. It cannot be handled.

What is the purpose of sending the error to the userspace process via write() or close()?
Isn't the purpose to allow userspace process to do something (e.g. print error messages,
retry the write() request with same argument) ? If close() returned an error, it might be
too late to retry the write() request with same argument.

> 
>> fetched from wdm_flush(). Also, if that userspace process is killed by the OOM killer,
>> being able to terminate and release resources as soon as possible is more preferable
>> than try to wait for response.
> 
> Right, so should we just proceed in case of a dieing task? How do we
> do that?

We need to avoid waiting at wdm_flush() if killed.

And I think that wdm_flush() is a strange interface for reporting the error.