On 8/9/21 1:16 PM, Al Viro wrote:
On Mon, Aug 09, 2021 at 08:04:40PM +0000, Al Viro wrote:
On Mon, Aug 09, 2021 at 12:40:03PM -0700, Shoaib Rao wrote:
Page faults occur all the time; the page may not even be in the cache, or the
mapping may not be there yet (mmap), so I would not consider this a bug. If it
were, the code should also complain about all the other calls, since they copy
to user pages too. I must be missing some semantics that cause the check to
trigger, but I cannot figure out what. What is the recommended interface for
copying data from the kernel to user space?
What are you talking about? Yes, page faults happen. No, they
must not be triggered in contexts where you cannot afford to go to sleep.
In particular, you can't do that while holding a spinlock.
There are things that can't be done under a spinlock. If your
commit is attempting that, it's simply broken.
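The rule above can be illustrated with a small userspace model. In the kernel, spin_lock() disables preemption, and with CONFIG_DEBUG_ATOMIC_SLEEP enabled, functions that may sleep (user-copy helpers included, roughly via might_fault()) print "BUG: sleeping function called from invalid context" when they are reached in atomic context. The sketch is a hypothetical single-threaded model, not kernel code: atomic_depth, struct model_spinlock, model_spin_lock(), model_might_sleep() and model_copy_to_user() are invented stand-ins for preempt_count, spin_lock(), might_sleep() and copy_to_user(). Unlike the real check, which only warns, the model also refuses the copy so the outcome is easy to observe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's preemption counter. */
static int atomic_depth;

/* Hypothetical single-threaded spinlock model: holding it means atomic context. */
struct model_spinlock {
    bool locked;
};

static void model_spin_lock(struct model_spinlock *l)
{
    assert(!l->locked);          /* the model never contends, so no spinning */
    l->locked = true;
    atomic_depth++;              /* like preempt_disable() inside spin_lock() */
}

static void model_spin_unlock(struct model_spinlock *l)
{
    assert(l->locked);
    l->locked = false;
    atomic_depth--;              /* like preempt_enable() inside spin_unlock() */
}

/* Analogue of might_sleep(): complain (and, in this model, refuse) in atomic context. */
static bool model_might_sleep(const char *who)
{
    if (atomic_depth > 0) {
        fprintf(stderr, "BUG: sleeping function %s called from invalid context\n", who);
        return false;
    }
    return true;
}

/* Analogue of copy_to_user(): it may fault, hence may sleep, so it checks first. */
static bool model_copy_to_user(char *dst, const char *src, size_t n)
{
    if (!model_might_sleep("model_copy_to_user"))
        return false;
    memcpy(dst, src, n);
    return true;
}
```

The model also speaks to the question of why other user copies do not trigger the complaint: the check fires only when a copy is reached with a spinlock held, not on every call. Calling model_copy_to_user() between model_spin_lock() and model_spin_unlock() prints the BUG line and fails; the same call after unlocking succeeds.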
... in particular, this
+#if IS_ENABLED(CONFIG_AF_UNIX_OOB)
+ mutex_lock(&u->iolock);
+ unix_state_lock(sk);
+
+ err = unix_stream_recv_urg(state);
+
+ unix_state_unlock(sk);
+ mutex_unlock(&u->iolock);
+#endif
is 100% broken, since you *are* attempting to copy data to userland between
spin_lock(&unix_sk(s)->lock) and spin_unlock(&unix_sk(s)->lock).
You can't do blocking operations under a spinlock. And copyout is inherently
a blocking operation - it can require any kind of IO to complete. If you
have the destination (very much valid - no bad addresses there) in the middle
of a page mmapped from a file and currently not paged in, you *must* read
the current contents of the page, at least into the parts of the page that
are not going to be overwritten by your copyout. No way around that. And
that can involve any kind of delays and any amount of disk/network/whatnot
traffic.
You fundamentally cannot do that kind of thing without giving the CPU up.
And under a spinlock you are not allowed to do that.
In the current form that commit is obviously broken.
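The point that a perfectly valid destination can still fault is easy to observe from userspace on Linux. This sketch maps a fresh page of a temporary file with MAP_SHARED and uses getrusage() to count the page faults taken by a single store into it. Assumptions: Linux, a writable /tmp, and no prefaulting of the mapping (no MAP_POPULATE). The helper name faults_for_first_store() is hypothetical. Whether the fault needs real I/O depends on whether the page is already in the page cache, so the sketch shows only that the store traps into the kernel, which is the kind of event copy_to_user() may trigger and why it must be allowed to sleep.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical helper: page faults taken by one store into a freshly
 * mapped file page, or -1 on setup failure. */
static long faults_for_first_store(void)
{
    char path[] = "/tmp/fault-demoXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                         /* file lives on via fd and mapping */

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || ftruncate(fd, page) != 0) {
        close(fd);
        return -1;
    }
    char *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                            /* the mapping keeps its own reference */
    if (p == MAP_FAILED)
        return -1;

    struct rusage before, after;
    getrusage(RUSAGE_SELF, &before);
    ((volatile char *)p)[page / 2] = 'x'; /* valid address, page not yet mapped in */
    getrusage(RUSAGE_SELF, &after);

    munmap(p, (size_t)page);
    return (after.ru_minflt - before.ru_minflt) +
           (after.ru_majflt - before.ru_majflt);
}
```

On a typical Linux system the result is at least one fault, even though the address was valid the whole time.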
I am quite aware of spinlocks, mutexes, and all the other kernel
structures. As I said, the fact that Linux uses lock* names for both
spinlocks and mutexes is confusing unless you look at the details of the
lock. I will fix the issue; it is a simple fix: copy the byte to a
kernel variable, release the lock, then copy the byte to userland.
Shoaib
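The proposed fix (snapshot the byte into a kernel variable while holding the spinlock, drop the spinlock, then copy to userland) can be sketched as a userspace model. This is not the actual af_unix patch: struct model_sock, model_recv_urg() and model_copy_to_user() are hypothetical, a pthread spinlock stands in for unix_state_lock(), a pthread mutex for u->iolock, and memcpy() for copy_to_user(). Returning -EINVAL when no byte is pending is likewise an assumption of the model. The mutex may stay held across the copy, because a mutex, unlike a spinlock, allows its holder to sleep.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, heavily simplified model of a socket holding one OOB byte. */
struct model_sock {
    pthread_mutex_t iolock;      /* sleeping lock: serializes readers */
    pthread_spinlock_t state;    /* non-sleeping lock: protects the fields below */
    bool has_oob;
    char oob_byte;
};

/* Stand-in for copy_to_user(); the real one may fault and sleep,
 * so callers must not hold the spinlock here. */
static int model_copy_to_user(char *dst, const char *src, size_t n)
{
    memcpy(dst, src, n);
    return 0;
}

/* Snapshot the OOB byte under the spinlock, copy it out after unlocking. */
static int model_recv_urg(struct model_sock *s, char *user_buf)
{
    char kbuf = 0;               /* kernel-side copy of the OOB byte */
    bool found;
    int err;

    pthread_mutex_lock(&s->iolock);      /* fine to hold across the copy */

    pthread_spin_lock(&s->state);        /* atomic section: no user copy here */
    found = s->has_oob;
    if (found) {
        kbuf = s->oob_byte;
        s->has_oob = false;              /* consume while still protected */
    }
    pthread_spin_unlock(&s->state);

    /* Spinlock released: now it is safe to do something that may sleep. */
    err = found ? model_copy_to_user(user_buf, &kbuf, 1) : -EINVAL;

    pthread_mutex_unlock(&s->iolock);
    return err;
}
```

One design choice worth noting: the model consumes the byte before the copy, so a failing copy would lose it. A real fix has to decide whether to restore the byte on failure; the thread does not specify that.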