On 2024/2/21 23:02, Antipov, Dmitriy wrote:
On Wed, 2024-02-21 at 21:09 +0800, Wen Gu wrote:
1. on = 1; ioctl(sock, FIOASYNC, &on), a fasync entry is added to
smc->sk.sk_socket->wq.fasync_list;
2. Then fallback happend, and swapped the socket:
smc->clcsock->file = smc->sk.sk_socket->file;
smc->clcsock->file->private_data = smc->clcsock;
smc->clcsock->wq.fasync_list = smc->sk.sk_socket->wq.fasync_list;
smc->sk.sk_socket->wq.fasync_list = NULL;
3. on = 0; ioctl(sock, FIOASYNC, &on), the fasync entry is removed
from smc->clcsock->wq.fasync_list,
(Is there a race between 2 and 3 ?)
1) IIUC yes. The following sequence from smc_switch_to_fallback():
smc->clcsock->file = smc->sk.sk_socket->file;
smc->clcsock->file->private_data = smc->clcsock;
is executed with locked smc->sk.sk_socket but unlocked smc->clcsock.
This way,
struct socket *sock = filp->private_data;
in sock_fasync() introduces an undefined behavior (because an
actual value of 'private_data' is unpredictable). So there are
two possible scenarios. When
on = 1; ioctl(sock, FIOASYNC, &on);
on = 0; ioctl(sock, FIOASYNC, &on);
actually modifies fasync list of the same socket, this works as
expected. If kernel sockets behind 'sock' gets swapped between
calls to ioctl(), fasync list of the first socket has an entry
which can't be removed by the second ioctl().
Thank you for the explanation, Dmitriy.
So the race could be:
sock_fasync | smc_switch_to_fallback
------------------------------------------------------------------
/* smc->sk.sk_socket->wq.fasync_list|
is empty at the beginning */ |
|
/* attempt to add fasync_struct */ |
sock = filp->private_data; |
(smc->sk.sk_socket) | /* fallback happens */
| lock_sock(sk);
| file->private_data = smc->clcsock
| smc->clcsock->wq.fasync_list = smc->sk.sk_socket->wq.fasync_list
| smc->sk.sk_socket->wq.fasync_list = NULL
| release_sock(sk);
lock_sock(sk); |
fasync_helper(on=1) |
(successed to add the entry in |
smc->sk.sk_socket->wq.fasync_list) |
release_sock(sk); |
|
/* the fasync_struct entry can't be |
removed later, since |
file->private_data has been changed |
to clcsock */ |
Now clcsock->wq.fasync_list is empty and the fasync_struct entry is
always left in smc->sk.sk_socket->wq.fasync_list.
If we only remove fasync_struct entries in smc->sk.sk_socket->wq.fasync_list
during smc_release, it indeed solves the leak, but it fails to address
the problem of fasync_struct entries being incorrectly inserted into the
old socket (they should have been placed in smc->clcsock->wq.fasync_list
if fallback happens).
One solution to this issue I can think of is to check whether
filp->private_data has been changed when the sock_fasync holds the sock lock,
but it inevitably changes the general code..
diff --git a/net/socket.c b/net/socket.c
index ed3df2f749bf..a28435195854 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1443,6 +1443,11 @@ static int sock_fasync(int fd, struct file *filp, int on)
return -EINVAL;
lock_sock(sk);
+ /* filp->private_data has changed */
+ if (on && unlikely(sock != filp->private_data)) {
+ release_sock(sk);
+ return -EAGAIN;
+ }
fasync_helper(fd, filp, on, &wq->fasync_list);
if (!wq->fasync_list)
Let's see if anyone else has a better idea.
Best regards,
Wen Gu
4. Then close the file, __fput() calls file->f_op->fasync(-1, file, 0),
then sock_fasync() calls fasync_helper(fd, filp, on, &wq->fasync_list)
and fasync_remove_entry() removes entries in smc->clcsock->wq.fasync_list.
Now smc->clcsock->wq.fasync_list is empty.
2) No. In the second (bad) scenario from the above, an attempt to remove
fasync entry from smc->clcsock->wq.fasync_list always fails because
fasync entry was actually linked to smc->sk.sk_socket->wq.fasync_list.
Note sock_fasync() doesn't check the value returned from fasync_helper().
How dumb.
5. __fput() calls file->f_op->release(inode, file), then sock_close calls
__sock_release, then ops->release calls smc_release(), and __smc_release()
calls smc_restore_fallback_changes() to restore socket:
if (smc->clcsock->file) { /* non-accepted sockets have no file yet */
smc->clcsock->file->private_data = smc->sk.sk_socket;
smc->clcsock->file = NULL;
smc_fback_restore_callbacks(smc);
}
3) Yes. And it's too late because __fput() calls file->f_op->fasync(-1, ...,
0) _before_ file->f_op->release(). So even if you have sockets swapped back,
no one will take care about freeing fasync list.
6. Then back to __sock_release, check if sock->wq.fasync_list (that is
smc->sk.sk_socket->wq.fasync_list) is empty and it is empty.
4) No. It's not empty because an attempt to remove fasync entry has failed
at [2] just because it was made against the wrong socket.
For your convenience, there is a human-readable reproducer loosely modeled
around the one generated by syzkaller. You can try it on any system running
recently enough kernel with CONFIG_SMC enabled (root is not required), and
receiving a few (or may be a lot of) "__sock_release: fasync list not empty"
messages clearly indicates an issue. NOTE: this shouldn't crash the system
and/or make it unusable, but remember that each message comes with a small
kernel memory leak.
Dmitry
#include <signal.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
int sock;
void *
loop0 (void *arg)
{
struct msghdr msg = { 0 };
while (1)
{
sock = socket (AF_SMC, SOCK_STREAM, 0);
sendmsg (sock, &msg, MSG_FASTOPEN);
close (sock);
}
return NULL;
}
void *
loop1 (void *arg)
{
int on;
while (1)
{
on = 1;
ioctl (sock, FIOASYNC, &on);
on = 0;
ioctl (sock, FIOASYNC, &on);
}
return NULL;
}
int
main (int argc, char *argv[])
{
pthread_t a, b;
struct sigaction sa = { 0 };
sa.sa_handler = SIG_IGN;
sigaction (SIGIO, &sa, NULL);
pthread_create (&a, NULL, loop0, NULL);
pthread_create (&b, NULL, loop1, NULL);
pause ();
pthread_join (a, NULL);
pthread_join (b, NULL);
return 0;
}