On Tue, Apr 12, 2022 at 6:26 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
>
> On 4/12/22 6:40 PM, Eric Dumazet wrote:
> >
> > On 4/12/22 13:26, Jens Axboe wrote:
> >> Hi,
> >>
> >> If we accept a connection directly, eg without installing a file
> >> descriptor for it, or if we use IORING_OP_SOCKET in direct mode, then
> >> we have a socket for recv/send that we can fully serialize access to.
> >>
> >> With that in mind, we can feasibly skip locking on the socket for TCP
> >> in that case. Some of the testing I've done has shown as much as 15%
> >> of overhead in the lock_sock/release_sock part, with this change then
> >> we see none.
> >>
> >> Comments welcome!
> >>
> > How are BH handlers (including TCP timers) and io_uring going to run
> > safely? Even if a TCP socket had one user (a private fd opened by a
> > non-multi-threaded program), we would still need to use the spinlock.
>
> But we don't even hold the spinlock over lock_sock() and release_sock(),
> just the mutex. And we do check for running eg the backlog on release,
> which I believe is done safely and similarly in other places too.

So let's say the TCP stack receives a packet in a BH handler... it
proceeds using many tcp sock fields.

Then io_uring wants to read/write stuff from another cpu, while the BH
handler(s) is (are) not done yet, and will happily read/change many of
the same fields.

Writing a 1 and a 0 in a bit field to ensure mutual exclusion is not
going to work, even with the smp_rmb() and smp_wmb() you added (and it
adds more cost for non-io_uring users, which already pay a high lock
tax). A sketch of the race is below.

If we want to optimize lock_sock()/release_sock() for the common case
(a single user thread per TCP socket), then maybe we can play some kind
of cmpxchg() games (also sketched below), but that would be a generic
change.
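
To make the race concrete: a plain flag plus barriers cannot give
mutual exclusion, because two CPUs can both observe the flag as 0
before either store becomes visible, so both think they own the
socket. This is a minimal, hypothetical sketch (the sk_fastpath_busy
flag and try_enter_fastpath() helper are invented for the example and
are not from the patch set; it assumes the usual kernel
READ_ONCE/WRITE_ONCE/smp_wmb primitives):

	/* Invented per-socket flag, illustration only. */
	static int sk_fastpath_busy;

	static bool try_enter_fastpath(void)
	{
		if (READ_ONCE(sk_fastpath_busy))	/* CPU A and CPU B can both read 0 here... */
			return false;
		WRITE_ONCE(sk_fastpath_busy, 1);	/* ...and then both store 1. */
		smp_wmb();	/* orders later stores, but cannot make the read+write atomic */
		return true;	/* both callers now believe they have exclusive access */
	}

The barriers only order accesses around the flag; they do not turn the
separate read and write into one atomic step, which is what mutual
exclusion needs.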
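
And a very rough sketch of what such "cmpxchg() games" could look like,
purely to illustrate the idea. The sk_fast_owner field and the helper
names are invented for the example; the real sock lock layout is
different, and this is not the existing lock_sock_fast() API:

	/*
	 * Illustration only: single-owner fast path.  A 0 -> 1 transition
	 * on the invented sk_fast_owner field can succeed on exactly one
	 * CPU; everyone else falls back to the normal lock_sock() path.
	 */
	static inline bool lock_sock_fast_single(struct sock *sk)
	{
		if (cmpxchg(&sk->sk_fast_owner, 0, 1) == 0)
			return true;	/* uncontended: caller owns the socket */

		lock_sock(sk);		/* contended: take the regular socket lock */
		return false;
	}

	static inline void unlock_sock_fast_single(struct sock *sk, bool fast)
	{
		if (fast)
			smp_store_release(&sk->sk_fast_owner, 0);	/* drop ownership */
		else
			release_sock(sk);	/* process backlog, wake waiters, etc. */
	}

A real version would still have to deal with packets arriving in BH
while the fast path owns the socket (the backlog processing that
release_sock() does today), which is exactly the hard part, and it
would have to be a generic change rather than an io_uring special case.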