On 4/26/22 17:02, Jakub Kicinski wrote:
On Tue, 26 Apr 2022 17:29:03 +0300 Sagi Grimberg wrote:
Create the socket in user space, do all the handshakes you need there
and then pass it to the kernel. This is how NBD + TLS works. Scales
better and requires much less kernel code.
But we can't, as the existing mechanisms (at least for NVMe) create the
socket in-kernel.
Having to create the socket in userspace would require a completely new
interface for nvme and will not be backwards compatible.
And we will still need the upcall anyway when we reconnect
(re-establish the socket).
That totally flew over my head, I have zero familiarity with in-kernel
storage network users :S
Count yourself lucky.
In all honesty the tls code in the kernel is a bit of a dumping ground.
People come, dump a bunch of code and disappear. Nobody seems to care
that the result is still (years in) not ready for production use :/
Until a month ago it'd break connections even under moderate memory
pressure. This set does not even have selftests.
Well, I was surprised that it worked, too.
And even more so that Boris Pismenny @ Nvidia is actively working on it.
(Thanks, Sagi!)
Plus there are more protocols being actively worked on (QUIC, PSP etc.)
Having per ULP special sauce to invoke a user space helper is not the
paradigm we chose, and the time is as inopportune as ever to change that.
Which is precisely what we hope to discuss at LSF.
(Yes, I know, probably not the best venue to discuss network stuff ...)
Each approach has its drawbacks:
- Establishing sockets from userspace will cause issues during
reconnection, as then someone (aka the kernel) will have to inform
userspace that a new connection will need to be established.
(And that has to happen while the root filesystem is potentially
inaccessible, so you can't just call arbitrary commands here)
(In particular, call_usermodehelper() is out of the game)
- Having ULP helpers (as with this design) mitigates that problem
somewhat in the sense that you can mlock() that daemon and have it
poll on an intermediate socket; that solves the notification problem.
But you have to have ULP special sauce here to make it work.
- Moving everything in kernel is ... possible. But then you have yet
another security-relevant piece of code in the kernel which needs to be
audited, CVEd etc. Not to mention the usual policy discussion whether it
really belongs in the kernel.
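
To make the second option a bit more concrete, here is a rough sketch of
what such a helper daemon might look like: it tries to pin its memory so it
keeps working while the root filesystem is unreachable, then polls an
intermediate socket for requests. Everything specific here is an assumption
for illustration only: the function names, the "done <request>" reply
format, and the use of a socketpair to stand in for whatever channel the
kernel side would actually use. The MCL_* values are the Linux ones from
<sys/mman.h>. The handshake itself is not shown.

```python
import ctypes
import ctypes.util
import select
import socket

# Linux values from <sys/mman.h>; other platforms may differ.
MCL_CURRENT = 1
MCL_FUTURE = 2


def lock_memory() -> bool:
    """Try to pin all current and future pages of this process.

    Returns False if mlockall() fails, e.g. for lack of CAP_IPC_LOCK
    or because RLIMIT_MEMLOCK is too low.
    """
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    return libc.mlockall(MCL_CURRENT | MCL_FUTURE) == 0


def wait_for_request(sock: socket.socket, timeout_ms: int = 1000):
    """Poll the intermediate socket; return the request bytes or None."""
    poller = select.poll()
    poller.register(sock, select.POLLIN)
    if not poller.poll(timeout_ms):
        return None  # timed out, nothing to do
    return sock.recv(4096)


def serve_once(sock: socket.socket, timeout_ms: int = 1000) -> bool:
    """Handle at most one request; return True if one was handled."""
    request = wait_for_request(sock, timeout_ms)
    if request is None:
        return False
    # Hypothetical: a real helper would run the TLS handshake here.
    # This sketch only acknowledges the request.
    sock.sendall(b"done " + request)
    return True


if __name__ == "__main__":
    if not lock_memory():
        print("warning: could not lock memory, continuing unpinned")
    # Stand-in for the kernel <-> daemon channel (assumption).
    kernel_side, daemon_side = socket.socketpair()
    kernel_side.sendall(b"reconnect 1")
    serve_once(daemon_side)
    print(kernel_side.recv(4096))
```

The point of the sketch is only that the daemon is already running and
resident when the reconnect happens, so nothing has to be loaded from disk
at that moment; the ULP-specific part is whatever sits on the other end of
that socket.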
So I don't really see any obvious way to go; best we can do is to pick
the least ugly :-(
Cheers,
Hannes