On 2/29/20 2:27 AM, Thomas Gleixner wrote:
"Pierre-Loup A. Griffais" <pgriffais@xxxxxxxxxxxxxxxxx> writes:
On 2/28/20 1:25 PM, Thomas Gleixner wrote:
Peter Zijlstra <peterz@xxxxxxxxxxxxx> writes:
Thomas mentioned something like that, the problem is, of course, that we
then want to fix a whole bunch of historical ills, and the problem
becomes much bigger.
We keep piling features on top of an interface and mechanism which is
fragile as hell and horrible to maintain. Adding vectoring, multi size
and whatever is not making it any better.
There is also the long standing issue with NUMA, which we can't address
with the current pile at all.
So I'm really advocating that all involved parties sit down ASAP and
hash out a new and less convoluted mechanism where all the magic new
features can be addressed in a sane way so that the 'F' in Futex really
only means Fast and not some other word starting with 'F'.
Are you specifically talking about the interface, or the mechanism
itself? Would you be OK with a new syscall that calls into the same code
as this patch? It does seem like that's what we want, so if we rewrote a
mechanism I'm not convinced it would come out any different. But, the
interface itself seems fair-game to rewrite, as the current futex
syscall is turning into an ioctl of sorts.
No, you are misreading what I said. How does a new syscall make any
difference? It still adds new crap to a maze which is already in a state
of dubious maintainability.
I was just going by the context added by Peter, which seemed to imply
your concerns were mostly around the interface, because I couldn't
understand a clear course of action to follow just from your email. And
frankly, still can't, but hopefully you can help us get there.
This solves a real problem with a real usecase; so I'd like to stay
practical and not go into deeper issues like solving NUMA support for
all of futex in the interest of users waiting at the other end. Can you
point us to your preferred approach just for the scope of what we're
trying to accomplish?
If we go by the argument that something solves a real use case and take
this as justification to proliferate existing crap, then we never get to
the point where things get redesigned from ground up. Quite the
contrary, we are going to duct tape it to death.
It does not matter at all whether the syscall is multiplexing or split
up into 5 different ones. That's a pure cosmetic exercise.
While all the currently proposed extensions (multiple wait, variable
size) make sense conceptually, I'm really uncomfortable just cramming
them into the existing code. They create an ABI which we have to
maintain forever.
From experience I just know that every time we extended the futex
interface we opened another can of worms which haunted us for years if
not for more than a decade. Guess who has to deal with that. Surely not
the people who drive by and solve their real world usecases. Just go and
read the changelog history of futexes very carefully and you might
understand what kind of complex beasts they are.
At some point we simply have to say stop, sit down and figure out which
kind of functionality we really need in order to solve real world user
space problems and which of the gazillion futex (mis)features are just
there as historical ballast and do not have to be supported in a new
implementation. REQUEUE is just the most obvious example.
I completely understand that you want to stay practical and just want to
solve your particular itch, but please understand that the people who
have to deal with the fallout and have dealt with it for 15+ years have
very practical reasons to say no.
Note that it would have been nice to get that high-level feedback on the
first version; instead we just received back specific feedback on the
implementation itself, and questions about usecase/motivation that we
tried to address, but that didn't elicit any follow-ups.
Please bear with me for a second in case you thought you were obviously
very clear about the path forward, but are you saying that:
1. Our usecase is valid, but we're not correct about futex being the
right fit for it, and we should design and implement a new primitive to
handle it?
2. Our usecase is valid, and our research showing that futex is the
optimal fit for it might be correct, but futex has to be
significantly refactored before accepting this new feature (or any new
feature)?
If it was 1., I think our new solution would either end up looking more
or less exactly like futex, just with some of the more exotic
functionality removed (although even that is arguable, since I wouldn't
be surprised if we ended up using e.g. requeue for some of the more
complex migration scenarios). In which case I assume someone else would
ask the question on why we're doing this new thing instead of adding to
futex. OR, if intentionally made not futex-like, would end up not being
optimal, which would make it not the right solution and a non-starter to
begin with. There's a reason we moved away from eventfd, even ignoring
the fd exhaustion problem that some problematic apps fall victim to.
If it's 2., then we'd be hard-pressed to proceed forward without your
guidance.
Conceptually it seems like multiple wait is an important missing feature
in futex compared to core threading primitives of other platforms. It
isn't the first time that the lack of it has come up for us and other
game developers. Due to futex being so central and important, I
completely understand it is tricky to get right and might be hard to
maintain if not done correctly. It seems worthwhile to undertake, at
least from our limited perspective. We'd be glad to help upstream get
there, if possible.
Thanks,
- Pierre-Loup
Thanks,
tglx