Andrey Ignatov <rdna@xxxxxx> writes:

> The main challenges I see for applying this approach in fb case is the
> need to recreate the dispatcher every time a new program has to be
> added.
>
> Imagine there are a few containers and every container wants to
> run an application that attaches XDP program to the "dispatcher" via
> freplace. Every application may have a "priority" reserved for it, but
> recreating the dispatcher may have race condition, for example:

Yeah, I did realise this is potentially racy, but so is any loading of
XDP programs right now (i.e., two applications can both try loading a
single XDP program at the same time, and end up stomping on each
others' feet). So we'll need to solve that in any case.

I've managed to come up with two possible ways to solve this:

1. Locking: Make it possible for a process to temporarily lock the XDP
   program loaded onto an interface, so no other program can modify it
   until the lock is released.

2. A cmpxchg operation: Add a new field to the XDP load netlink message
   containing an fd of the old program that the load call is expecting
   to replace. I.e., instead of attach(ifindex, prog_fd, flags), you
   have attach(ifindex, prog_fd, old_fd, flags). The kernel can then
   check that old_fd matches the program currently loaded before
   replacing anything, and reject the operation otherwise.

With either of these mechanisms it should be possible for userspace to
do the right thing if the kernel state changes underneath it. I'm
leaning towards (2) because I think it is simpler to implement and
doesn't require any new state be kept in the kernel. The drawback is
that it may lead to a lot of retries if many processes are trying to
load their programs at the same time. Some data would be good here: how
often do you expect programs to be loaded/unloaded in your use case?

As for your other suggestion:

> Also I see at least one other way to do it w/o regenerating dispatcher
> every time:
>
> It can be created and attached once with "big enough" number of slots,
> for example with 100, and programs may use their corresponding slot
> to freplace w/o regenerating the dispatcher. Having those big number of
> no-op slots should not be a big deal from what I understand and kernel
> can optimize it.

I thought about having the dispatcher stay around for longer, and just
replacing more function slots as new programs are added/removed. The
reason I didn't go with this is the following:

Modifying the dispatcher while it is loaded means that the
modifications will apply to traffic on the interface immediately. This
is fine for simple add/remove of a single program, but it limits which
operations you can do atomically. E.g., you can't switch the order of
two programs, or add or remove more than one, in a way that is atomic
from the PoV of the traffic on the interface.

Since I expect that we will need to support atomic operations even for
these more complex cases, that means we'll need to support rebuilding
the dispatcher anyway, and solving the race condition problem for that.
And once we've done that, the simple add/remove in the existing
dispatcher becomes just an additional code path that we'll need to
maintain, so why bother? :)

I am also not sure it's as simple as you say for the kernel to optimise
a more complex dispatcher: the current dead code elimination relies on
map data being frozen at verification time, so it's not applicable to
optimising a dispatcher as it is being changed later.
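Just to make it concrete what kind of dispatcher we're talking about,
here is a rough sketch of the shape I have in mind (simplified and
untested; the names are made up, and it glosses over details such as
how a program signals "handled, stop here" vs. "continue to the next
slot"). Each slot is a global, noinline stub that a user program can be
attached to with freplace:

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Each slot is a global, noinline function that a user's XDP program
 * can replace via freplace. The volatile return value and the ctx
 * check are there to keep the compiler from optimising the stub (and
 * its argument) away. An unattached slot just returns XDP_PASS, which
 * the dispatcher below treats as "continue to the next slot".
 */
__attribute__((noinline))
int slot_0(struct xdp_md *ctx)
{
        volatile int ret = XDP_PASS;

        if (!ctx)
                return XDP_ABORTED;
        return ret;
}

__attribute__((noinline))
int slot_1(struct xdp_md *ctx)
{
        volatile int ret = XDP_PASS;

        if (!ctx)
                return XDP_ABORTED;
        return ret;
}

/* ...and so on for however many slots the dispatcher is built with. */

SEC("xdp")
int xdp_dispatcher(struct xdp_md *ctx)
{
        int ret;

        /* Call each slot in turn; stop on the first non-PASS verdict. */
        ret = slot_0(ctx);
        if (ret != XDP_PASS)
                return ret;

        ret = slot_1(ctx);
        if (ret != XDP_PASS)
                return ret;

        return XDP_PASS;
}

char _license[] SEC("license") = "GPL";

With something like this, every unused slot remains a live call after
the dispatcher has been loaded, which is why the "kernel can optimize
it" part is not obvious to me.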
Now, this could probably be fixed, and/or we could try doing clever
tricks with the flow control in the dispatcher program itself. But
again, why bother, if we have to support the dispatcher rebuild mode of
operation anyway?

I may have missed something, of course, so feel free to point out if
you see anything wrong with my reasoning above!

> This is the main thing so far, I'll likely provide more feedback when
> I have some more time to read the code ..

Sounds good! You're already being very helpful, so thank you! :)

-Toke