Re: [PATCH bpf-next 0/3] Introduce pinnable bpf_link kernel abstraction

Toke Høiland-Jørgensen <toke@xxxxxxxxxx> · Tue, 10 Mar 2020 13:22:34 +0100

Jakub Kicinski <kuba@xxxxxxxxxx> writes:

> On Mon, 09 Mar 2020 12:41:14 +0100 Toke Høiland-Jørgensen wrote:
>> > You said that like the library doesn't arbitrate access and manage
>> > resources.. It does exactly the same work the daemon would do.  
>> 
>> Sure, the logic is in the library, but the state (which programs are
>> loaded) and synchronisation primitives (atomic replace of attached
>> program) are provided by the kernel. 
>
> I see your point of view. The state in the kernel which the library has
> to read out every time is what I was thinking of as deserialization.

Ohh, right. I consider the BTF-embedded data as 'configuration data'
which is different to 'state' in my mind. So hence my confusion about
what you were talking about re: state :)

> The library has to take some lock, and then read the state from the
> kernel, and then construct its internal state based on that. I think
> you have some cleverness there to stuff everything in BTF so far, but
> I'd expect if the library grows that may become cumbersome and
> wasteful (it's pinned memory after all).
>
> Parsing the packet once could be an example of something that could be
> managed by the library to avoid wasted cycles. Then programs would have
> to describe their requirements, and library may need to do rewrites of
> the bytecode.

Hmm, I've been trying to make libxdp fairly minimal in scope. It seems
like you are assuming that we'll end up with lots of additional
functionality? Do you have anything in particular in mind, or are you
talking in general terms here?

> I guess everything can be stuffed into BTF, but I'm not 100% sure
> kernel is supposed to be a database either.

I actually started out with the BTF approach because I wanted something
that could be part of the program bytecode (instead of, say, an external
config file that had to be carried along with the .o file). That it
survives a round-trip into the kernel turned out to be a nice bonus :)

I do agree with you in general terms, though: There's probably a limit
to how much stuff we can stick into this. The obvious better-suited
storage mechanism for more data is a BPF map, isn't it? I'm not sure
there's any point in moving to that before we have actual use cases for
richer state/metadata, though?

> Note that the atomic replace may not sufficient for safe operation, as
> reading the state from the kernel is also not atomic.

Yeah, there's a potential for read-update-write races. However, assuming
that the dispatcher program itself is not modified after initial setup
(i.e., we build a new one every time), I think this can be solved with a
"cmpxchg" operation where userspace includes the fd of the program it
thinks it is replacing, and the kernel refuses the operation if this
doesn't match. Do you disagree that this would be sufficient?

-Toke