On Thu, May 20, 2021 at 1:22 AM Tianyin Xu <tyxu@xxxxxxxxxxxx> wrote:
>
> On Mon, May 17, 2021 at 12:08 PM Sargun Dhillon <sargun@xxxxxxxxx> wrote:
> >
> > While I agree with you that this is the case right now, there's no reason it
> > has to be the case. There's a variety of mechanisms that can be employed
> > to significantly speed up the performance of the notifier. For example, right
> > now the notifier is behind one large per-filter lock. That could be removed
> > allowing for better concurrency. There are a large number of mechanisms
> > that scale O(n) with the outstanding notifications -- again, something
> > that could be improved.
>
> Thanks for the pointer! But, I don’t think this can fundamentally
> eliminate the performance gap between the notifiers and the ebpf
> filters. IMHO, the additional context switches of user notifiers make
> the difference.
>

I mean, I still think it can be closed. Or at least get better. I've
thought about working on performance improvements, but they're lower on
the list than functionality changes.

> >
> > The other big improvement that could be made is being able to use something
> > like io_uring with the notifier interface, but it would require a
> > fairly significant user API change -- and a move away from ioctl. I'm
> > not sure if people are excited about that idea at the moment.
> >
>
> Apologies that I don’t fully understand your proposal. My
> understanding about io_uring is that it allows you to amortize the
> cost of context switch but not eliminate it, unless you are willing to
> dedicate a core for it. I still believe that, even with io_uring, user
> notifiers are going to be much slower than eBPF filters.

The notifier gets significantly slower as a function of the number of
outstanding notifications. If you have a large number of notifications in
flight, or if you're trying to concurrently handle a large number of
notifications, it gets slower.
This is where something like io_uring is super useful in terms of reducing
wakeups. Also, the original futex2 patches had a mechanism to better handle
scheduling in notifier-like cases [1]. If the seccomp notifier did a similar
thing, we could see better performance.

>
> Btw, our patches are based on your patch set (thank you!). Are you
> using user notifiers (with your improved version?) these days? It will
> be nice to hear your opinions on ebpf filters.
>

I'm so glad that someone is picking up the work on this.

> > > > >
> > > > >> eBPF doesn't really have a privilege model yet. There was a long and
> > > > >> disappointing thread about this awhile back.
> > > > >
> > > > > The idea is that “seccomp-eBPF does not make life easier for an
> > > > > adversary”. Any attack an adversary could potentially utilize
> > > > > seccomp-eBPF, they can do the same with other eBPF features, i.e. it
> > > > > would be an issue with eBPF in general rather than specifically
> > > > > seccomp’s use of eBPF.
> > > > >
> > > > > Here it is referring to the helpers going to the base
> > > > > bpf_base_func_proto if the caller is unprivileged (!bpf_capable ||
> > > > > !perfmon_capable). In this case, if the adversary would utilize eBPF
> > > > > helpers to perform an attack, they could do it via another
> > > > > unprivileged prog type.
> > > > >
> > > > > That said, there are a few additional helpers this patchset is adding:
> > > > > * get_current_uid_gid
> > > > > * get_current_pid_tgid
> > > > > These two provide public information (are namespaces a concern?). I
> > > > > have no idea what kind of exploit it could add unless the adversary
> > > > > somehow side-channels the task_struct? But in that case, how is the
> > > > > reading of task_struct different from how the rest of the kernel is
> > > > > reading task_struct?
> > > >
> > > > Yes, namespaces are a concern.
> > > > This idea got mostly shot down for kdbus (what ever happened to
> > > > that?), and it likely has the same problems for seccomp.
> > > >

So, we actually have a case where we want to inspect an argument -- we want
to look at the FD number that's passed to the sendmsg syscall and see
whether it's an AF_INET socket. If it is, pass the call back to the
notifier; otherwise, allow it to continue through. This is an area where I
can see eBPF being very useful.

> > > > >>
> > > > >> What is this for?
> > > > >
> > > > > Memory reading opens up lots of use cases. For example, logging what
> > > > > files are being opened without imposing too much performance penalty
> > > > > from strace. Or as an accelerator for user notify emulation, where
> > > > > syscalls can be rejected on a fast path if we know the memory contents
> > > > > do not satisfy certain conditions that user notify will check.
> > > > >
> > > >
> > > > This has all kinds of race conditions.
> > > >
> > > >
> > > > I hate to be a party pooper, but this patchset is going to face a very
> > > > high bar to acceptance. Right now, seccomp has a couple of excellent
> > > > properties:
> > > >
> > > > First, while it has limited expressiveness, it is simple enough that the
> > > > implementation can be easily understood and the scope for
> > > > vulnerabilities that fall through the cracks of the seccomp sandbox
> > > > model is low. Compare this to Windows' low-integrity/high-integrity
> > > > sandbox system: there is a never ending string of sandbox escapes due to
> > > > token misuse, unexpected things at various integrity levels, etc.
> > > > Seccomp doesn't have tokens or integrity levels, and these bugs don't
> > > > happen.
> > > >
> > > > Second, seccomp works, almost unchanged, in a completely unprivileged
> > > > context.
> > > > The last time making eBPF work sensibly in a less- or
> > > > -unprivileged context came up, the maintainers mostly rejected the
> > > > idea of developing/debugging a permission model for maps, cleaning up
> > > > the bpf object id system, etc. You are going to have a very hard time
> > > > convincing the seccomp maintainers to let any of these mechanisms
> > > > interact with seccomp until the underlying permission model is in place.
> > > >
> > > > --Andy
> > >
> > > Thanks for pointing out the tradeoff between expressiveness vs. simplicity.
> > >
> > > Note that we are _not_ proposing to replace cbpf, but propose to also
> > > support ebpf filters. There certainly are use cases where cbpf is
> > > sufficient, but there are also important use cases ebpf could make
> > > life much easier.
> > >
> > > Most importantly, we strongly believe that ebpf filters can be
> > > supported without reducing security.
> > >
> > > No worries about “party pooping” and we appreciate the feedback. We’d
> > > love to hear concerns and collect feedback so we can address them to
> > > hit that very high bar.
> > >
> > >
> > > ~t
> > >
> > > --
> > > Tianyin Xu
> > > University of Illinois at Urbana-Champaign
> > > https://tianyin.github.io/
>

[1]: https://lore.kernel.org/lkml/20210215152404.250281-1-andrealmeid@xxxxxxxxxxxxx/T/