On Tue, May 27, 2014 at 12:55 PM, Kees Cook <keescook@xxxxxxxxxxxx> wrote: > On Tue, May 27, 2014 at 12:27 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote: >> On Tue, May 27, 2014 at 12:23 PM, Kees Cook <keescook@xxxxxxxxxxxx> wrote: >>> On Tue, May 27, 2014 at 12:10 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote: >>>> On Tue, May 27, 2014 at 11:45 AM, Kees Cook <keescook@xxxxxxxxxxxx> wrote: >>>>> On Tue, May 27, 2014 at 11:40 AM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote: >>>>>> On Tue, May 27, 2014 at 11:24 AM, Kees Cook <keescook@xxxxxxxxxxxx> wrote: >>>>>>> On Mon, May 26, 2014 at 12:27 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote: >>>>>>>> On Fri, May 23, 2014 at 10:05 AM, Kees Cook <keescook@xxxxxxxxxxxx> wrote: >>>>>>>>> On Thu, May 22, 2014 at 4:11 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote: >>>>>>>>>> On Thu, May 22, 2014 at 4:05 PM, Kees Cook <keescook@xxxxxxxxxxxx> wrote: >>>>>>>>>>> Applying restrictive seccomp filter programs to large or diverse >>>>>>>>>>> codebases often requires handling threads which may be started early in >>>>>>>>>>> the process lifetime (e.g., by code that is linked in). While it is >>>>>>>>>>> possible to apply permissive programs prior to process start up, it is >>>>>>>>>>> difficult to further restrict the kernel ABI to those threads after that >>>>>>>>>>> point. >>>>>>>>>>> >>>>>>>>>>> This change adds a new seccomp extension action for synchronizing thread >>>>>>>>>>> group seccomp filters and a prctl() for accessing that functionality, >>>>>>>>>>> as well as a flag for SECCOMP_EXT_ACT_FILTER to perform sync at filter >>>>>>>>>>> installation time. >>>>>>>>>>> >>>>>>>>>>> When calling prctl(PR_SECCOMP_EXT, SECCOMP_EXT_ACT, SECCOMP_EXT_ACT_FILTER, >>>>>>>>>>> flags, filter) with flags containing SECCOMP_FILTER_TSYNC, or when calling >>>>>>>>>>> prctl(PR_SECCOMP_EXT, SECCOMP_EXT_ACT, SECCOMP_EXT_ACT_TSYNC, 0, 0), it >>>>>>>>>>> will attempt to synchronize all threads in current's threadgroup to its >>>>>>>>>>> seccomp filter program. This is possible iff all threads are using a filter >>>>>>>>>>> that is an ancestor to the filter current is attempting to synchronize to. >>>>>>>>>>> NULL filters (where the task is running as SECCOMP_MODE_NONE) are also >>>>>>>>>>> treated as ancestors allowing threads to be transitioned into >>>>>>>>>>> SECCOMP_MODE_FILTER. If prctrl(PR_SET_NO_NEW_PRIVS, ...) has been set on the >>>>>>>>>>> calling thread, no_new_privs will be set for all synchronized threads too. >>>>>>>>>>> On success, 0 is returned. On failure, the pid of one of the failing threads >>>>>>>>>>> will be returned, with as many filters installed as possible. >>>>>>>>>> >>>>>>>>>> Is there a use case for adding a filter and synchronizing filters >>>>>>>>>> being separate operations? If not, I think this would be easier to >>>>>>>>>> understand and to use if there was just a single operation. >>>>>>>>> >>>>>>>>> Yes: if the other thread's lifetime is not well controlled, it's good >>>>>>>>> to be able to have a distinct interface to retry the thread sync that >>>>>>>>> doesn't require adding "no-op" filters. >>>>>>>> >>>>>>>> Wouldn't this still be solved by: >>>>>>>> >>>>>>>> seccomp_add_filter(final_filter, SECCOMP_FILTER_ALL_THREADS); >>>>>>>> >>>>>>>> the idea would be that, if seccomp_add_filter fails, then you give up >>>>>>>> and, if it succeeds, then you're done. It shouldn't fail unless out >>>>>>>> of memory or you've nested too deeply. >>>>>>> >>>>>>> I wanted to keep the case of being able to to wait for non-ancestor >>>>>>> threads to finish. For example, 2 threads start and set separate >>>>>>> filters. 1 does work and exits, 2 starts another thread (3) which adds >>>>>>> filters, does work, and then waits for 1 to finish by calling TSYNC. >>>>>>> Once 1 dies, TSYNC succeeds. In the case of not having direct control >>>>>>> over thread lifetime (say, when using third-party libraries), I'd like >>>>>>> to retain the flexibility of being able to do TSYNC without needing a >>>>>>> filter being attached to it. >>>>>> >>>>>> I must admit this strikes me as odd. What's the point of having a >>>>>> thread set a filter if it intends to be a short-lived thread? >>>>> >>>>> I was illustrating the potential insanity of third-party libraries. >>>>> There isn't much sense in that behavior, but if it exists, working >>>>> around it is harder without the separate TSYNC-only call. >>>>> >>>>>> In any case, I must have missed the ability for TSYNC to block. Hmm. >>>>>> That seems complicated, albeit potentially useful. >>>>> >>>>> Oh, no, I didn't mean to imply TSYNC should block. I meant that thread >>>>> 3 could do: >>>>> >>>>> while (TSYNC-fails) >>>>> wait-on-or-kill-unexpected-thread >>>>> >>>> >>>> Ok. >>>> >>>> I'm still not seeing the need for a separate TSYNC option, though -- >>>> just add-a-filter-across-all-threads would work if it failed >>>> harmlessly on error. FWIW, TSYNC is probably equivalent to adding an >>>> always-accept filter across all threads, although no one should really >>>> do the latter for efficiency reasons. >>> >>> Given the complexity of the locking, "fail" means "I applied the >>> change to all threads except for at least this one: *error code*", >>> which means looping with the "add-a-filter" method means all the other >>> threads keep getting filters added until there is full success. I >>> don't want that overhead, so this keeps TSYNC distinctly separate. >> >> Ugh, right. >> >>> >>> Because of the filter addition, when using add_filter-TSYNC, it's not >>> sensible to continue after a failure. However, using just-TSYNC allows >>> sensible re-trying. Since the environments where TSYNC intend to be >>> used in can be very weird, I really want to retain the retry ability. >> >> OK. So what's wrong with the other approach in which the >> add-to-all-threads option always succeeds? IOW, rather than requiring >> that all threads have the caller's filter as an ancestor, just add the >> requested filter to all threads. As an optimization, if the targetted >> thread has the current thread's filter as its filter, too, then the >> targetted thread can stay synchronized. >> >> That way the add filter call really is atomic. >> >> I'm not fundamentally opposed to TSYNC, but I think I'd be happier if >> the userspace interface could be kept as simple as possible. The fact >> that there's a filter hierarchy is sort of an implementation detail, I >> think. > > I'm totally on board with making this as simple as possible. :) The > corner cases are kind of horrible, though, but I think this is already > as simple as it can get. > > Externally, without the ancestry check, we can run the risk of have > unstable behavior out of a filter change. Imagine the case of a race > where a thread is adding a filter (via prctl), and the other thread > attempts to TSYNC a filter that blocks prctl. > > In the "always take the new filter" case, sometimes we get two filters > (original and TSYNCed) on the first thread, and sometimes it blows up > when it calls prctl (TSYNCed filter blocks the prctl). There's no way > for the TSYNC caller to detect who won the race. > > With the patch series as-is, losing the race results in a TSYNC > failure (ancestry doesn't match). This is immediately detectable and > the caller can the decide how to handle the situation directly. > > Regardless, I don't think the filter hierarchy is an implementation > detail -- complex filters take advantage of the hierarchies. And > keeping this hierarchy stable means the filters are simpler to > validate for correctness, etc. Hmm. Another issue: unless I'm not looking at the right version of this, I don't think that while_each_thread does what you think it does. I'm generally in favor of making the kernel interfaces easy to use, even at the cost of a bit of implementation complexity. Would this be simpler if you hijacked siglock to protect seccomp state? Then I think you could just make everything atomic. --Andy -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html