Re: [RFC,PATCH 1/2] seccomp_filters: system call filtering using BPF

Will Drewry <wad@xxxxxxxxxxxx> · Fri, 13 Jan 2012 17:10:41 -0600

On Fri, Jan 13, 2012 at 1:01 PM, Will Drewry <wad@xxxxxxxxxxxx> wrote:
> On Fri, Jan 13, 2012 at 11:31 AM, Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
>> On 01/12, Will Drewry wrote:
>>>
>>> On Thu, Jan 12, 2012 at 11:23 AM, Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
>>> > On 01/12, Will Drewry wrote:
>>> >>
>>> >> On Thu, Jan 12, 2012 at 10:22 AM, Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
>>> >> >> +      */
>>> >> >> +     regs = seccomp_get_regs(regs_tmp, &regs_size);
>>> >> >
>>> >> > Stupid question. I am sure you know what are you doing ;) and I know
>>> >> > nothing about !x86 arches.
>>> >> >
>>> >> > But could you explain why it is designed to use user_regs_struct ?
>>> >> > Why we can't simply use task_pt_regs() and avoid the (costly) regsets?
>>> >>
>>> >> So on x86 32, it would work since user_regs_struct == task_pt_regs
>>> >> (iirc), but on x86-64
>>> >> and others, that's not true.
>>> >
>>> > Yes sure, I meant that userpace should use pt_regs too.
>>> >
>>> >> If it would be appropriate to expose pt_regs to userspace, then I'd
>>> >> happily do so :)
>>> >
>>> > Ah, so that was the reason. But it is already exported? At least I see
>>> > the "#ifndef __KERNEL__" definition in arch/x86/include/asm/ptrace.h.
>>> >
>>> > Once again, I am not arguing, just trying to understand. And I do not
>>> > know if this definition is part of abi.
>>>
>>> I don't either :/  My original idea was to operate on task_pt_regs(current),
>>> but I noticed that PTRACE_GETREGS/SETREGS only uses the
>>> user_regs_struct. So I went that route.
>>
>> Well, I don't know where user_regs_struct come from initially. But
>> probably it is needed to allow to access the "artificial" things like
>> fs_base. Or perhaps this struct mimics the layout in the coredump.
>
> Not sure - added Roland whose name was on many of the files :)
>
> I just noticed that ptrace ABI allows pt_regs access using the register
> macros (PTRACE_PEEKUSR) and user_regs_struct access (PTRACE_GETREGS).
>
> But I think the latter is guaranteed to have a certain layout while the macros
> for PEEKUSR can do post-processing fixup.  (Which could be done in the
> bpf evaluator load_pointer() helper if needed.)
>
>>> I'd love for pt_regs to be fair game to cut down on the copying!
>>
>> Me too. I see no point in using user_regs_struct.
>
> I'll rev the change to use pt_regs and drop all the helper code.  If
> no one says otherwise, that certainly seems ideal from a performance
> perspective, and I see pt_regs exported to userland along with ptrace
> abi register offset macros.

On second thought, pt_regs is scary :)

>From looking at
  http://lxr.linux.no/linux+v3.2.1/arch/x86/include/asm/syscall.h#L97
and ia32syscall enty code, it appears that for x86, at least, the
pt_regs for compat processes will be 8 bytes wide per register on the
stack.  This means if a self-filtering 32-bit program runs on a 64-bit host in
IA32_EMU, its filters will always index into pt_regs incorrectly.

I'm not 100% that I am reading the code right, but it means that I can either
keep using user_regs_struct or fork the code behavior based on compat. That
would need to be arch dependent then which is pretty rough.

Any thoughts?

I'll do a v5 rev for Eric's comments soon, but I'm not quite sure
about the pt_regs
change yet.  If the performance boost is worth the effort of having a
per-arch fixup,
I can go that route.  Otherwise, I could look at some alternate approach for a
faster-than-regview payload.

Thanks!
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html