Re: Compat 32-bit syscall entry from 64-bit task!? [was: Re: [RFC,PATCH 1/2] seccomp_filters: system call filtering using BPF]

Jamie Lokier <jamie@xxxxxxxxxxxxx> · Thu, 19 Jan 2012 16:11:27 +0000

Indan Zupancic wrote:
> On Thu, January 19, 2012 09:16, Chris Evans wrote:
> > On Wed, Jan 18, 2012 at 4:14 PM, Indan Zupancic <indan@xxxxxx> wrote:
> >> On Wed, January 18, 2012 22:13, Chris Evans wrote:
> >>> On Wed, Jan 18, 2012 at 4:12 AM, Indan Zupancic <indan@xxxxxx> wrote:
> >>>> On Wed, January 18, 2012 06:43, Chris Evans wrote:
> >>>>> 2) Tracee traps
> >>>>> 2b) Tracee could take a SIGKILL here
> >>>>> 3) Tracer looks at registers; bad syscall
> >>>>> 3b) Or tracee could take a SIGKILL here
> >>>>> 4) The only way to stop the bad syscall from executing is to rewrite
> >>>>> orig_eax (PTRACE_CONT + SIGKILL only kills the process after the
> >>>>> syscall has finished)
> >>>>
> >>>> Yes, we rewrite it to -1.
> >>>>
> >>>>> 5) Disaster: the tracee took a SIGKILL so any attempt to address it by
> >>>>> pid (such as PTRACE_SETREGS) fails.
> >>>>
> >>>> I assume that if a task can execute system calls and we get ptrace events
> >>>> for that, we can do other ptrace operations too. Are you saying that
> >>>> the kernel has this ptrace gap between SIGKILL and task exit where ptrace
> >>>> doesn't work but the task continues executing system calls? That would be
> >>>> a huge bug, but it seems very unlikely too, as the task is stopped and
> >>>> shouldn't be able to disappear till it is continued by the tracer.
> >>>>
> >>>> I mean, really? That would be stupid.
> >>
> >> Okay, I tested this scenario and you're right, we're screwed.
> >>
> >> What the hell guys?
> >
> > Steady on :) ptrace() has never been sold as a technology upon which
> > it's safe to build security solutions.
> 
> Well, that can be said of pretty much all kernel functionality.
> That is no excuse for crazy behaviour.
> 
> I more or less fixed it by turning all SIGKILLs into SIGTERMs.
> Perhaps I should use a more obscure signal instead.
> 
> >> What about other PID checks in the kernel, are they still
> >> safe if the process looks dead but is still active? Or is it a ptrace-only
> >> problem?
> >>
> >>>> If true we have to work around it by disallowing SIGKILL and just sending
> >>>> them ourselves within the jail. Meh.
> >>
> >> I guess this helps a bit. It doesn't prevent external signals, but prisoners
> >> don't have control over that.
> >
> > Well.... a prisoner may be able to play other tricks:
> > - Allocate lots of memory... kernel may start spraying around SIGKILLs
> > - Sending SIGKILL via prctl()
> 
> prctl is disallowed within our jail. Did you have PR_SET_PDEATHSIG in mind?
> But doesn't the tracer become the parent when ptracing, or is that not the case here?
> Or were you thinking about enabling SECCOMP and counting on the SIGKILL
> being process-wide instead of thread-specific?
> 
> > - Sending SIGKILL via fcntl()
> 
> I haven't written the fcntl demultiplexor yet, but I missed that fcntl could
> be used for sending signals. I knew there was wacky stuff in there, but
> didn't expect it to be that bad. Thanks.
> 
> > - Sending SIGKILL via clone()
> 
> How? And can you send it to another process than yourself?
> 
> >
> >>
> >> Is this SIGKILL specific or is it true for all task ending signals?
> >
> > Can't remember - try it?
> 
> Tried: It's safe with SIGTERM, so I assume the others are fine too.
> I'll double check though...
> 
> >>
> >>>> How will you avoid file path races with BPF?
> >>>
> >>> There is typically no need for file-path based access control in an FTP server.
> >>> Take for example anonymous FTP, which will typically be inside a
> >>> chroot() to /var/ftp. Inside that filesystem tree -- if you can open()
> >>> it, you can have it.
> >>
> >> Ah, you count on having root access. We don't.
> >>
> >> Do you know any more crazy security destroying holes?
> >
> > Try spraying SIGCONT and / or SIGSTOP at tracees. It may be possible
> > to confuse the tracer about whether a SIGTRAP event is syscall entry
> > or exit.
> 
> Yes, heard about that weirdness before, but it's all ignored. We're
> using PTRACE_O_TRACESYSGOOD.
> 
> > Try doing an execve() that fails. May cause similar state confusion in
> > the tracer.
> 
> Our jailer pretty much ignores all signals and only handles syscalls
> and task exits. We actually check execve's return value to know if we
> have to do our stuff or not.
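
The steps Indan and Chris describe above (recognise a syscall-entry stop via PTRACE_O_TRACESYSGOOD, then stop the call by rewriting orig_eax to -1) can be sketched roughly as below. This is a minimal sketch assuming x86_64, where the register is orig_rax in glibc's struct user_regs_struct; the helper names is_syscall_stop() and deny_syscall() are hypothetical. It does not close the SIGKILL gap discussed above: once the tracee has taken a SIGKILL, PTRACE_SETREGS fails (typically with ESRCH).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>

/* True if a waitpid() status is a syscall stop reported with
 * PTRACE_O_TRACESYSGOOD, which sets bit 0x80 in the stop signal.
 * A plain SIGTRAP (e.g. one sent with kill()) lacks this bit. */
bool is_syscall_stop(int status)
{
    return WIFSTOPPED(status) && WSTOPSIG(status) == (SIGTRAP | 0x80);
}

/* Hypothetical helper: at a syscall-entry stop, replace the syscall
 * number with -1 so the kernel skips the call; on x86_64 the tracee
 * typically sees -ENOSYS as the return value.
 * Returns 0 on success, -1 with errno set on failure.  If the tracee
 * has already taken a SIGKILL, this fails, and on the kernels discussed
 * in this thread the syscall may then still execute. */
int deny_syscall(pid_t pid)
{
#if defined(__x86_64__)
    struct user_regs_struct regs;
    if (ptrace(PTRACE_GETREGS, pid, NULL, &regs) == -1)
        return -1;
    regs.orig_rax = (unsigned long long)-1;
    return ptrace(PTRACE_SETREGS, pid, NULL, &regs) == -1 ? -1 : 0;
#else
    /* Other architectures keep the syscall number elsewhere. */
    (void)pid;
    errno = ENOSYS;
    return -1;
#endif
}
```

TRACESYSGOOD marks entry and exit stops the same way, so a tracer using this would still have to track entry/exit alternation per thread itself and call deny_syscall() only at entry stops.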

Take a look at the file README-linux-ptrace in recent strace Git.
(Thanks Denys!)

It describes some *really* ugly things Linux does to ptrace on execve
when there are threads, the most exciting being that the return value
is reported to a different tid than the one that called execve(), and
other tids magically disappear without notification.

You can use PTRACE_O_TRACEEXEC to see if the execve() succeeds, btw.
It has the useful side-effect of preventing the legacy behaviour of
SIGTRAP being sent as a normal queued signal after successful execve().
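
A tracer could combine the two points above roughly as sketched below: set PTRACE_O_TRACEEXEC (together with PTRACE_O_TRACESYSGOOD), recognise the PTRACE_EVENT_EXEC stop, and ask for the tid that actually called execve() with PTRACE_GETEVENTMSG, which the ptrace(2) man page documents for Linux 3.0 and later. The function names are hypothetical; this is a sketch, not strace's implementation.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdbool.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Set once per tracee while it is stopped.  PTRACE_O_TRACEEXEC reports a
 * successful execve() as a PTRACE_EVENT_EXEC stop instead of the legacy
 * SIGTRAP queued as a normal signal. */
long set_trace_options(pid_t pid)
{
    return ptrace(PTRACE_SETOPTIONS, pid, NULL,
                  (void *)(long)(PTRACE_O_TRACESYSGOOD | PTRACE_O_TRACEEXEC));
}

/* True if a waitpid() status is a PTRACE_EVENT_EXEC stop: the event
 * number sits above the stop signal, which is SIGTRAP. */
bool is_exec_event(int status)
{
    return WIFSTOPPED(status) &&
           (status >> 8) == (SIGTRAP | (PTRACE_EVENT_EXEC << 8));
}

/* The exec stop is reported under the thread-group leader's tid even if
 * another thread called execve().  PTRACE_GETEVENTMSG yields the tid of
 * the thread that really did (Linux 3.0 and later). */
int exec_former_tid(pid_t pid, pid_t *former)
{
    unsigned long msg = 0;
    if (ptrace(PTRACE_GETEVENTMSG, pid, NULL, &msg) == -1)
        return -1;
    *former = (pid_t)msg;
    return 0;
}
```

A failed execve() produces no exec event at all; it only shows up as a negative return value at the syscall-exit stop, which is what lets a tracer tell the two cases apart without relying on the legacy SIGTRAP.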

-- Jamie