Re: [PATCH bpf v3 1/2] bpf: don't return EINVAL from {get,set}sockopt when optlen > PAGE_SIZE

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jun 12, 2020 at 06:03:03PM -0700, Stanislav Fomichev wrote:
> On Fri, Jun 12, 2020 at 5:34 PM Alexei Starovoitov
> <alexei.starovoitov@xxxxxxxxx> wrote:
> >
> > On Mon, Jun 08, 2020 at 11:27:47AM -0700, Stanislav Fomichev wrote:
> > > Attaching to these hooks can break iptables because its optval is
> > > usually quite big, or at least bigger than the current PAGE_SIZE limit.
> > > David also mentioned some SCTP options can be big (around 256k).
> > >
> > > There are two possible ways to fix it:
> > > 1. Increase the limit to match iptables max optval. There is, however,
> > >    no clear upper limit. Technically, iptables can accept up to
> > >    512M of data (not sure how practical it is though).
> > >
> > > 2. Bypass the value (don't expose to BPF) if it's too big and trigger
> > >    BPF only with level/optname so BPF can still decide whether
> > >    to allow/deny big sockopts.
> > >
> > > The initial attempt was implemented using strategy #1. Due to
> > > listed shortcomings, let's switch to strategy #2. When there is
> > > legitimate a real use-case for iptables/SCTP, we can consider increasing
> > > the PAGE_SIZE limit.
> > >
> > > v3:
> > > * don't increase the limit, bypass the argument
> > >
> > > v2:
> > > * proper comments formatting (Jakub Kicinski)
> > >
> > > Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks")
> > > Cc: David Laight <David.Laight@xxxxxxxxxx>
> > > Signed-off-by: Stanislav Fomichev <sdf@xxxxxxxxxx>
> > > ---
> > >  kernel/bpf/cgroup.c | 18 ++++++++++++++----
> > >  1 file changed, 14 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> > > index fdf7836750a3..758082853086 100644
> > > --- a/kernel/bpf/cgroup.c
> > > +++ b/kernel/bpf/cgroup.c
> > > @@ -1276,9 +1276,18 @@ static bool __cgroup_bpf_prog_array_is_empty(struct cgroup *cgrp,
> > >
> > >  static int sockopt_alloc_buf(struct bpf_sockopt_kern *ctx, int max_optlen)
> > >  {
> > > -     if (unlikely(max_optlen > PAGE_SIZE) || max_optlen < 0)
> > > +     if (unlikely(max_optlen < 0))
> > >               return -EINVAL;
> > >
> > > +     if (unlikely(max_optlen > PAGE_SIZE)) {
> > > +             /* We don't expose optvals that are greater than PAGE_SIZE
> > > +              * to the BPF program.
> > > +              */
> > > +             ctx->optval = NULL;
> > > +             ctx->optval_end = NULL;
> > > +             return 0;
> > > +     }
> >
> > It's probably ok, but makes me uneasy about verifier consequences.
> > ctx->optval is PTR_TO_PACKET and it's a valid pointer from verifier pov.
> > Do we have cases already where PTR_TO_PACKET == PTR_TO_PACKET_END ?
> > I don't think we have such tests. I guess bpf prog won't be able to read
> > anything and nothing will crash, but having PTR_TO_PACKET that is
> > actually NULL would be an odd special case to keep in mind for everyone
> > who will work on the verifier from now on.
> >
> > Also consider bpf prog that simply reads something small like 4 bytes.
> > IP_FREEBIND sockopt (like your selftest in the patch 2) will have
> > those 4 bytes, so it's natural for the prog to assume that it can read it.
> > It will have
> > p = ctx->optval;
> > if (p + 4 > ctx->optval_end)
> >  /* goto out and don't bother logging, since that never happens */
> > *(u32*)p;
> >
> > but 'clever' user space would pass long optlen and prog suddenly
> > 'not seeing' the sockopt. It didn't crash, but debugging would be
> > surprising.
> >
> > I feel it's better to copy the first 4k and let the program see it.
> Agreed with the IP_FREEBIND example wrt observability, however it's
> not clear what to do with the cropped buffer if the bpf program
> modifies it.
> 
> Consider that huge iptables setsockopts where the usespace passes
> PAGE_SIZE*10 optlen with real data and bpf prog sees only part of it.
> Now, if the bpf program modifies the buffer (say, flips some byte), we
> are back to square one. We either have to silently discard that buffer
> or reallocate/merge. My reasoning with data == NULL, is that at least
> there is a clear signal that the program can't access the data (and
> can look at optlen to see if the original buffer is indeed non-zero
> and maybe deny such requests?).
> At this point I'm really starting to think that maybe we should just
> vmalloc everything that is >PAGE_SIZE and add a sysclt to limit an
> upper bound :-/
> I'll try to think about this a bit more over the weekend.

Yeah. Tough choices.
We can also detect in the verifier whether program accessed ctx->optval
and skip alloc/copy if program didn't touch it, but I suspect in most
case the program would want to read it.
I think vmallocing what optlen said is DoS-able. It's better to
stick with single page.
Let's keep brainstorming.



[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux