On Mon, Jun 15, 2020 at 09:41:38AM -0700, Stanislav Fomichev wrote: > On Fri, Jun 12, 2020 at 8:50 PM Alexei Starovoitov > <alexei.starovoitov@xxxxxxxxx> wrote: > [ .. ] > > > > It's probably ok, but makes me uneasy about verifier consequences. > > > > ctx->optval is PTR_TO_PACKET and it's a valid pointer from verifier pov. > > > > Do we have cases already where PTR_TO_PACKET == PTR_TO_PACKET_END ? > > > > I don't think we have such tests. I guess bpf prog won't be able to read > > > > anything and nothing will crash, but having PTR_TO_PACKET that is > > > > actually NULL would be an odd special case to keep in mind for everyone > > > > who will work on the verifier from now on. > > > > > > > > Also consider bpf prog that simply reads something small like 4 bytes. > > > > IP_FREEBIND sockopt (like your selftest in the patch 2) will have > > > > those 4 bytes, so it's natural for the prog to assume that it can read it. > > > > It will have > > > > p = ctx->optval; > > > > if (p + 4 > ctx->optval_end) > > > > /* goto out and don't bother logging, since that never happens */ > > > > *(u32*)p; > > > > > > > > but 'clever' user space would pass long optlen and prog suddenly > > > > 'not seeing' the sockopt. It didn't crash, but debugging would be > > > > surprising. > > > > > > > > I feel it's better to copy the first 4k and let the program see it. > > > Agreed with the IP_FREEBIND example wrt observability, however it's > > > not clear what to do with the cropped buffer if the bpf program > > > modifies it. > > > > > > Consider that huge iptables setsockopts where the usespace passes > > > PAGE_SIZE*10 optlen with real data and bpf prog sees only part of it. > > > Now, if the bpf program modifies the buffer (say, flips some byte), we > > > are back to square one. We either have to silently discard that buffer > > > or reallocate/merge. My reasoning with data == NULL, is that at least > > > there is a clear signal that the program can't access the data (and > > > can look at optlen to see if the original buffer is indeed non-zero > > > and maybe deny such requests?). > > > At this point I'm really starting to think that maybe we should just > > > vmalloc everything that is >PAGE_SIZE and add a sysclt to limit an > > > upper bound :-/ > > > I'll try to think about this a bit more over the weekend. > > > > Yeah. Tough choices. > > We can also detect in the verifier whether program accessed ctx->optval > > and skip alloc/copy if program didn't touch it, but I suspect in most > > case the program would want to read it. > > I think vmallocing what optlen said is DoS-able. It's better to > > stick with single page. > > Let's keep brainstorming. > Btw, can we use sleepable bpf for that? As in, do whatever I suggested > in these patches (don't expose optval>PAGE_SIZE via context), but add > a new helper where you can say 'copy x bytes from y offset of the > original optval' (the helper will do sleepable copy_form_user). > That way we have a clean signal to the BPF that the value is too big > (optval==optval_end==NULL) and the user can fallback to the helper to > inspect the value. We can also provide another helper to export new > value for this case. sleepable will be read-only and with toctou. I guess this patch is the least evil then ? But I'm confused with the test in patch 2. Why does it do 'if (optval > optval_end)' ? How is that possible when patch 1 makes them equal. may be another idea: allocate 4k, copy first 4k into it, but keep ctx.optlen as original. if bpf prog reads optval and finds it ok in setsockopt, it can set ctx.optlen = 0 which would mean run the rest of setsockopt handling with original '__user *optval' and ignore the buffer that was passed to bpf prog. In case of ctx.optlen < 4k the behavior won't change from bpf prog and from kernel pov. When ctx.optlen > 4k and prog didn't adjust it __cgroup_bpf_run_filter_setsockopt will do ret = -EFAULT; and reject sockopt. So bpf prog would be force to do something with large optvals. yet it will be able to examine first 4k bytes of it. It's a bit of change in behavior, but I don't think bpf prog is doing ctx.optlen = 0, since that's more or less the same as doing ctx.optlen = -2. or may be use some other indicator ? And do something similar with getsockopt.