On Fri, Jun 12, 2020 at 8:50 PM Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote: [ .. ] > > > It's probably ok, but makes me uneasy about verifier consequences. > > > ctx->optval is PTR_TO_PACKET and it's a valid pointer from verifier pov. > > > Do we have cases already where PTR_TO_PACKET == PTR_TO_PACKET_END ? > > > I don't think we have such tests. I guess bpf prog won't be able to read > > > anything and nothing will crash, but having PTR_TO_PACKET that is > > > actually NULL would be an odd special case to keep in mind for everyone > > > who will work on the verifier from now on. > > > > > > Also consider bpf prog that simply reads something small like 4 bytes. > > > IP_FREEBIND sockopt (like your selftest in the patch 2) will have > > > those 4 bytes, so it's natural for the prog to assume that it can read it. > > > It will have > > > p = ctx->optval; > > > if (p + 4 > ctx->optval_end) > > > /* goto out and don't bother logging, since that never happens */ > > > *(u32*)p; > > > > > > but 'clever' user space would pass long optlen and prog suddenly > > > 'not seeing' the sockopt. It didn't crash, but debugging would be > > > surprising. > > > > > > I feel it's better to copy the first 4k and let the program see it. > > Agreed with the IP_FREEBIND example wrt observability, however it's > > not clear what to do with the cropped buffer if the bpf program > > modifies it. > > > > Consider that huge iptables setsockopts where the usespace passes > > PAGE_SIZE*10 optlen with real data and bpf prog sees only part of it. > > Now, if the bpf program modifies the buffer (say, flips some byte), we > > are back to square one. We either have to silently discard that buffer > > or reallocate/merge. My reasoning with data == NULL, is that at least > > there is a clear signal that the program can't access the data (and > > can look at optlen to see if the original buffer is indeed non-zero > > and maybe deny such requests?). > > At this point I'm really starting to think that maybe we should just > > vmalloc everything that is >PAGE_SIZE and add a sysclt to limit an > > upper bound :-/ > > I'll try to think about this a bit more over the weekend. > > Yeah. Tough choices. > We can also detect in the verifier whether program accessed ctx->optval > and skip alloc/copy if program didn't touch it, but I suspect in most > case the program would want to read it. > I think vmallocing what optlen said is DoS-able. It's better to > stick with single page. > Let's keep brainstorming. Btw, can we use sleepable bpf for that? As in, do whatever I suggested in these patches (don't expose optval>PAGE_SIZE via context), but add a new helper where you can say 'copy x bytes from y offset of the original optval' (the helper will do sleepable copy_form_user). That way we have a clean signal to the BPF that the value is too big (optval==optval_end==NULL) and the user can fallback to the helper to inspect the value. We can also provide another helper to export new value for this case. WDYT?