On Tue, 2022-05-24 at 10:22 +0300, Mike Rapoport wrote: > > @@ -949,6 +947,8 @@ static void bpf_prog_pack_free(struct > > bpf_binary_header *hdr) > > > > mutex_lock(&pack_mutex); > > if (hdr->size > bpf_prog_pack_size) { > > + set_memory_nx((unsigned long)hdr, hdr->size / > > PAGE_SIZE); > > + set_memory_rw((unsigned long)hdr, hdr->size / > > PAGE_SIZE); > > set_memory_{nx,rw} can fail. Please take care of error handling. I think there is nothing to do here, actually. At least on the freeing part. When called on a vmalloc primary address, the set_memory() calls will try to first change the permissions on the vmalloc alias, then try to change it on the direct map. Yea, it can fail from failing to allocate a page from a split, but whatever memory managed to change earlier will also succeed to change back on reset. The set_memory() functions don't rollback on failure. So all the modules callers depend on this logic of a second resetting call to reset anything that succeeded to change on the first call. The split will have already happened for any memory that succeeded to change, and the order of the changes is the same. As for the primary alias, for 4k vmallocs, it will always succeed, and set_memory() does this part first, so set_memory_x() will (crucially) always succeed for kernel text mappings. For 2MB vmalloc pages, the primary alias should succeed if the set_memory() call is 2MB aligned. So pre-VM_FLUSH_RESET_PERMS (and it also has pretty similar logic), they all went something like this: Allocate: 1. ptr = vmalloc(); 2. set_memory_ro(ptr); <-Could fail halfway though Free: 3. set_memory_rw(ptr); <-Could also fail halfway though and return an error, but only after the split work done in step 2. 4. vfree(ptr); It's pretty horrible. BPF people once tried to be more proper and change it to: ptr = vmalloc() if (set_memory_ro()) { vfree(ptr); } It looks correct, but this caused RO pages to be freed if set_memory_ro() half succeeded in changing permissions. If there is an error on reset, it means the first set_memory() call failed to change everything, but anything that succeeded is reset. So the right thing to do is to free the pages in either case. We could fail to allocate a pack if the initial set_memory() calls failed, but we still have to call set_memory_rw(), etc on free and ignore any errors. As an aside, this is one of the reasons it's hard to improve things incrementally here. Everything is carefully balanced like a house of cards. The solution IMO, is to change the interface so this type of logic is in one place instead of scattered in the callers.