On Mon, Aug 12, 2024 at 10:15 AM Kui-Feng Lee <sinquersw@xxxxxxxxx> wrote:
>
>
>
> On 8/12/24 09:58, Alexei Starovoitov wrote:
> > On Wed, Aug 7, 2024 at 4:58 PM Kui-Feng Lee <thinker.li@xxxxxxxxx> wrote:
> >> +
> >> +	user_data_mmap = mmap(NULL, sizeof(*user_data_mmap), PROT_READ | PROT_WRITE,
> >> +			      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> >> +	if (!ASSERT_NEQ(user_data_mmap, MAP_FAILED, "mmap"))
> >> +		return;
> >> +
> >> +	memcpy(user_data_mmap, &user_data_mmap_v, sizeof(*user_data_mmap));
> >> +	value.udata_mmap = user_data_mmap;
> >> +	value.udata = &user_data;
> >
> > There shouldn't be a need to do mmap(). It's too much memory overhead.
> > The user should be able to write:
> > static __thread struct user_data udata;
> > value.udata = &udata;
> > bpf_map_update_elem(map_fd, my_task_fd, &value)
> > and do it once.
> > Later multi thread user code will just access "udata".
> > No map lookups.
>
> mmap() is not necessary here. There are two pointers here.
> udata_mmap one is used to test the case crossing page boundary although
> in the current RFC it fails to do it. It will be fixed later.
> udata one works just like what you have described, except user_data is a
> local variable.

Hmm. I guess I misread the code.
But then:

+	struct user_data user_data = ...;
+	value.udata = &user_data;

how is that supposed to work when the address points to the stack?
I guess the kernel can still pin that page, but it will be junk
as soon as the function returns.

> >
> > If sizeof(udata) is small enough the kernel will pin either
> > one or two pages (if udata crosses a page boundary).
> >
> > So no extra memory consumption by the user process while the kernel
> > pins a page or two.
> > In a good case it's one page and no extra vmap.
> > I wonder whether we should enforce that one page case.
> > It's not hard for users to write:
> > static __thread struct user_data udata __attribute__((aligned(sizeof(udata))));
>
> With one page restriction, the implementation would be much simpler. If
> you think it is a reasonable restriction, I would enforce this rule.

I'm worried about vmap().
Doing it for every map element (i.e., for every task)
might add substantial kernel side overhead.

On my devserver:

sudo cat /proc/vmallocinfo | grep vmap | wc -l
105

sudo cat /proc/vmallocinfo | wc -l
17608

I believe that the mechanism scales to millions,
but adding one more vmap per element feels like a footgun.
To avoid that the user would need to make sure their user_data
doesn't cross a page boundary, so imo we can make this mandatory.
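
To make the last point concrete, here is a rough, untested sketch of the
userspace side described above. The struct layout, the value layout with
a single uptr field, and the pidfd-as-key detail are illustrative
assumptions, not taken from the patch:

/* Rough sketch only: struct layout, value layout and the
 * pidfd-as-key detail are assumptions for illustration.
 */
#include <errno.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <bpf/bpf.h>

struct user_data {
	long a;
	long b;
};

/* Aligning the per-thread area to its own size guarantees it cannot
 * cross a page boundary, as long as sizeof(struct user_data) is a
 * power of two and <= PAGE_SIZE.
 */
static __thread struct user_data udata
	__attribute__((aligned(sizeof(struct user_data))));

struct storage_value {
	struct user_data *udata;	/* pinned by the kernel on update */
};

static int register_udata(int map_fd)
{
	struct storage_value value = { .udata = &udata };
	int task_fd, err;

	/* One update per thread; afterwards the thread just reads and
	 * writes "udata" directly, no map lookups. pidfd_open() here
	 * covers the simple single-threaded case.
	 */
	task_fd = syscall(SYS_pidfd_open, getpid(), 0);
	if (task_fd < 0)
		return -errno;

	err = bpf_map_update_elem(map_fd, &task_fd, &value, BPF_ANY);
	close(task_fd);
	return err;
}

With the one-page rule enforced at update time, the kernel can reject
any &udata whose page offset plus size crosses a page, and never needs
vmap() for these elements.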