> > > I'm a bit concerned that this will introduce the latency back if vmap_lazy_nr
> > > is greater than half of lazy_max_pages(). Which IIUC will be more likely if
> > > the number of CPUs is large.
> > >
> > The threshold that we establish is two times more than lazy_max_pages(),
> > i.e. in case of 4 system CPUs lazy_max_pages() is 24576, therefore the
> > threshold is 49152, if PAGE_SIZE is 4096.
> >
> > It means that we allow rescheduling if vmap_lazy_nr < 49152. If vmap_lazy_nr
> > is higher then we forbid rescheduling and free areas until it becomes lower
> > again to stabilize the system. By doing that, we will not allow vmap_lazy_nr
> > to be enormously increased.
>
> Sorry for late reply.
>
> This sounds reasonable. Such an extreme situation of vmap_lazy_nr being twice
> the lazy_max_pages() is probably only possible using a stress test anyway
> since (hopefully) the try_purge_vmap_area_lazy() call is happening often
> enough to keep the vmap_lazy_nr low.
>
> Have you experimented with what is the highest threshold that prevents the
> issues you're seeing? Have you tried 3x or 4x the vmap_lazy_nr?
>
I do not think it makes sense to go with a 3x/4x/etc. threshold, for many
reasons. One of them is that we just need to prevent that skew and return
to a reasonable balance.

>
> I also wonder what is the cost these days of the global TLB flush on the most
> common Linux architectures and if the whole purge vmap_area lazy stuff is
> starting to get a bit dated, and if we can do the purging inline as areas are
> freed. There is a cost to having this mechanism too as you said, which is as
> the list size grows, all other operations also take time.
>
I guess if we go with flushing the TLB each time a vmap_area is freed, we
are in trouble, though I have not analyzed how that approach impacts
performance.

I agree about the cost of having such a mechanism. Basically, it is one of
the sources of bigger fragmentation (not limited to it).
But on the other hand, the KVA allocator has to be capable of effective and
fast allocation even in that condition.

I am working on v2 of https://lkml.org/lkml/2018/10/19/786. When I complete
some other job-related tasks I will upload a new RFC.

--
Vlad Rezki
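
For reference, the threshold arithmetic discussed in this thread can be sketched as a small standalone C program fragment. This is only an illustration, not the patch itself: the `*_sketch` function names are hypothetical, and it assumes lazy_max_pages() scales as fls(number of CPUs) times 32 MiB worth of pages, which is consistent with the 4-CPU figure of 24576 quoted above.

```c
#include <assert.h>
#include <stdbool.h>

/* 1-based position of the highest set bit, 0 for x == 0 (userspace stand-in for fls()). */
int fls_sketch(unsigned int x)
{
	int pos = 0;

	while (x) {
		pos++;
		x >>= 1;
	}
	return pos;
}

/* Assumed model of lazy_max_pages(): fls(nr_cpus) * (32 MiB / page size). */
unsigned long lazy_max_pages_sketch(unsigned int nr_cpus, unsigned long page_size)
{
	assert(page_size > 0);
	return (unsigned long)fls_sketch(nr_cpus) * (32UL * 1024 * 1024 / page_size);
}

/* The threshold from the thread: two times lazy_max_pages(). */
unsigned long resched_threshold_sketch(unsigned int nr_cpus, unsigned long page_size)
{
	return 2 * lazy_max_pages_sketch(nr_cpus, page_size);
}

/* Rescheduling is allowed only while vmap_lazy_nr stays below the threshold. */
bool may_resched_sketch(unsigned long vmap_lazy_nr, unsigned long threshold)
{
	return vmap_lazy_nr < threshold;
}
```

With 4 CPUs and a PAGE_SIZE of 4096 this gives 24576 for lazy_max_pages() and 49152 for the threshold, matching the numbers quoted above. Once vmap_lazy_nr reaches 49152, rescheduling is forbidden until purging brings it back below that value.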