On Fri, May 14, 2021 at 9:21 PM Dan Siemon <dan@xxxxxxxxxxxxx> wrote:
>
> I've been trying to work with large Umem areas and have a few
> questions. I'd appreciate any help or pointers. If it makes any
> difference, my AF_XDP testing is with i40e.

These issues are driver independent, but I appreciate that you reported
this. As you are well aware, some things are driver dependent.

> 1) I use kernel args to reserve huge pages on boot. The application
> mmap call with the huge TLB flag appears to use huge pages as I can
> see the count of used huge pages go up (/proc/meminfo). However, the
> number of pages used by the umem, as shown in ss output, looks to
> still be 4k pages. Are there plans to support huge pages in Umem? How
> hard would this be?

Something similar has been on the todo list for two years, but sadly
neither Björn nor I have had any time to pick this up, and I cannot see
myself having the time in the foreseeable future either. There are at
least 3 problems that would have to be addressed in this area:

1: Using a huge page for the umem kernel mapping. As you have allocated
   the area with a huge page, it will be physically contiguous.
2: Making sure the DMA addresses are contiguous.
3: Using a huge page for the IOMMU and its DMA mappings.

#1 and #3 are hard problems, at least in my mind. I am no mm or iommu
guy, but I do not believe there is support for this in the kernel for
its own mappings; the kernel will break huge pages down into 4K pages
there. If I am incorrect, I hope that someone reading this will correct
me. But we should do some mailing list browsing here to see what the
latest thoughts are and what has been tried before.

As for #2, Björn had some discussions with the iommu maintainer about
this in the past [1]. There is no such interface in the iommu subsystem
today, but components such as graphics drivers use a "hack" to make
sure that this happens, and fail if it does not. We do not have to
fail, as we can always fall back to the method we have today. Today we
have an array (dma_addr_t *dma_pages) that stores the addresses of all
the 4K DMA regions. With such an interface in place, we could replace
the array with a single address pointing to the start of the area,
improving performance. #2 is a prerequisite for #3 too.

Christoph Hellwig submitted an interface proposal about a year ago [1],
but nobody has taken on the challenge to implement it.

[1] https://lkml.org/lkml/2020/7/8/131

> 2) It looks like there is a limit of 2GB on the maximum Umem size?
> I've tried with and without huge pages. Is this fundamental? How hard
> would it be to increase this?

This was news to me. Do you know where in the xdp_umem_reg code it
complains about this? I guess it is xsk_umem__create() that fails,
right? The only limit I see from a basic inspection of the code is that
the number of packet buffers cannot be larger than a u32 (4G), and you
are not close to that limit. Björn, do you know where this limit stems
from?

Thanks: Magnus

> For both of these, I'd like to try to help make them happen. If the
> kernel side changes are deep or large, it may be beyond me but I can
> offer lab equipment and testing.
>
> Thanks.
>
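
P.S. In case it is useful for experimenting with the huge page side,
here is a rough, untested sketch of how I would expect a
huge-page-backed umem area to be set up from user space. The names and
sizes in it (UMEM_SIZE, the default 2 MB huge pages behind MAP_HUGETLB,
libbpf's <bpf/xsk.h>) are just assumptions for illustration, and note
that even with such a mapping the kernel-side and DMA mappings are
still built from 4K pages today, which is what problems #1-#3 above are
about.

/* Rough sketch, untested: back the umem with an explicit huge page
 * mapping and register it with xsk_umem__create(). Assumes libbpf's
 * <bpf/xsk.h> and enough 2 MB huge pages reserved on the kernel
 * command line or via /proc/sys/vm/nr_hugepages. Link with -lbpf.
 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <bpf/xsk.h>

#define UMEM_SIZE (1ULL << 30) /* 1 GB umem area for illustration */

int main(void)
{
	struct xsk_ring_prod fill;
	struct xsk_ring_cons comp;
	struct xsk_umem *umem;
	void *area;
	int ret;

	/* MAP_HUGETLB only affects the user-space mapping; the kernel
	 * mapping of the umem is still done in 4K pages today, which
	 * is what ss reports. */
	area = mmap(NULL, UMEM_SIZE, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	if (area == MAP_FAILED) {
		perror("mmap");
		return EXIT_FAILURE;
	}

	/* NULL config means default frame size and ring sizes. */
	ret = xsk_umem__create(&umem, area, UMEM_SIZE, &fill, &comp, NULL);
	if (ret) {
		fprintf(stderr, "xsk_umem__create: %d\n", ret);
		munmap(area, UMEM_SIZE);
		return EXIT_FAILURE;
	}

	printf("registered a umem of %llu bytes\n", UMEM_SIZE);
	xsk_umem__delete(umem);
	munmap(area, UMEM_SIZE);
	return EXIT_SUCCESS;
}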
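
And to make the dma_pages point above concrete, this is roughly (not
the actual kernel code) the difference between the per-4K-page lookup
we do today and what a single contiguous DMA mapping would allow:

/* Illustration only, not the kernel implementation: translating a umem
 * offset to a DMA address with one dma_addr_t per 4K page versus with
 * a single contiguous mapping. */
#include <stdint.h>

#define PG_SHIFT 12
#define PG_SIZE  (1ULL << PG_SHIFT)

typedef uint64_t dma_addr_t;

/* Today: one array lookup per packet to find the page's DMA address. */
static inline dma_addr_t dma_addr_today(const dma_addr_t *dma_pages,
					uint64_t addr)
{
	return dma_pages[addr >> PG_SHIFT] + (addr & (PG_SIZE - 1));
}

/* With contiguous DMA addresses: a single base is enough, removing the
 * extra memory access from the data path. */
static inline dma_addr_t dma_addr_contig(dma_addr_t base, uint64_t addr)
{
	return base + addr;
}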