On 12/9/20 10:59 AM, Joao Martins wrote:
> On 12/8/20 7:29 PM, Jason Gunthorpe wrote:
>> On Tue, Dec 08, 2020 at 05:29:00PM +0000, Joao Martins wrote:
>>
>>>  static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int dirty)
>>>  {
>>> +	bool make_dirty = umem->writable && dirty;
>>> +	struct page **page_list = NULL;
>>>  	struct sg_page_iter sg_iter;
>>> +	unsigned long nr = 0;
>>>  	struct page *page;
>>>
>>> +	page_list = (struct page **) __get_free_page(GFP_KERNEL);
>>
>> Gah, no, don't do it like this!
>>
>> Instead something like:
>>
>> for_each_sg(umem->sg_head.sgl, sg, umem->nmap, i)
>> 	unpin_user_pages_range_dirty_lock(sg_page(sg), sg->length / PAGE_SIZE,
>> 					  umem->writable && dirty);
>>
>> And have the mm implementation split the contiguous range of pages into
>> pairs of (compound head, ntails) with a bit of maths.
>>
> Got it :)
>
> I was trying to avoid another exported symbol.
>
> Albeit, given your suggestion below, avoiding it doesn't justify the
> efficiency/clarity lost.
>
This more efficient suggestion of yours leads to a further speedup, from:

	1073 rounds in 5.004 sec: 4663.622 usec / round (hugetlbfs)

to:

	1370 rounds in 5.003 sec: 3651.562 usec / round (hugetlbfs)

Right after I come back from holidays, I will follow up on this series
separately.

	Joao
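
For reference, a minimal sketch of the (compound head, ntails) maths Jason
describes above. The entry point name follows his snippet, and
put_compound_refs() is a hypothetical helper standing in for whatever
FOLL_PIN-aware release the mm side would use; the implementation that was
eventually merged may differ:

	#include <linux/mm.h>

	static void unpin_user_pages_range_dirty_lock(struct page *page,
						      unsigned long npages,
						      bool make_dirty)
	{
		unsigned long i = 0;

		while (i < npages) {
			struct page *head = compound_head(page + i);
			/*
			 * How many pages of this compound page still fall
			 * inside [page, page + npages)? They all share the
			 * head's refcount, so they can be released with a
			 * single operation on the head instead of one per
			 * PAGE_SIZE page.
			 */
			unsigned long ntails = min(npages - i,
					compound_nr(head) -
					(unsigned long)(page + i - head));

			if (make_dirty && !PageDirty(head))
				set_page_dirty_lock(head);

			/* hypothetical: drop ntails pin references at once */
			put_compound_refs(head, ntails);

			i += ntails;
		}
	}

Since compound_head() of a base page is the page itself and compound_nr()
is then 1, the same loop covers 4K pages, THP and hugetlbfs pages alike;
the speedup comes from touching each head once per ntails rather than once
per PAGE_SIZE page.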