On 2/4/21 12:15 AM, John Hubbard wrote:
> On 2/3/21 2:00 PM, Joao Martins wrote:
>> Use the newly added unpin_user_page_range_dirty_lock()
>> for more quickly unpinning a consecutive range of pages
>> represented as compound pages. This will also calculate the
>> number of pages to unpin (for the tail pages which match the
>> head page) and thus batch the refcount update.
>>
>> Running a test program which calls mr reg/unreg on a 1G region
>> and measures the cost of both operations together (in a guest
>> using rxe) with THP and hugetlbfs:
>
> In the patch subject line:
>
> s/__ib_mem_release/__ib_umem_release/
>

Ah, yes.

>>
>> Before:
>> 590 rounds in 5.003 sec: 8480.335 usec / round
>> 6898 rounds in 60.001 sec: 8698.367 usec / round
>>
>> After:
>> 2631 rounds in 5.001 sec: 1900.618 usec / round
>> 31625 rounds in 60.001 sec: 1897.267 usec / round
>>
>> Signed-off-by: Joao Martins <joao.m.martins@xxxxxxxxxx>
>> ---
>>  drivers/infiniband/core/umem.c | 12 ++++++------
>>  1 file changed, 6 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
>> index 2dde99a9ba07..ea4ebb3261d9 100644
>> --- a/drivers/infiniband/core/umem.c
>> +++ b/drivers/infiniband/core/umem.c
>> @@ -47,17 +47,17 @@
>>
>>  static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int dirty)
>>  {
>> -	struct sg_page_iter sg_iter;
>> -	struct page *page;
>> +	bool make_dirty = umem->writable && dirty;
>> +	struct scatterlist *sg;
>> +	int i;
>
> Maybe unsigned int is better, so as to perfectly match the scatterlist.length.
>

Will fix.

>>
>>  	if (umem->nmap > 0)
>>  		ib_dma_unmap_sg(dev, umem->sg_head.sgl, umem->sg_nents,
>>  				DMA_BIDIRECTIONAL);
>>
>> -	for_each_sg_page(umem->sg_head.sgl, &sg_iter, umem->sg_nents, 0) {
>> -		page = sg_page_iter_page(&sg_iter);
>> -		unpin_user_pages_dirty_lock(&page, 1, umem->writable && dirty);
>> -	}
>> +	for_each_sg(umem->sg_head.sgl, sg, umem->nmap, i)
>
> The change from umem->sg_nents to umem->nmap looks OK, although we should get
> IB people to verify that there is not some odd bug or reason to leave it as is.
>

/me nods

fwiw, this was suggested by Jason :) as the way I had done it was
unnecessarily allocating a page just to unpin pages.

>> +		unpin_user_page_range_dirty_lock(sg_page(sg),
>> +				DIV_ROUND_UP(sg->length, PAGE_SIZE), make_dirty);
>
> Is it really OK to refer directly to sg->length? The scatterlist library goes
> to some effort to avoid having callers directly access the struct member
> variables.
>
> Actually, the for_each_sg() code and its behavior with sg->length and
> sg_page(sg) confuses me because I'm new to it, and I don't quite understand
> how this works.

So IIUC this can be done given how ib_umem_get() allocates scatterlists
(i.e. see the call to __sg_alloc_table_from_pages()). It builds a
scatterlist with a segment size capped at the device's DMA max segment
size (e.g. 64K, 2G, etc., depending on what the device sets it to), and
once the table is created, each scatterlist entry I am iterating over
represents a contiguous range of PFNs with a starting page. And if you
keep pinning contiguous amounts of memory, it keeps coalescing them into
the previously allocated sgl entry.

> Especially with SG_CHAIN. I'm assuming that you've monitored /proc/vmstat
> for nr_foll_pin* ?
>

Yeap, I did. I see no pages left unpinned.

>>
>>  	sg_free_table(&umem->sg_head);
>>  }
>>
>
> thanks,
>
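
For reference, here is the whole of __ib_umem_release() as it reads with
the patch applied. This is a sketch only (with the int -> unsigned int
change from the review above folded in), not necessarily a literal copy
of the final commit:

/*
 * Sketch of __ib_umem_release() after this patch. It relies on the
 * property discussed above: each scatterlist entry built by
 * __sg_alloc_table_from_pages() covers a physically contiguous range
 * of pages, so one unpin_user_page_range_dirty_lock() call can cover
 * all sg->length bytes of that entry.
 */
static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem,
			      int dirty)
{
	bool make_dirty = umem->writable && dirty;
	struct scatterlist *sg;
	unsigned int i;		/* unsigned int, per the review comment */

	if (umem->nmap > 0)
		ib_dma_unmap_sg(dev, umem->sg_head.sgl, umem->sg_nents,
				DMA_BIDIRECTIONAL);

	/* One unpin call per sg entry instead of one per page. */
	for_each_sg(umem->sg_head.sgl, sg, umem->nmap, i)
		unpin_user_page_range_dirty_lock(sg_page(sg),
				DIV_ROUND_UP(sg->length, PAGE_SIZE),
				make_dirty);

	sg_free_table(&umem->sg_head);
}

The speedup in the numbers above comes from exactly this: one refcount
update per contiguous segment (each up to the device's DMA max segment
size) instead of one unpin_user_pages_dirty_lock() call per page.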