> -----Original Message-----
> From: Jason Gunthorpe [mailto:jgg@xxxxxxxx]
> Sent: Tuesday, January 26, 2021 4:47 AM
> To: Wangzhou (B) <wangzhou1@xxxxxxxxxxxxx>
> Cc: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>; Arnd Bergmann
> <arnd@xxxxxxxx>; Zhangfei Gao <zhangfei.gao@xxxxxxxxxx>;
> linux-accelerators@xxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> iommu@xxxxxxxxxxxxxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx; Song Bao Hua (Barry Song)
> <song.bao.hua@xxxxxxxxxxxxx>; Liguozhu (Kenneth) <liguozhu@xxxxxxxxxxxxx>;
> chensihang (A) <chensihang1@xxxxxxxxxxxxx>
> Subject: Re: [RFC PATCH v2] uacce: Add uacce_ctrl misc device
>
> On Mon, Jan 25, 2021 at 04:34:56PM +0800, Zhou Wang wrote:
>
> > +static int uacce_pin_page(struct uacce_pin_container *priv,
> > +                         struct uacce_pin_address *addr)
> > +{
> > +       unsigned int flags = FOLL_FORCE | FOLL_WRITE;
> > +       unsigned long first, last, nr_pages;
> > +       struct page **pages;
> > +       struct pin_pages *p;
> > +       int ret;
> > +
> > +       first = (addr->addr & PAGE_MASK) >> PAGE_SHIFT;
> > +       last = ((addr->addr + addr->size - 1) & PAGE_MASK) >> PAGE_SHIFT;
> > +       nr_pages = last - first + 1;
> > +
> > +       pages = vmalloc(nr_pages * sizeof(struct page *));
> > +       if (!pages)
> > +               return -ENOMEM;
> > +
> > +       p = kzalloc(sizeof(*p), GFP_KERNEL);
> > +       if (!p) {
> > +               ret = -ENOMEM;
> > +               goto free;
> > +       }
> > +
> > +       ret = pin_user_pages_fast(addr->addr & PAGE_MASK, nr_pages,
> > +                                 flags | FOLL_LONGTERM, pages);
>
> This needs to copy the RLIMIT_MEMLOCK and can_do_mlock() stuff from
> other places, like ib_umem_get
>
> > +       ret = xa_err(xa_store(&priv->array, p->first, p, GFP_KERNEL));
>
> And this is really weird, I don't think it makes sense to make handles
> for DMA based on the starting VA.
>
> > +static int uacce_unpin_page(struct uacce_pin_container *priv,
> > +                           struct uacce_pin_address *addr)
> > +{
> > +       unsigned long first, last, nr_pages;
> > +       struct pin_pages *p;
> > +
> > +       first = (addr->addr & PAGE_MASK) >> PAGE_SHIFT;
> > +       last = ((addr->addr + addr->size - 1) & PAGE_MASK) >> PAGE_SHIFT;
> > +       nr_pages = last - first + 1;
> > +
> > +       /* find pin_pages */
> > +       p = xa_load(&priv->array, first);
> > +       if (!p)
> > +               return -ENODEV;
> > +
> > +       if (p->nr_pages != nr_pages)
> > +               return -EINVAL;
> > +
> > +       /* unpin */
> > +       unpin_user_pages(p->pages, p->nr_pages);
>
> And unpinning without guaranteeing there is no ongoing DMA is really
> weird

In the SVA case, the kernel has no idea whether accelerators are still
accessing the memory, so I would assume SVA has a method to prevent the
pages from being migrated or released. Otherwise, SVA would crash easily
on a system under high memory pressure. Anyway, this is a problem worth
investigating further.

> Are you abusing this in conjunction with a SVA scheme just to prevent
> page motion? Why wasn't mlock good enough?

Page migration won't cause any malfunction in the SVA case, as an IO page
fault will fetch a valid page again. It is only a performance issue: an IO
page fault has a much larger latency than an ordinary page fault, reported
to be 3-80x slower [1].

mlock, while certainly able to prevent pages from being swapped out, won't
be able to stop pages from being moved, due to:
* memory compaction in alloc_pages()
* huge page collapse
* NUMA balancing
* memory compaction in CMA
etc.

[1] https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7482091&tag=1

> Jason

Thanks
Barry
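
For reference, the RLIMIT_MEMLOCK and can_do_mlock() accounting Jason points
to would look roughly like the sketch below, modeled on what ib_umem_get()
does. This is only an illustration of the missing check, not code from the
patch; the helper name uacce_account_pinned() is hypothetical:

/*
 * Sketch of pinned-page accounting in the style of ib_umem_get():
 * refuse the pin unless the caller is allowed to lock memory, and
 * charge the pages against RLIMIT_MEMLOCK before the longterm pin.
 */
static int uacce_account_pinned(struct mm_struct *mm, unsigned long nr_pages)
{
        unsigned long lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
        unsigned long new_pinned;

        if (!can_do_mlock())
                return -EPERM;

        new_pinned = atomic64_add_return(nr_pages, &mm->pinned_vm);
        if (new_pinned > lock_limit && !capable(CAP_IPC_LOCK)) {
                /* over the limit: roll back the charge */
                atomic64_sub(nr_pages, &mm->pinned_vm);
                return -ENOMEM;
        }

        return 0;
}

Under this scheme, uacce_pin_page() would call the helper before
pin_user_pages_fast(), and uacce_unpin_page() would subtract from
mm->pinned_vm after unpin_user_pages().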