On 2021/1/21 19:52, Song Bao Hua (Barry Song) wrote: > > >> -----Original Message----- >> From: Greg Kroah-Hartman [mailto:gregkh@xxxxxxxxxxxxxxxxxxx] >> Sent: Friday, January 22, 2021 12:19 AM >> To: Song Bao Hua (Barry Song) <song.bao.hua@xxxxxxxxxxxxx> >> Cc: Wangzhou (B) <wangzhou1@xxxxxxxxxxxxx>; Zhangfei Gao >> <zhangfei.gao@xxxxxxxxxx>; Arnd Bergmann <arnd@xxxxxxxx>; >> linux-accelerators@xxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; >> chensihang (A) <chensihang1@xxxxxxxxxxxxx> >> Subject: Re: [PATCH] uacce: Add uacce_ctrl misc device >> >> On Thu, Jan 21, 2021 at 10:18:24AM +0000, Song Bao Hua (Barry Song) wrote: >>> >>> >>>> -----Original Message----- >>>> From: Greg Kroah-Hartman [mailto:gregkh@xxxxxxxxxxxxxxxxxxx] >>>> Sent: Thursday, January 21, 2021 10:46 PM >>>> To: Wangzhou (B) <wangzhou1@xxxxxxxxxxxxx> >>>> Cc: Zhangfei Gao <zhangfei.gao@xxxxxxxxxx>; Arnd Bergmann <arnd@xxxxxxxx>; >>>> linux-accelerators@xxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; >>>> chensihang (A) <chensihang1@xxxxxxxxxxxxx> >>>> Subject: Re: [PATCH] uacce: Add uacce_ctrl misc device >>>> >>>> On Thu, Jan 21, 2021 at 05:09:14PM +0800, Zhou Wang wrote: >>>>> When IO page fault happens, DMA performance will be affected. Pin user >> page >>>>> can avoid IO page fault, this patch introduces a new char device named >>>>> /dev/uacce_ctrl to help to maintain pin/unpin pages. User space can do >>>>> pin/unpin pages by ioctls of an open file of /dev/uacce_ctrl, all pinned >>>>> pages under one file will be unpinned in file release process. >>>> >>>> Also, what are you really trying to do here? If you need to mess with >>>> memory pages, why can't the existing memory apis work properly for you? >>>> Please work with the linux-mm developers to resolve the issue using the >>>> standard apis and not creating a one-off char device node for this type >>>> of thing. >>> >>> Basically the purpose is implementing a pinned memory poll for userspace >>> DMA to achieve better performance by removing io page fault. >> >> And what could possibly go wrong with that :) > > I think we have resolved this concern while uacce came in :-) > Uacce is based on SVA so devices are accessing memory in userspace > by strict permission control. > >> >>> I really like this can be done in generic mm code. Unfortunately there is >> no >>> this standard API in kernel to support userspace pin. Right now, various >>> subsystems depend on the ioctl of /dev/<name> to implement the pin, for example, >>> v4l2, gpu, infiniband, media etc. >>> >>> I feel it is extremely hard to sell a standard mpin() API like mlock() >>> for this stage as mm could hardly buy this. And it will require >>> huge changes in kernel. >> >> Why? This is what mlock() is for, why can't you use it? > > mlock() can only guarantee memory won't be swapped out, it doesn't > make sure memory won't move. alloc_pages() can cause memory compaction, > cma, numa balance, huge pages etc can move mlock()-ed pages. > We would still see many I/O page faults for mlock() area. > >> >>> We need a way to manage what pages are pinned by process and ensure the >>> pages can be unpinned while the process is killed abnormally. otherwise, >>> memory gets leaked. >> >> Can't mlock() handle that? It works on the process that called it. >> >>> file_operations release() is a good entry for this kind of things. In >>> this way, we don't have to maintain the pinned page set in task_struct >>> and unpin them during exit(). >>> >>> If there is anything to make it better by doing this in a driver. I >>> would believe we could have a generic misc driver for pin like >>> vms_ballon.c for ballon. The driver doesn't have to bind with uacce. >>> >>> In this way, the pinned memory pool implementation in userspace doesn't >>> need to depend on a specific uacce driver any more. >> >> Please work with the mm developers to get them to agree with this type >> of thing, as well as the dma developers, both of which you didn't cc: on >> this patch :( > > Yep. OK. Will send v2 with cc mm-list and iommu-list too, let's see if we could find a proper way to solve this problem :) Best, Zhou > >> >> Remember, you are creating a new api for Linux that goes around existing >> syscalls, but is in reality, a new syscall, so why not just make it a >> new syscall? > > The difficulty would be how to record which pages are pinned for a process > if it is done by a new syscall. > > For mlock(), it can be much easier as it will change VMA. Hardly we can > change VMA for pin. On the other hand, if the implementation is done in > driver, with file_operations, we can record pinned pages in the private > data of an opened file. > >> >> thanks, >> >> greg k-h > > Thanks > Barry > > . >