Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes

Simon Jeons <simon.jeons@xxxxxxxxx> · Fri, 12 Apr 2013 13:44:38 +0800



    Hi Jerome,

      On 04/12/2013 10:57 AM, Jerome Glisse wrote:

    
    On Thu, Apr 11, 2013 at 9:54 PM, Simon Jeons <simon.jeons@xxxxxxxxx>
      wrote:

      
          Hi Jerome,
          
            
              On 04/12/2013 02:38 AM, Jerome Glisse wrote:

              
                On Thu, Apr 11, 2013 at 11:42:05AM +0800, Simon Jeons
                wrote:

                
                  Hi Jerome,

                  On 04/11/2013 04:45 AM, Jerome Glisse wrote:

                  
                    On Wed, Apr 10, 2013 at 09:41:57AM +0800, Simon
                    Jeons wrote:

                    
                      Hi Jerome,

                      On 04/09/2013 10:21 PM, Jerome Glisse wrote:

                      
                        On Tue, Apr 09, 2013 at 04:28:09PM +0800, Simon
                        Jeons wrote:

                        
                          Hi Jerome,

                          On 02/10/2013 12:29 AM, Jerome Glisse wrote:

                          
                            On Sat, Feb 9, 2013 at 1:05 AM, Michel
                            Lespinasse <walken@xxxxxxxxxx>
                            wrote:

                            
                              On Fri, Feb 8, 2013 at 3:18 AM, Shachar
                              Raindel <raindel@xxxxxxxxxxxx>
                              wrote:

                              
                                Hi,

                                
                                We would like to present a reference
                                implementation for safely sharing
                                memory pages from user space with the
                                hardware, without pinning.

                                We will be happy to hear the community
                                feedback on our prototype
                                implementation, and suggestions for
                                future improvements.

                                We would also like to discuss adding
                                features to the core MM subsystem to
                                assist hardware access to user memory
                                without pinning.

                              
                              This sounds kinda scary TBH; however I do
                              understand the need for such technology.

                              I think one issue is that many MM
                              developers are insufficiently aware of
                              such developments; having a technology
                              presentation would probably help there;
                              but traditionally LSF/MM sessions are more
                              interactive between developers who are
                              already quite familiar with the
                              technology. I think it would help if you
                              could send in advance a detailed
                              presentation of the problem and the
                              proposed solutions (and then what they
                              require of the MM layer) so people can be
                              better prepared.

                              And first I'd like to ask, aren't IOMMUs
                              supposed to already largely solve this
                              problem? (probably a dumb question, but
                              that just tells you how much you need to
                              explain :)

                            
                            For GPUs the motivation is threefold. With
                            the advance of GPU compute, and also with
                            newer graphics programs, we see a massive
                            increase in GPU memory consumption. We can
                            easily reach buffers that are bigger than
                            1 GB. So the first motivation is to use the
                            memory the user allocated through malloc
                            directly on the GPU; this avoids copying
                            1 GB of data with the CPU into the GPU
                            buffer. The second, and the most important
                            for GPU compute, is using the GPU
                            seamlessly with the CPU. To achieve this
                            you want the programmer to have a single
                            address space on the CPU and GPU, so that
                            the same address points to the same object
                            on the GPU as on the CPU. This would also
                            be a tremendously cleaner design from the
                            driver's point of view with respect to
                            memory management.

                            
                            And last, and most important: with such big
                            buffers (>1 GB), memory pinning is becoming
                            way too expensive and also drastically
                            reduces the freedom of the mm to free pages
                            for other processes. Most of the time only
                            a small window of the object (everything is
                            relative, the window can be > 100 MB, not
                            so small :)) will be in use by the
                            hardware. The hardware pagefault support
                            would avoid the necessity to

                          
                          What's the meaning of hardware pagefault?

                        
                        It's a PCIe extension (well, it's a combination
                        of extensions that allow it, see
                        http://www.pcisig.com/specifications/iov/ats/).
                        The idea is that the iommu can trigger a
                        regular pagefault inside a process address
                        space on behalf of the hardware. The only iommu
                        supporting that right now is the AMD iommu v2
                        that you find on recent AMD platforms.

                      
                      Why do we need a hardware page fault? A regular
                      page fault is triggered by the cpu mmu, correct?

                    
                    Well, here I abuse the term "regular page fault".
                    The idea is that with hardware page faults you
                    don't need to pin memory or take a reference on a
                    page for the hardware to use it. So the kernel can
                    free, as usual, pages that would otherwise have been

                  
                  For the case where the GPU needs to pin memory, why
                  does the GPU need to grab the memory of a normal
                  process instead of allocating memory for itself?

                
                Pinned memory is today's world, where the gpu allocates
                its own memory (GBs of memory) that disappears from
                kernel control, i.e. the kernel can no longer reclaim
                this memory, it's lost memory (I have already had
                complaints about that from users who saw GBs of memory
                vanish and couldn't understand why the GPU was using so
                much).

                
                In tomorrow's world we want the gpu to be able to access
                memory that the application allocated through a simple
                malloc, and we want the kernel to be able to recycle any
                page at any time, because of memory pressure or because
                the kernel decides to do so.

                
                That's just what we want to do. To achieve this we are
                getting hw that can do pagefaults. No change to kernel
                core mm code is needed (some improvements might be made).

              
          The memory disappears since you hold a reference (gup) against
          it, correct? In tomorrow's world you want the page fault
          triggered through the iommu driver, which calls
          get_user_pages, so it will also take a reference (since gup is
          called), won't it? Anyway, assuming tomorrow's world doesn't
          take a reference, don't we need to care that a page used by
          the GPU gets reclaimed?
          
            
          Right now the code uses gup because it's convenient, but it
          drops the reference right after the fault. So the reference is
          held only for a short period of time.
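
The difference between holding a reference for the lifetime of a device mapping and holding it only across the fault can be illustrated with a toy user-space model. This is only a sketch, not kernel code: `Page`, `pin_for_device`, `handle_device_fault` and `reclaimable` are hypothetical names made up for the illustration, and the rule "reclaimable when only the mapping's own reference remains" is a simplifying assumption.

```python
class Page:
    """Toy stand-in for a physical page with a reference count."""

    def __init__(self):
        self.refcount = 1  # reference held by the process mapping itself


def pin_for_device(page):
    """Pinning model: take a reference and keep it while the device uses the page."""
    page.refcount += 1
    return page


def handle_device_fault(page, device_tlb, addr):
    """Fault model: take a reference, install the translation, drop the reference."""
    page.refcount += 1        # what a gup-like lookup would do
    device_tlb[addr] = page   # the device can now translate addr
    page.refcount -= 1        # the caller drops the reference right after the fault


def reclaimable(page):
    """Toy rule (assumption): reclaim is possible when no extra reference is held."""
    return page.refcount == 1
```

In this model a pinned page stays unreclaimable for as long as the device keeps it, while a page resolved through the fault path is reclaimable again as soon as the fault handler returns.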

        
    Are you sure gup will drop the reference right after the fault? I
    dug through the code again and failed to verify it. Could you point
    it out to me?

    
          No, you don't need to care about reclaim thanks to the mmu
          notifier, i.e. before a page is removed the mmu notifier is
          called, and the iommu registers a notifier, so it gets the
          invalidate event, invalidates the device tlb, and things go
          on. If the gpu accesses the page again, a new pagefault
          happens and a new page is allocated.
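
The notifier flow described above can be sketched as a small user-space simulation. It is a toy model under stated assumptions, not the kernel's mmu notifier API: `AddressSpace`, `Device`, `register_notifier`, `fault_in` and `reclaim` are hypothetical names invented here, and pages are represented by plain integers.

```python
class AddressSpace:
    """Toy process address space that notifies listeners before removing a page."""

    def __init__(self):
        self.pages = {}       # virtual address -> page id
        self.notifiers = []   # callbacks invoked before a page is removed
        self.next_page = 0

    def register_notifier(self, callback):
        self.notifiers.append(callback)

    def fault_in(self, addr):
        """Allocate a fresh page for addr if none is present, then return it."""
        if addr not in self.pages:
            self.pages[addr] = self.next_page
            self.next_page += 1
        return self.pages[addr]

    def reclaim(self, addr):
        """Run the notifiers first, then remove the page."""
        for callback in self.notifiers:
            callback(addr)
        self.pages.pop(addr, None)


class Device:
    """Toy device with its own TLB, kept coherent through the notifier."""

    def __init__(self, mm):
        self.mm = mm
        self.tlb = {}
        self.faults = 0
        mm.register_notifier(self.invalidate)

    def invalidate(self, addr):
        """Invalidate event: drop the cached translation for addr."""
        self.tlb.pop(addr, None)

    def access(self, addr):
        """A TLB miss counts as a device page fault resolved by the address space."""
        if addr not in self.tlb:
            self.faults += 1
            self.tlb[addr] = self.mm.fault_in(addr)
        return self.tlb[addr]
```

A usage sketch: after `dev = Device(mm)` and `dev.access(0x1000)`, calling `mm.reclaim(0x1000)` empties the device TLB entry, and the next `dev.access(0x1000)` faults again and gets a newly allocated page.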

        
    Good idea! ;-)

    
          All this code is upstream in the linux kernel, just read it.
          There is just no device that uses it yet.

          
          That being said, we will want improvements so that pages that
          are hot in the device are not reclaimed. But it can work
          without such improvements.

          
          Cheers,

          Jerome