Re: confused with child page table handling in fork().


On Fri, Oct 24, 2008 at 2:44 PM, Rene Herman <rene.herman@xxxxxxxxxxxx> wrote:
On 24-10-08 09:25, Prasad Joshi wrote:

My understanding is that when a process does a fork:
1. a new page table is allocated for the child process
2. it is an exact copy of the parent's page table
3. the entries in both page tables (parent's and child's) are marked
read-only
So any write to those pages is detected as a page fault, and the page fault
handler treats the page as COW: it creates a new page, updates the faulting
process's page table, and returns.

The question is
dup_mm() copies the current process's mm_struct into the new process's
mm_struct, so the new process's mm_struct->pgd now points to the pgd of
the parent process.

It then calls mm_init() to allocate a new pgd. Why?
My understanding conflicts with the code. Is my understanding correct?

Your understanding is correct. If I read the question right, though, you seem to be confusing the page tables with the pages themselves.

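Yes, the pages are shared CoW but the page _tables_ are not. Both parent and child have an actual physical copy of the tables. A write will fault, and the fault handler allocates a new page, copies the old contents into it, and updates the faulting process' entry to point at the new page with write permission restored. The other process' entry stays read-only until it faults in turn, at which point the page can simply be made writable again if nobody else still maps it.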
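
For what it's worth, the visible effect of this is easy to check from userspace. The following is only a minimal sketch, not anything from the kernel: the function name child_write_is_private() is made up for the illustration, and it assumes an ordinary private heap allocation. It shows the semantics (the child's write never reaches the parent's copy); it cannot show that the kernel implemented the split lazily via a fault.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper for illustration only.
 * Returns 1 if a write made by the child left the parent's copy
 * untouched, 0 if the parent saw the change, -1 on error.
 */
int child_write_is_private(void)
{
    int *value = malloc(sizeof(*value));
    int status;
    int result;
    pid_t pid;

    if (value == NULL)
        return -1;
    *value = 1;                 /* touch the page so it is present before fork */

    pid = fork();
    if (pid < 0) {
        free(value);
        return -1;
    }
    if (pid == 0) {
        /* Child: this write goes to the child's own copy of the page. */
        *value = 2;
        _exit(*value == 2 ? 0 : 1);
    }

    assert(pid > 0);            /* parent path from here on */
    if (waitpid(pid, &status, 0) != pid ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        free(value);
        return -1;
    }

    /* Parent: its own copy must still hold the pre-fork value. */
    result = (*value == 1);
    free(value);
    return result;
}
```

Calling child_write_is_private() from a small main() and printing the result is enough to try it out.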

Thanks a lot, I got it,
dup_mmap() is the function which copies the page table entries of the parent process into the child process and marks them read-only.

How sys_fork() ends up calling dup_mmap():
do_fork ()
    copy_process
        copy_mm
            dup_mm
                allocates memory to hold struct mm_struct
                memcpy(mm, oldmm, sizeof(*mm));    (oldmm is current->mm)
                calls mm_init() ==> allocates a new pgd for the child process (the pud/pmd/pte levels are generally allocated later, while dup_mmap() copies the entries)
                calls dup_mmap()

dup_mmap ()
{
       duplicates the vma regions
       copies page table entries from parent to child (via copy_page_range())
       for CoW mappings, write-protects the entry in both the parent
       (ptep_set_wrprotect()) and the child's copy (pte_wrprotect())
}

static inline pte_t pte_wrprotect(pte_t pte)
{
    return __pte(pte_val(pte) & ~_PAGE_RW);
}
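
To make the bit manipulation concrete, here is a small userspace model of the above. It is a sketch under stated assumptions, not kernel code: every toy_* name and TOY_* constant is invented for the illustration, the PTE is modelled as a 32-bit value with the frame number in the upper 20 bits, and the flag values merely mirror the usual x86 layout (present in bit 0, writable in bit 1). The fault path is reduced to "point the faulting entry at a new frame" with no reference counting or page copying.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy flag bits, chosen to mirror the x86 layout (assumption). */
#define TOY_PAGE_PRESENT 0x1u
#define TOY_PAGE_RW      0x2u
#define TOY_PFN_SHIFT    12

typedef struct { uint32_t val; } toy_pte_t;   /* stand-in for pte_t */

static inline toy_pte_t toy_mk_pte(uint32_t pfn, uint32_t flags)
{
    toy_pte_t pte;

    assert(pfn < (1u << 20));                 /* frame number must fit */
    pte.val = (pfn << TOY_PFN_SHIFT) | flags;
    return pte;
}

static inline int toy_pte_write(toy_pte_t pte)
{
    return (pte.val & TOY_PAGE_RW) != 0;
}

static inline uint32_t toy_pte_pfn(toy_pte_t pte)
{
    return pte.val >> TOY_PFN_SHIFT;
}

/* Same idea as pte_wrprotect(): clear the writable bit, keep the rest. */
static inline toy_pte_t toy_pte_wrprotect(toy_pte_t pte)
{
    pte.val &= ~TOY_PAGE_RW;
    return pte;
}

/* Fork-time copy: write-protect the parent's entry, then duplicate it. */
static inline void toy_copy_ptes(toy_pte_t *child, toy_pte_t *parent, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (!(parent[i].val & TOY_PAGE_PRESENT)) {
            child[i].val = 0;                 /* nothing mapped here */
            continue;
        }
        parent[i] = toy_pte_wrprotect(parent[i]);
        child[i] = parent[i];                 /* same frame, read-only */
    }
}

/* Write fault: only the faulting entry is repointed and made writable. */
static inline void toy_cow_fault(toy_pte_t *pte, uint32_t new_pfn)
{
    *pte = toy_mk_pte(new_pfn, TOY_PAGE_PRESENT | TOY_PAGE_RW);
}
```

After toy_copy_ptes() both tables refer to the same frames with the writable bit cleared; after toy_cow_fault() on a child entry, that entry alone points at the new frame and the parent's entry is unchanged.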

Thanks Rene,
--Prasad
 

It is possible to CoW not only the pages but also the tables themselves. There were some initial implementations of that in the 2.{4,5} days, in the context of the rmap patch (which increased the amount of data that had to be copied), but as far as I'm aware that died off as not all that useful in the end. A single page holds 1024 PTEs (4 KiB pages, 4-byte entries, as on 32-bit x86), and a child would write to its stack immediately, so you'd take a fault on at least that table right away anyway.
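
The 1024 figure is just page size divided by entry size. A quick sketch of the arithmetic, with helper names made up for the illustration and the page and entry sizes passed in as assumptions rather than read from any real system:

```c
#include <assert.h>
#include <stddef.h>

/* Number of entries that fit in one page-sized table. */
static inline size_t ptes_per_table(size_t page_size, size_t pte_size)
{
    assert(pte_size != 0);
    return page_size / pte_size;
}

/* Amount of virtual address space covered by one last-level table. */
static inline size_t bytes_mapped_per_table(size_t page_size, size_t pte_size)
{
    return ptes_per_table(page_size, pte_size) * page_size;
}
```

With 4096-byte pages and 4-byte entries this gives 1024 entries and 4 MiB covered per table; with 8-byte entries it gives 512 entries and 2 MiB.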

In these days of ballooning 64-bit address spaces, revisiting CoW tables might actually be useful, if that hasn't in fact been done already, of course...

Rene.

