On Fri, 31 May 2013 12:01:58 -0400
Vivek Goyal <vgoyal at redhat.com> wrote:

> On Fri, May 31, 2013 at 04:21:27PM +0200, Michael Holzheu wrote:
> > On Thu, 30 May 2013 16:38:47 -0400
> > Vivek Goyal <vgoyal at redhat.com> wrote:
> >
> > > On Wed, May 29, 2013 at 01:51:44PM +0200, Michael Holzheu wrote:

[...]

> > For zfcpdump currently we add a load from [0, HSA_SIZE] where
> > p_offset equals p_paddr. Therefore we can't distinguish in
> > copy_oldmem_page() if we read from oldmem (HSA) or newmem. The
> > range [0, HSA_SIZE] is used twice. As a workaround we could use an
> > artificial p_offset for the HSA memory chunk that is not used by
> > the 1st kernel physical memory. This is not really beautiful, but
> > probably doable.
>
> Ok, zfcpdump is a problem because HSA memory region is in addition to
> regular memory address space.

Right, and the HSA memory is accessed through a read() interface and
cannot be mapped directly.

[...]

> If you decide not to do that, agreed that copy_oldmem_page() need to
> differentiate between reference to HSA memory and reference to new
> memory. I guess in that case we will have to go with original proposal
> of using arch functions to access and read headers.

Let me think about that a bit more ...

[...]

> > If copy_oldmem_page() now also must be able to copy to vmalloc
> > memory, we would have to add new code for that:
> >
> > * oldmem -> newmem (real): Use direct memcpy_real()
> > * oldmem -> newmem (vmalloc): Use intermediate buffer with
> >   memcpy_real()
> > * newmem -> newmem: Use memcpy()
> >
> > What do you think?
>
> Yep, looks like you will have to do something like that.
>
> Can't we map HSA frames temporarily, copy data and tear down the
> mapping?

Yes, we would have to create a *temporary* mapping (see suggestion
below). We do not have enough memory to copy the complete HSA.

> If not, how would remap_pfn_range() work with HSA region when
> /proc/vmcore is mmaped()?
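The three copy paths listed above could be sketched roughly as follows.
This is a userspace simulation, not kernel code: memcpy_real_sim() and
copy_oldmem_sim() are hypothetical stand-ins for the real s390
memcpy_real() and copy_oldmem_page(), and both copies are emulated with
plain memcpy().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the s390 memcpy_real() primitive, which
 * copies from real (old kernel) memory but cannot write to vmalloc
 * space.  Simulated here with an ordinary memcpy(). */
static void memcpy_real_sim(void *dst, const void *src, size_t n)
{
	memcpy(dst, src, n);
}

/* Sketch of the three copy cases copy_oldmem_page() would need. */
static void copy_oldmem_sim(void *dst, const void *src, size_t n,
			    int src_is_oldmem, int dst_is_vmalloc)
{
	if (src_is_oldmem && !dst_is_vmalloc) {
		/* oldmem -> newmem (real): direct memcpy_real() */
		memcpy_real_sim(dst, src, n);
	} else if (src_is_oldmem && dst_is_vmalloc) {
		/* oldmem -> newmem (vmalloc): bounce through a real
		 * intermediate buffer, since memcpy_real() cannot
		 * target vmalloc memory directly. */
		void *bounce = malloc(n);
		memcpy_real_sim(bounce, src, n);
		memcpy(dst, bounce, n);
		free(bounce);
	} else {
		/* newmem -> newmem: plain memcpy() */
		memcpy(dst, src, n);
	}
}
```

The point of the second branch is only the extra bounce buffer; the
destination test (real vs. vmalloc) would in practice be something like
is_vmalloc_or_module_addr() on the target address.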
I am no memory management expert, so I discussed that with Martin
Schwidefsky (s390 architecture maintainer). Perhaps something like the
following could work: After mmap_vmcore() is called, the HSA pages are
initially not mapped in the page tables. So when user space accesses
those parts of /proc/vmcore, a fault is generated. We implement a
mechanism so that in this case the HSA page is copied to a new page in
the page cache and a mapping is created for it. Since the page is
allocated in the page cache, the kernel can release it again later
under memory pressure.

Our current idea for such an implementation:

* Create a new address space (struct address_space) for /proc/vmcore.
* Implement a new vm_operations_struct "vmcore_mmap_ops" with a new
  vmcore_fault() ".fault" callback for /proc/vmcore.
* Set vma->vm_ops to vmcore_mmap_ops in mmap_vmcore().
* The vmcore_fault() function gets a new page cache page, copies the
  HSA page into it, and adds it to the vmcore address space.

To see how this could work, we looked at filemap_fault() in
"mm/filemap.c" and relay_buf_fault() in "kernel/relay.c".

What do you think?

Michael
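The demand-paging behaviour proposed above can be illustrated with a
small userspace simulation. Everything here is hypothetical scaffolding:
vmcore_fault_sim() stands in for the proposed vmcore_fault() ".fault"
callback, the page_cache[] array stands in for the /proc/vmcore address
space, and the HSA read() interface is simulated by copying from a
local buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SZ  16
#define NR_PAGES 4

/* Simulated HSA contents (in the kernel the HSA is only reachable via
 * a read() interface and can never be mapped directly). */
static char hsa[NR_PAGES][PAGE_SZ];

/* Simulated page cache for the /proc/vmcore address space: one slot
 * per HSA page, populated lazily on first access.  fault_count tracks
 * how often the fault handler actually had to copy a page. */
static char *page_cache[NR_PAGES];
static int fault_count;

/* Sketch of the proposed vmcore_fault() logic: on the first access to
 * a page, allocate a page-cache page, copy the HSA page into it, and
 * insert it into the mapping; later accesses (and accesses after the
 * kernel dropped and re-faulted the page under memory pressure) are
 * served the same way. */
static char *vmcore_fault_sim(int pgoff)
{
	if (!page_cache[pgoff]) {
		page_cache[pgoff] = malloc(PAGE_SZ);
		memcpy(page_cache[pgoff], hsa[pgoff], PAGE_SZ);
		fault_count++;
	}
	return page_cache[pgoff];
}
```

Only pages the user actually touches are copied out of the HSA, which
is the reason this avoids copying the complete HSA up front.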