On Mon, Dec 23, 2013 at 07:50:31AM -0700, Matthew Wilcox wrote:
> On Mon, Dec 23, 2013 at 03:41:13PM +0200, Kirill A. Shutemov wrote:
> > > +	/* Fall back to PTEs if we're going to COW */
> > > +	if ((flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED))
> > > +		return VM_FAULT_FALLBACK;
> >
> > Why?
>
> If somebody mmaps a file with MAP_PRIVATE and changes a single byte, I
> think we should allocate a single page to hold that change, not a PMD's
> worth of pages.

We try to allocate a new huge page in the same situation for AnonTHP.
I don't see a reason not to do the same here. It would be much harder
(if possible at all) to collapse small pages into a huge one later.

> > > +	pgoff = ((address - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
> > > +	size = (i_size_read(inode) + PAGE_SIZE - 1) >> PAGE_SHIFT;
> > > +	if (pgoff >= size)
> > > +		return VM_FAULT_SIGBUS;
> > > +	if ((pgoff | PG_PMD_COLOUR) >= size)
> > > +		return VM_FAULT_FALLBACK;
> >
> > I don't think it's necessary to fallback in this case.
> > Do you care about SIGBUS behaviour or what?
>
> I'm looking to preserve the same behaviour we see with PTE mappings. I mean,
> it's supposed to be _transparent_ huge pages, right?

We can't be totally transparent, at least from a performance point of
view. The question is whether it's critical to preserve the SIGBUS
behaviour. I would prefer to map the last page in the mapping with huge
pages too, if possible.

Do you know anyone who relies on SIGBUS for correctness?

>
> > > + insert:
> > > +	length = xip_get_pfn(inode, &bh, &pfn);
> > > +	if (length < 0)
> > > +		return VM_FAULT_SIGBUS;
> > > +	if (length < PMD_SIZE)
> > > +		return VM_FAULT_FALLBACK;
> > > +	if (pfn & PG_PMD_COLOUR)
> > > +		return VM_FAULT_FALLBACK; /* not aligned */
> >
> > Without assistance from get_unmapped_area() you will hit this all the time
> > (511 of 512 on x86_64).
>
> Yes ... I thought you were working on that part for your transparent huge
> page cache patchset?

Yeah, I have a patch for x86-64.
Just a side note.

>
> > And the check should be moved before get_block(), I think.
>
> Can't. The PFN we're checking is the PFN of the storage. We have to
> call get_block() to find out where it's going to be.

I see.

-- 
 Kirill A. Shutemov
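
For reference, the "511 of 512" figure quoted above follows from the colour mask arithmetic. Below is a minimal sketch in C, assuming x86-64 with 4 KiB base pages and 2 MiB PMD entries. The PG_PMD_COLOUR definition shown here and both helper names are illustrative assumptions, not taken from the patch.

```c
#include <stdbool.h>

/* Assumed x86-64 values: 4 KiB base pages, 2 MiB PMD-sized huge pages. */
#define PAGE_SHIFT	12
#define PMD_SHIFT	21

/* Assumed definition: low bits of a PFN within one PMD-sized region (511). */
#define PG_PMD_COLOUR	((1UL << (PMD_SHIFT - PAGE_SHIFT)) - 1)

/* Hypothetical helper: true when the PFN starts on a huge-page boundary. */
static bool pfn_is_pmd_aligned(unsigned long pfn)
{
	return (pfn & PG_PMD_COLOUR) == 0;
}

/* Hypothetical helper: count PFNs in [start, start + count) that pass the check. */
static unsigned long count_aligned_pfns(unsigned long start, unsigned long count)
{
	unsigned long hits = 0;
	unsigned long i;

	for (i = 0; i < count; i++)
		if (pfn_is_pmd_aligned(start + i))
			hits++;
	return hits;
}
```

Under these assumptions, only one PFN in any run of 512 consecutive PFNs has all nine colour bits clear, so an arbitrarily placed block fails the alignment check in the other 511 cases unless the allocation and mapping addresses are steered towards aligned offsets.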