On Thu, Oct 09, 2014 at 04:47:16PM -0400, Matthew Wilcox wrote:
> On Wed, Oct 08, 2014 at 11:11:00PM +0300, Kirill A. Shutemov wrote:
> > On Wed, Oct 08, 2014 at 09:25:27AM -0400, Matthew Wilcox wrote:
> > > +	pgoff = ((address - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
> > > +	size = (i_size_read(inode) + PAGE_SIZE - 1) >> PAGE_SHIFT;
> > > +	if (pgoff >= size)
> > > +		return VM_FAULT_SIGBUS;
> > > +	/* If the PMD would cover blocks out of the file */
> > > +	if ((pgoff | PG_PMD_COLOUR) >= size)
> > > +		return VM_FAULT_FALLBACK;
> >
> > IIUC, zero pading would work too.
>
> The blocks after this file might be allocated to another file already.
> I suppose we could ask the filesystem if it wants to allocate them to
> this file.
>
> Dave, Jan, is it acceptable to call get_block() for blocks that extend
> beyond the current i_size?

In what context? XFS basically does nothing for certain cases (e.g.
read mapping for direct IO) where zeroes are always going to be
returned, so essentially filesystems right now may actually just
return a "hole" for any read mapping request beyond EOF.

If "create" is set, then we'll either create or map existing blocks
beyond EOF because we have to reserve space or allocate blocks
before the EOF gets extended when the write succeeds fully...

> > > +	if (length < PMD_SIZE)
> > > +		goto fallback;
> > > +	if (pfn & PG_PMD_COLOUR)
> > > +		goto fallback;	/* not aligned */
> >
> > So, are you rely on pure luck to make get_block() allocate 2M aligned pfn?
> > Not really productive. You would need assistance from fs and
> > arch_get_unmapped_area() sides.
>
> Certainly ext4 and XFS will align their allocations; if you ask it for a
> 2MB block, it will try to allocate a 2MB block aligned on a 2MB boundary.

As a sweeping generalisation, that's wrong. Empty filesystems might
behave that way, but we don't *guarantee* that this sort of
alignment will occur.

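For reference, the arithmetic in the two quoted hunks can be modelled in
userspace. This is only a sketch, not the kernel code: it assumes 4 KiB
pages and a 2 MiB PMD (as on x86-64), and it assumes PG_PMD_COLOUR is
(PMD_SIZE >> PAGE_SHIFT) - 1, i.e. 511. The enum and both function names
are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT	12			/* assumption: 4 KiB pages */
#define PAGE_SIZE	(1ULL << PAGE_SHIFT)
#define PMD_SIZE	(1ULL << 21)		/* assumption: 2 MiB PMD */
/* assumption: mask of the page-index bits below PMD granularity (511) */
#define PG_PMD_COLOUR	((PMD_SIZE >> PAGE_SHIFT) - 1)

/* Hypothetical result codes standing in for the VM_FAULT_* values. */
enum pmd_check { PMD_OK, PMD_SIGBUS, PMD_FALLBACK };

/*
 * Model of the first hunk: does a PMD mapping containing 'address'
 * stay inside the file?  'i_size' is the file size in bytes.
 */
enum pmd_check check_file_range(uint64_t address, uint64_t vm_start,
				uint64_t vm_pgoff, uint64_t i_size)
{
	uint64_t pgoff, size;

	assert(address >= vm_start);
	pgoff = ((address - vm_start) >> PAGE_SHIFT) + vm_pgoff;
	size = (i_size + PAGE_SIZE - 1) >> PAGE_SHIFT;	/* pages, rounded up */

	if (pgoff >= size)
		return PMD_SIGBUS;	/* faulting page is beyond EOF */
	/* pgoff | PG_PMD_COLOUR is the last page the PMD would cover */
	if ((pgoff | PG_PMD_COLOUR) >= size)
		return PMD_FALLBACK;	/* PMD would extend past EOF */
	return PMD_OK;
}

/*
 * Model of the second hunk: 'length' is the size in bytes of the
 * extent the filesystem returned, 'pfn' its first page frame number.
 */
enum pmd_check check_extent(uint64_t length, uint64_t pfn)
{
	if (length < PMD_SIZE)
		return PMD_FALLBACK;	/* extent too short for one PMD */
	if (pfn & PG_PMD_COLOUR)
		return PMD_FALLBACK;	/* not aligned to a PMD boundary */
	return PMD_OK;
}
```

With these assumptions, a 1 MiB file faulting at offset 0 takes the
fallback path (pages 0..511 would be covered but only 256 exist), and an
extent starting at pfn 513 is rejected as unaligned even when it is 2 MiB
long. The second case is the one that depends on what the allocator
actually hands back.
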
XFS has several different extent alignment strategies and none of
them will always work that way. Many of them are dependent on mkfs
parameters, and even then are used only as *guidelines*. Further,
alignment is dependent on the size of the write being done - on some
filesystem configs a 2MB write might be aligned, but on others it
won't be. More complex still is that mount options can change
alignment behaviour, as can per-file extent size hints, as can
truncation that removes post-eof blocks...

IOWs, if you want the filesystem to guarantee alignment to the
underlying hardware in this way for DAX, we're going to need to make
some modifications to the allocator alignment strategy.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html