On Mon 02-12-13 13:13:15, Matthew Wilcox wrote: > On Mon, Dec 02, 2013 at 08:58:25AM -0700, Matthew Wilcox wrote: > > On Mon, Dec 02, 2013 at 09:33:18AM +0100, Jan Kara wrote: > > > > Indeed, XIP could make use of the structures we already > > > > have in the struct inode/address space to behave more like a normal > > > > filesystem. We already use the mapping tree to store > > > > per-page information that is used to serialise truncate vs page > > > > faults, so why not make XIP do exactly the same thing? > > > I believe that grabbing mmap_sem for writing during truncate (in case of > > > ext2 around xip_truncate_page() & truncate_setsize() calls should do the > > > trick. But I need to verify with lockdep that it doesn't introduce new > > > locking problems. > > > > Umm ... mmap_sem for write? On each of the tasks that have a file mmaped? > > I know we have double_down that orders by memory address, but I don't > > think we have an N_down_write(). > > Here's a simpler solution: Grab i_mmap_sem over vm_insert_mixed(). > i_mmap_sem is already taken in unmap_mapping_range(), called from > truncate_pagecache(), so we know that while we hold i_mmap_sem that > i_size has already been updated and the block can't be deallocated > from under us. We know that i_mmap_sem is safe to take here because it > already is (in __xip_unmap()). Yeah, that should do the trick. Honza > Now we need only worry about holepunch, and it's the same problem for > pagecache and XIP files alike. > > diff --git a/mm/filemap_xip.c b/mm/filemap_xip.c > index d8d9fe3..f6de4c3 100644 > --- a/mm/filemap_xip.c > +++ b/mm/filemap_xip.c > @@ -260,8 +260,17 @@ again: > __xip_unmap(mapping, vmf->pgoff); > > found: > + /* We must recheck i_size under i_mmap_mutex */ > + mutex_lock(&mapping->i_mmap_mutex); > + size = (i_size_read(inode) + PAGE_CACHE_SIZE - 1) >> > + PAGE_CACHE_SHIFT; > + if (unlikely(vmf->pgoff >= size)) { > + mutex_unlock(&mapping->i_mmap_mutex); > + return VM_FAULT_SIGBUS; > + } > err = vm_insert_mixed(vma, (unsigned long)vmf->virtual_address, > xip_pfn); > + mutex_unlock(&mapping->i_mmap_mutex); > if (err == -ENOMEM) > return VM_FAULT_OOM; > /* > @@ -285,6 +294,15 @@ found: > } > if (error != -ENODATA) > goto out; > + > + /* We must recheck i_size under i_mmap_mutex */ > + mutex_lock(&mapping->i_mmap_mutex); > + size = (i_size_read(inode) + PAGE_CACHE_SIZE - 1) >> > + PAGE_CACHE_SHIFT; > + if (unlikely(vmf->pgoff >= size)) { > + ret = VM_FAULT_SIGBUS; > + goto out; > + } > /* not shared and writable, use xip_sparse_page() */ > page = xip_sparse_page(); > if (!page) > @@ -296,6 +314,7 @@ found: > > ret = VM_FAULT_NOPAGE; > out: > + mutex_unlock(&mapping->i_mmap_mutex); > write_seqcount_end(&xip_sparse_seq); > mutex_unlock(&xip_sparse_mutex); > > > -- > Matthew Wilcox Intel Open Source Technology Centre > "Bill, look, we understand that you're interested in selling us this > operating system, but compare it to ours. We can't possibly take such > a retrograde step." -- Jan Kara <jack@xxxxxxx> SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html