On Wed, 07 Jan 2009 12:47:22 -0500, Chris Mason wrote: > On Thu, 2009-01-08 at 02:33 +0900, Ryusuke Konishi wrote: > > On Wed, 07 Jan 2009 09:43:02 -0500, Chris Mason wrote: > > > On Wed, 2009-01-07 at 22:06 +0900, Ryusuke Konishi wrote: > > > > Chris Mason told me that page_mkwrite() has some works to do. > > > > > > > > On Wed, 17 Dec 2008 21:52:55 -0500, Chris Mason wrote: > > > > > nilfs_page_mkwrite doesn't seem to dirty the page? block_page_mkwrite > > > > > does more, including checks against i_size and others. > > > > > > > > This will insert i_size check, and hole block allocation code in the > > > > nilfs_page_mkwrite. > > > > > > > > Previously, the hole block allocation was delayed until just before > > > > writing for mmapped pages. This accompanies removal of the delayed > > > > allocation code. > > <snip> > > > > + /* > > > > + * use i_alloc_sem to stop truncate operations on the inode > > > > + */ > > > > + down_read(&inode->i_alloc_sem); > > > > > > I'm not 100% sure this is safe. It seems likely the direct io paths > > > could trigger a mkwrite with the i_alloc_sem already held? > > > > vmtruncate() can change i_size outside the page lock, and it even > > calls unmap_mapping_range(). Does it have any problems? > > You're right, i_size can change without you knowing about it, but before > vmtruncate does anything to the page itself, it locks the page. > > So, we tend to do a somewhat racey check against i_size. If the page is > outside i_size, we happily ignore it. If it is inside i_size, we assume > it is part of the file. Most of the code that does this checks i_size > once after locking the page and saves that value, so if i_size changes > later on the code at least does consistent operations based on a single > value of i_size. > > If the page straddles i_size and someone extends the file, they are > expected to lock the page as well (see block_truncate_page and the > places it gets called). > > -chris I got it, using page locking seems OK even for vmtruncate(). I don't know if the mkwrite is actually called from the direct io path, but no doubt about i_alloc_sem is used in it. To make this safer and avoid confusion, I took your advice. The revised patch is attached below. Ryusuke -- nilfs2: insert checks and hole block allocation in page_mkwrite (rev 2) Chris Mason told me that page_mkwrite() has some works to do. On Wed, 17 Dec 2008 21:52:55 -0500, Chris Mason wrote: > nilfs_page_mkwrite doesn't seem to dirty the page? block_page_mkwrite > does more, including checks against i_size and others. This will insert i_size check, a disk capacity check, and hole block allocation code in the nilfs_page_mkwrite. Previously, the hole block allocation was delayed until just before writing for mmapped pages. This accompanies removal of the delayed allocation code. This revision uses a page lock instead of i_alloc_sem for safety. Cc: Chris Mason <chris.mason@xxxxxxxxxx> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@xxxxxxxxxxxxx> --- fs/nilfs2/file.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++++- fs/nilfs2/segment.c | 44 ++++---------------------------------- 2 files changed, 61 insertions(+), 41 deletions(-) diff --git a/fs/nilfs2/file.c b/fs/nilfs2/file.c index 7ddd42e..c22f97c 100644 --- a/fs/nilfs2/file.c +++ b/fs/nilfs2/file.c @@ -75,8 +75,62 @@ nilfs_file_aio_write(struct kiocb *iocb, const struct iovec *iov, static int nilfs_page_mkwrite(struct vm_area_struct *vma, struct page *page) { - if (!(vma->vm_flags & (VM_WRITE | VM_MAYWRITE))) - return -EPERM; + struct inode *inode = vma->vm_file->f_dentry->d_inode; + struct nilfs_transaction_info ti; + int ret; + + if (unlikely(nilfs_near_disk_full(NILFS_SB(inode->i_sb)->s_nilfs))) + return -ENOSPC; + + lock_page(page); + if (page->mapping != inode->i_mapping || + page_offset(page) >= i_size_read(inode) || !PageUptodate(page)) { + unlock_page(page); + return -EINVAL; + } + + /* + * check to see if the page is mapped already (no holes) + */ + if (PageMappedToDisk(page)) { + unlock_page(page); + goto mapped; + } + if (page_has_buffers(page)) { + struct buffer_head *bh, *head; + int fully_mapped = 1; + + bh = head = page_buffers(page); + do { + if (!buffer_mapped(bh)) { + fully_mapped = 0; + break; + } + } while (bh = bh->b_this_page, bh != head); + + if (fully_mapped) { + SetPageMappedToDisk(page); + unlock_page(page); + goto mapped; + } + } + unlock_page(page); + + /* + * fill hole blocks + */ + ret = nilfs_transaction_begin(inode->i_sb, &ti, 1); + if (unlikely(ret)) + return ret; + + ret = block_page_mkwrite(vma, page, nilfs_get_block); + if (unlikely(ret)) { + nilfs_transaction_abort(inode->i_sb); + return ret; + } + nilfs_transaction_commit(inode->i_sb); + + mapped: SetPageChecked(page); wait_on_page_writeback(page); return 0; diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c index 182fb26..b4b96ac 100644 --- a/fs/nilfs2/segment.c +++ b/fs/nilfs2/segment.c @@ -667,42 +667,6 @@ struct nilfs_sc_operations nilfs_sc_dsync_ops = { .write_node_binfo = NULL, }; -static int nilfs_prepare_data_page(struct inode *inode, struct page *page) -{ - int err = 0; - - lock_page(page); - if (!page_has_buffers(page)) - create_empty_buffers(page, 1 << inode->i_blkbits, 0); - - if (!PageMappedToDisk(page)) { - struct buffer_head *bh, *head; - sector_t blkoff - = page->index << (PAGE_SHIFT - inode->i_blkbits); - - int non_mapped = 0; - - bh = head = page_buffers(page); - do { - if (!buffer_mapped(bh)) { - if (!buffer_dirty(bh)) { - non_mapped++; - continue; - } - err = nilfs_get_block(inode, blkoff, bh, 1); - if (unlikely(err)) - goto out_unlock; - } - } while (blkoff++, (bh = bh->b_this_page) != head); - if (!non_mapped) - SetPageMappedToDisk(page); - } - - out_unlock: - unlock_page(page); - return err; -} - static int nilfs_lookup_dirty_data_buffers(struct inode *inode, struct list_head *listp, struct nilfs_sc_info *sci) @@ -727,9 +691,11 @@ static int nilfs_lookup_dirty_data_buffers(struct inode *inode, struct page *page = pvec.pages[i]; if (mapping->host) { - err = nilfs_prepare_data_page(inode, page); - if (unlikely(err)) - break; + lock_page(page); + if (!page_has_buffers(page)) + create_empty_buffers(page, + 1 << inode->i_blkbits, 0); + unlock_page(page); } bh = head = page_buffers(page); -- 1.5.6.5 -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html