On Thu, Feb 21, 2008 at 01:07:17PM -0800, Mingming Cao wrote:
> Hi Aneesh,
>
> It's a good start, a few comments below..
> .....
> > +		page = __grab_cache_page(mapping, index);
> > +		if (!page)
> > +			return -ENOMEM;
> > +	}
> > +
> The page is already locked before calling get_block() via writepage(),
> isn't it? And the journal transaction already started...
>
It would be via write_begin or writepage, but both callbacks lock the
page before they call get_block() for all the blocks corresponding to
the page.

> > +	if (!page_has_buffers(page))
> > +		create_empty_buffers(page, blocksize, 0);
> > +
> > +	head = page_buffers(page);
> > +	/* Look for the buffer_head which map the block */
> > +	bh = head;
> > +	while (offset > 0) {
> > +		bh = bh->b_this_page;
> > +		offset -= blocksize;
> > +	}
> > +	offset = (pos & (PAGE_CACHE_SIZE - 1));
> > +
> > +	/* Now write all the buffer_heads in the page */
> > +	do {
> > +		set_buffer_uptodate(bh);
> > +		if (ext4_should_journal_data(inode)) {
> > +			err = ext4_journal_get_write_access(handle, bh);
> > +			/* do we have that many credits ??*/
> > +			if (err)
> > +				goto err_out;
> > +		}
> > +		zero_user(page, offset, blocksize);
> Ah oh, you are trying to zero out the pages in the page cache, that
> seems wrong to me. By the time get_block() is called from writepages(),
> the pages should have meaningful content that needs to be flushed to
> disk; zeroing the pages out will lose the data.
>
It is write_begin. In the write_begin case the pages don't have the
content yet. By the time we reach writepage, the page is supposed to
have buffer_heads that are already mapped, so we won't end up calling
get_block(). Even in the mmap case, with the page_mkwrite change we
would have called the write_begin equivalent before writepage.
> > +		offset += blocksize;
> > +		if (ext4_should_journal_data(inode)) {
> > +			err = ext4_journal_dirty_metadata(handle, bh);
> > +			if (err)
> > +				goto err_out;
> > +		} else {
> > +			if (ext4_should_order_data(inode)) {
> > +				err = ext4_journal_dirty_data(handle,
> > +								bh);
> > +				if (err)
> > +					goto err_out;
> > +			}
> > +			mark_buffer_dirty(bh);
> > +		}
> > +
> > +		bh = bh->b_this_page;
> > +		blkcount--;
> > +	} while ((bh != head) && (blkcount > 0));
> > +	/* only unlock if we have locked */
> > +	if (index != skip_index)
> > +		unlock_page(page);
> > +	page_cache_release(page);
> > +	}
> > +
> > +	return 0;
> > +err_out:
> > +	unlock_page(page);
> > +	page_cache_release(page);
> > +	return err;
> > +}
> > +
> I was thinking to simply create a new bh, zero out the bh, map the bh
> with the block number to zero out, and lastly submit an IO via
> ll_rw_block. It may be more efficient to do this via bio (perhaps
> cooking a bio with zeroed-out pages and submit_bio), but I have not
> looked very closely at it. Just throwing out my thoughts.
>
But we would still need pages: a buffer_head needs to have a mapped
page in b_page. Also, if we don't take the page from the page cache,
then we would have to wait for the I/O to complete using wait_on_buffer
before we can update the extent information. Using the page cache also
plugs in nicely with the different journalling modes.

-aneesh