On Sat, 2009-02-14 at 16:11 +0300, Edward Shishkin wrote: > Peter Zijlstra writes: > > On Fri, 2009-02-13 at 16:57 +0300, Edward Shishkin wrote: > > > > > > > > Eew, so reiser4 will totally side-step the regular vm inode > > > writeback > > > > paths -- or is this fixed by a more elaborate than usual > > > > a_ops->writepages() ? > > > > > > > The second. > > > > > > reiser4_writepages() catches the anonymous (tagged) > > > pages, captures them mandatory, then commits all atoms > > > of the file. > > > > OK, can you then make it painfully clear in the function comment, esp. > > since you export this function? > > Hello. > Does it look better? > > Thanks, > Edward. > > This is a fixup for the following "todo": > akpm wrote: > > reiser4_set_page_dirty_internal() pokes around in VFS internals. > > Use __set_page_dirty_no_buffers() or create a new library function > > in mm/page-writeback.c. > > Problem: > > In accordance with reiser4 transactional model every dirty page > should be "captured" by some atom. However, outside reiser4 context > dirty page can not be captured in some cases, as it is accompanied > with specific work (jnode creation, etc). Reiser4 recognizes such > "anonymous" pages (i.e. pages that were dirtied outside of reiser4) > by the tag PAGECACHE_TAG_DIRTY. Pages dirtied inside reiser4 context > are not tagged at all: we don't need this. Indeed, once page is > dirtied and captured, it is attached to a jnode (a special header > to keep a track of transactions). > > reiser4_set_page_dirty_internal() was the internal reiser4 function > that set dirty bit without tagging the page. Having such internal > function led to real problems (incorrect task io accounting, etc. > because of not updating this internal "friend"). > > Solution: > > The following patch adds a core library function that sets a dirty > bit without tagging the page. It should be modified simultaneously > with its "friends": __set_page_dirty_nobuffers, __set_page_dirty. > > Signed-off-by: Edward Shishkin<edward.shishkin@xxxxxxxxx> > --- > include/linux/mm.h | 1 + > mm/page-writeback.c | 40 ++++++++++++++++++++++++++++++++++++++++ > 2 files changed, 41 insertions(+) > > --- mmotm.orig/include/linux/mm.h > +++ mmotm/include/linux/mm.h > @@ -841,6 +841,7 @@ int redirty_page_for_writepage(struct wr > struct page *page); > int set_page_dirty(struct page *page); > int set_page_dirty_lock(struct page *page); > +int set_page_dirty_notag(struct page *page); > int clear_page_dirty_for_io(struct page *page); > > extern unsigned long move_page_tables(struct vm_area_struct *vma, > --- mmotm.orig/mm/page-writeback.c > +++ mmotm/mm/page-writeback.c > @@ -1248,6 +1248,46 @@ int __set_page_dirty_nobuffers(struct pa > EXPORT_SYMBOL(__set_page_dirty_nobuffers); > > /* > + * Some filesystems, which don't use buffers, provide their own > + * writeback means. And it can happen, that even dirty tag, which > + * is used by generic methods is not needed. In this case it would > + * be reasonably to use the following lightweight version of > + * __set_page_dirty_nobuffers: > + * > + * Don't tag page as dirty in its radix tree, just set dirty bit > + * and update the accounting. > + * NOTE: This function also doesn't take care of races, i.e. the > + * caller should guarantee that page can not be truncated. > + */ Maybe something like /* * set_page_dirty_notag() -- similar to __set_page_dirty_nobuffers() * except it doesn't tag the page dirty in the page-cache radix tree. * This means that the address space using this cannot use the regular * filemap ->writepages() helpers and must provide its own means of * tracking and finding dirty pages. * * NOTE: furthermore, this version also doesn't handle truncate races. */ > +int set_page_dirty_notag(struct page *page) > +{ > + struct address_space *mapping = page->mapping; > + > + if (!TestSetPageDirty(page)) { > + WARN_ON_ONCE(!PagePrivate(page) && !PageUptodate(page)); > + if (mapping_cap_account_dirty(mapping)) { > + /* > + * Since we don't tag the page as dirty, > + * acquiring the tree_lock is replaced > + * with disabling preemption to protect > + * per-cpu data used for accounting. > + */ This should be local_irq_save(flags) > + preempt_disable(); > + __inc_zone_page_state(page, NR_FILE_DIRTY); > + __inc_bdi_stat(mapping->backing_dev_info, > + BDI_RECLAIMABLE); > + task_dirty_inc(current); > + task_io_account_write(PAGE_CACHE_SIZE); > + preempt_enable(); local_irq_restore() These accounting functions rely on being atomic wrt interrupts. > + } > + __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); > + return 1; > + } > + return 0; > +} > +EXPORT_SYMBOL(set_page_dirty_notag); How much performance gain do you see by avoiding that radix tree op? I take it the only reason you don't use the regular __set_page_dirty_nobuffers() and just clear the tag when you do the write-out by whatever alternative means you have to find the page, is that it gains you some performance. It would be good to have some numbers to judge this on. -- To unsubscribe from this list: send the line "unsubscribe reiserfs-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html