Peter Zijlstra writes: [...] > Maybe something like > > /* > * set_page_dirty_notag() -- similar to __set_page_dirty_nobuffers() > * except it doesn't tag the page dirty in the page-cache radix tree. > * This means that the address space using this cannot use the regular > * filemap ->writepages() helpers and must provide its own means of > * tracking and finding dirty pages. > * > * NOTE: furthermore, this version also doesn't handle truncate races. > */ > Yup, your version is better. I just have replaced ^finding dirty pages^finding non-tagged dirty pages, see below.. > > +int set_page_dirty_notag(struct page *page) > > +{ > > + struct address_space *mapping = page->mapping; > > + > > + if (!TestSetPageDirty(page)) { > > + WARN_ON_ONCE(!PagePrivate(page) && !PageUptodate(page)); > > + if (mapping_cap_account_dirty(mapping)) { > > + /* > > + * Since we don't tag the page as dirty, > > + * acquiring the tree_lock is replaced > > + * with disabling preemption to protect > > + * per-cpu data used for accounting. > > + */ > > This should be local_irq_save(flags) > > > + preempt_disable(); > > + __inc_zone_page_state(page, NR_FILE_DIRTY); > > + __inc_bdi_stat(mapping->backing_dev_info, > > + BDI_RECLAIMABLE); > > + task_dirty_inc(current); > > + task_io_account_write(PAGE_CACHE_SIZE); > > + preempt_enable(); > > local_irq_restore() > > These accounting functions rely on being atomic wrt interrupts. > Thanks! > > + } > > + __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); > > + return 1; > > + } > > + return 0; > > +} > > +EXPORT_SYMBOL(set_page_dirty_notag); > > How much performance gain do you see by avoiding that radix tree op? > Nop. We want to use it with extended semantics. All dirty pages are divided into 2 categories: A) tagged in the radix tree (with PAGECACHE_TAG_DIRTY). B) captured by atoms (usual linked lists). reiser4_writepages() looks for pages of "A" in the radix tree and moves them to "B". set_page_dirty_notag(), introduced by my patch, is needed for pages of "B". If "B" is empty, then we get the traditional semantics with regular ->writepages(). That's all! > I take it the only reason you don't use the regular > __set_page_dirty_nobuffers() and just clear the tag when you do the > write-out by whatever alternative means you have to find the page, is > that it gains you some performance. > > It would be good to have some numbers to judge this on. > So, performance is not a concern. We want to extend functionality, which is currently restricted by the unnecessary(*) property "dirty bit is set <=> dirty tag is installed". (*) Reiser4 successfully uses such design since 2002. I don't remember any BUG_ONs, and related problems. Please, consider the appended patch. Thanks, Edward. Add set_page_dirty_notag() to the core library to enable extended functionality of radix tree attached to inode->i_mapping. Signed-off-by: Edward Shishkin<edward.shishkin@xxxxxxxxx> --- include/linux/mm.h | 1 + mm/page-writeback.c | 36 ++++++++++++++++++++++++++++++++++++ 2 files changed, 37 insertions(+) --- mmotm.orig/include/linux/mm.h +++ mmotm/include/linux/mm.h @@ -841,6 +841,7 @@ int redirty_page_for_writepage(struct wr struct page *page); int set_page_dirty(struct page *page); int set_page_dirty_lock(struct page *page); +int set_page_dirty_notag(struct page *page); int clear_page_dirty_for_io(struct page *page); extern unsigned long move_page_tables(struct vm_area_struct *vma, --- mmotm.orig/mm/page-writeback.c +++ mmotm/mm/page-writeback.c @@ -1248,6 +1248,42 @@ int __set_page_dirty_nobuffers(struct pa EXPORT_SYMBOL(__set_page_dirty_nobuffers); /* + * set_page_dirty_notag() -- similar to __set_page_dirty_nobuffers() + * except it doesn't tag the page dirty in the page-cache radix tree. + * This means that the address space using this cannot use the regular + * filemap ->writepages() helpers and must provide its own means of + * tracking and finding non-tagged dirty pages. + * + * NOTE: furthermore, this version also doesn't handle truncate races. + */ +int set_page_dirty_notag(struct page *page) +{ + struct address_space *mapping = page->mapping; + + if (!TestSetPageDirty(page)) { + WARN_ON_ONCE(!PagePrivate(page) && !PageUptodate(page)); + if (mapping_cap_account_dirty(mapping)) { + /* + * The accounting functions rely on + * being atomic wrt interrupts. + */ + unsigned long flags; + local_irq_save(flags); + __inc_zone_page_state(page, NR_FILE_DIRTY); + __inc_bdi_stat(mapping->backing_dev_info, + BDI_RECLAIMABLE); + task_dirty_inc(current); + task_io_account_write(PAGE_CACHE_SIZE); + local_irq_restore(flags); + } + __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); + return 1; + } + return 0; +} +EXPORT_SYMBOL(set_page_dirty_notag); + +/* * When a writepage implementation decides that it doesn't want to write this * page for some reason, it should redirty the locked page via * redirty_page_for_writepage() and it should then unlock the page and return 0 -- To unsubscribe from this list: send the line "unsubscribe reiserfs-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html