On Thu, 15 May 2014 11:48:09 +0100 Mel Gorman <mgorman@xxxxxxx> wrote:

> Changelog since v3
> o Correct handling of exclusive waits
>
> This patch introduces a new page flag for 64-bit capable machines,
> PG_waiters, to signal there are processes waiting on PG_lock and uses it to
> avoid memory barriers and waitqueue hash lookup in the unlock_page fastpath.
>
> This adds a few branches to the fast path but avoids bouncing a dirty
> cache line between CPUs. 32-bit machines always take the slow path but the
> primary motivation for this patch is large machines so I do not think that
> is a concern.
>
> The test case used to evaluate this is a simple dd of a large file done
> multiple times with the file deleted on each iteration. The size of
> the file is 1/10th of physical memory to avoid dirty page balancing. In the
> async case it is possible that the workload completes without even
> hitting the disk and will have variable results, but it highlights the
> impact of mark_page_accessed for async IO. The sync results are expected
> to be more stable. The exception is tmpfs, where the normal case is for
> the "IO" to not hit the disk.
>
> The test machine was single socket and UMA to avoid any scheduling or
> NUMA artifacts. Throughput and wall times are presented for sync IO; only
> wall times are shown for async as the granularity reported by dd and the
> variability make it unsuitable for comparison. As async results were
> variable due to writeback timings, I'm only reporting the maximum figures.
> The sync results were stable enough to make the mean and stddev
> uninteresting.
>
> The performance results are reported based on a run with no profiling.
> Profile data is based on a separate run with oprofile running. The
> kernels being compared are "accessed-v3", which is the patch series up
> to this patch, whereas "lockpage-v3" also includes this patch.
>
> async dd
>                                  3.15.0-rc3            3.15.0-rc3
>                                 accessed-v3           lockpage-v3
> ext3   Max elapsed      11.5900 (  0.00%)     11.0000 (  5.09%)
> ext4   Max elapsed      13.3400 (  0.00%)     13.4300 ( -0.67%)
> tmpfs  Max elapsed       0.4900 (  0.00%)      0.4800 (  2.04%)
> btrfs  Max elapsed      12.7800 (  0.00%)     13.8200 ( -8.14%)
> xfs    Max elapsed       2.0900 (  0.00%)      2.1100 ( -0.96%)

So ext3 got 5% faster and btrfs got 8% slower?

>
> ...
>

The numbers look pretty marginal from here and the patch is, umm, not a
thing of beauty or simplicity.

I'd be inclined to go find something else to work on, frankly.

> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -241,15 +241,22 @@ void delete_from_page_cache(struct page *page)
>  }
>  EXPORT_SYMBOL(delete_from_page_cache);
>  
> -static int sleep_on_page(void *word)
> +static int sleep_on_page(struct page *page)
>  {
> -        io_schedule();
> +        /*
> +         * A racing unlock can miss that the waitqueue is active and clear the
> +         * waiters again. Only sleep if PageWaiters is still set and timeout
> +         * to recheck as races can still occur.
> +         */
> +        if (PageWaiters(page))
> +                io_schedule_timeout(HZ);

ew.

>          return 0;
>  }
>
> ...
>
> +/* Returns true if the page is locked */

Comment is inaccurate.

> +static inline bool prepare_wait_bit(struct page *page, wait_queue_head_t *wqh,
> +                        wait_queue_t *wq, int state, int bit_nr, bool exclusive)
> +{
> +
> +        /* Set PG_waiters so a racing unlock_page will check the waitqueue */
> +        if (!PageWaiters(page))
> +                SetPageWaiters(page);
> +
> +        if (exclusive)
> +                prepare_to_wait_exclusive(wqh, wq, state);
> +        else
> +                prepare_to_wait(wqh, wq, state);
> +        return test_bit(bit_nr, &page->flags);
> +}
>
> ...
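For context on the unlock side the changelog above refers to: the point of
PG_waiters is that unlock_page() can skip the memory barrier and the
page_waitqueue() hash lookup entirely when no sleeper has flagged itself.
A rough sketch of what that fastpath presumably looks like -- this is an
assumption drawn from the changelog, not code quoted from the patch:

void unlock_page(struct page *page)
{
        VM_BUG_ON_PAGE(!PageLocked(page), page);
        clear_bit_unlock(PG_locked, &page->flags);

        /*
         * Sketch only: if no waiter ever set PG_waiters there is nobody
         * to wake, so the barrier and the page_waitqueue() hash lookup
         * can be skipped altogether.
         */
        if (!PageWaiters(page))
                return;

        /* Slowpath: a sleeper may be queued on PG_locked, wake it. */
        smp_mb__after_clear_bit();
        wake_up_page(page, PG_locked);
}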
>
>  int wait_on_page_bit_killable(struct page *page, int bit_nr)
>  {
> +        wait_queue_head_t *wqh;
>          DEFINE_WAIT_BIT(wait, &page->flags, bit_nr);
> +        int ret = 0;
>  
>          if (!test_bit(bit_nr, &page->flags))
>                  return 0;
> +        wqh = page_waitqueue(page);
> +
> +        do {
> +                if (prepare_wait_bit(page, wqh, &wait.wait, TASK_KILLABLE, bit_nr, false))
> +                        ret = sleep_on_page_killable(page);
> +        } while (!ret && test_bit(bit_nr, &page->flags));
> +        finish_wait(wqh, &wait.wait);
>  
> -        return __wait_on_bit(page_waitqueue(page), &wait,
> -                                sleep_on_page_killable, TASK_KILLABLE);
> +        return ret;
>  }

Please find a way to test all this nicely when there are signals pending?

>  /**
>
> ...
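For what it's worth, the do/while loop quoted above can only bail out early
on a signal if sleep_on_page_killable() reports one. Presumably that helper
mirrors the patched sleep_on_page() and looks something like the sketch
below -- an assumption for illustration, not code quoted from the patch:

static int sleep_on_page_killable(struct page *page)
{
        /* Sketch only: same racing-unlock check as sleep_on_page() above. */
        if (PageWaiters(page))
                io_schedule_timeout(HZ);

        /*
         * Report a fatal signal so the caller's loop exits with -EINTR
         * instead of going back to sleep on the page bit.
         */
        return fatal_signal_pending(current) ? -EINTR : 0;
}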