On Mon, Sep 30, 2013 at 08:00:02AM -0700, Greg Kroah-Hartman wrote:
> On Mon, Sep 30, 2013 at 07:31:35AM -0600, Khalid Aziz wrote:
> > On 09/30/2013 07:26 AM, Greg Kroah-Hartman wrote:
> > > On Mon, Sep 30, 2013 at 03:14:52PM +0200, Jack Wang wrote:
> > >> On 09/30/2013 12:11 PM, Luis Henriques wrote:
> > >>> 3.5.7.22 -stable review patch. If anyone has any objections, please let me know.
> > >>>
> > >>> ------------------
> > >>>
> > >>> From: Khalid Aziz <khalid.aziz@xxxxxxxxxx>
> > >>>
> > >>> commit 7cb2ef56e6a8b7b368b2e883a0a47d02fed66911 upstream.
> > >>>
> > >>> I am working with a tool that simulates an Oracle database I/O
> > >>> workload. This tool (orion to be specific -
> > >>> <http://docs.oracle.com/cd/E11882_01/server.112/e16638/iodesign.htm#autoId24>)
> > >>> allocates hugetlbfs pages using shmget() with the SHM_HUGETLB flag. It
> > >>> then does aio into these pages from flash disks using various common
> > >>> block sizes used by databases. I am looking at performance with two of
> > >>> the most common block sizes - 1M and 64K. aio performance with these
> > >>> two block sizes plunged after Transparent HugePages was introduced in
> > >>> the kernel. Here are the performance numbers:
> > >>>
> > >>>              pre-THP      2.6.39       3.11-rc5
> > >>> 1M read      8384 MB/s    5629 MB/s    6501 MB/s
> > >>> 64K read     7867 MB/s    4576 MB/s    4251 MB/s
> > >>>
> > >>> I have narrowed the performance impact down to the overheads introduced
> > >>> by THP in the __get_page_tail() and put_compound_page() routines. perf
> > >>> top shows >40% of cycles being spent in these two routines. Every time
> > >>> direct I/O to hugetlbfs pages starts, the kernel calls get_page() to
> > >>> grab a reference to the pages and calls put_page() when the I/O
> > >>> completes to put the reference away. THP introduced a significant
> > >>> amount of locking overhead to get_page() and put_page() when dealing
> > >>> with compound pages, because hugepages can be split underneath
> > >>> get_page() and put_page(). It added this overhead irrespective of
> > >>> whether it is dealing with hugetlbfs pages or transparent hugepages.
> > >>> This resulted in a 20%-45% drop in aio performance when using
> > >>> hugetlbfs pages.
> > >>>
> > >>> Since hugetlbfs pages cannot be split, there is no reason to go through
> > >>> all the locking overhead for these pages from what I can see. I added
> > >>> code to __get_page_tail() and put_compound_page() to bypass all the
> > >>> locking code when working with hugetlbfs pages. This improved
> > >>> performance significantly. Performance numbers with this patch:
> > >>>
> > >>>              pre-THP      3.11-rc5     3.11-rc5 + Patch
> > >>> 1M read      8384 MB/s    6501 MB/s    8371 MB/s
> > >>> 64K read     7867 MB/s    4251 MB/s    6510 MB/s
> > >>>
> > >>> Performance with the 64K read is still lower than it was before THP,
> > >>> but this is still a 53% improvement. It does mean there is more work
> > >>> to be done, but I will take a 53% improvement for now.
> > >>>
> > >>> Please take a look at the following patch and let me know if it looks
> > >>> reasonable.
> > >>>
> > >>> [akpm@xxxxxxxxxxxxxxxxxxxx: tweak comments]
> > >>> Signed-off-by: Khalid Aziz <khalid.aziz@xxxxxxxxxx>
> > >>> Cc: Pravin B Shelar <pshelar@xxxxxxxxxx>
> > >>> Cc: Christoph Lameter <cl@xxxxxxxxx>
> > >>> Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
> > >>> Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
> > >>> Cc: Mel Gorman <mel@xxxxxxxxx>
> > >>> Cc: Rik van Riel <riel@xxxxxxxxxx>
> > >>> Cc: Minchan Kim <minchan@xxxxxxxxxx>
> > >>> Cc: Andi Kleen <andi@xxxxxxxxxxxxxx>
> > >>> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> > >>> Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> > >>> [ luis: backported to 3.5: adjusted context ]
> > >>> Signed-off-by: Luis Henriques <luis.henriques@xxxxxxxxxxxxx>
> > >>
> > >> Hi Greg,
> > >>
> > >> I suppose this patch is also needed for 3.4, right?
> > >
> > > As it didn't originally apply there, I didn't apply it.
> > >
> > > If people think it should be applicable to 3.4, I'll take it.
> > >
> > > thanks,
> > >
> > > greg k-h
> > >
> >
> > Hi Greg,
> >
> > I did send you a backported version of this patch to apply to 3.0, 3.2,
> > and 3.4 last Monday and cc'd stable@xxxxxxxxxxxxxxx. That patch should
> > apply cleanly to those three kernels.
>
> Ah, you didn't specifically say that in the patch, so I just thought you
> were reminding me to apply it to the 3.10 and 3.11 trees. Please be more
> explicit in the future.
>
> I'll queue it up for the next round of stable kernels after this one.

And I've lost it; I can't find it in my archives anywhere. Sorry about
that, can you resend it please?

thanks,

greg k-h
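
To make the workload in the commit message concrete, here is a minimal
sketch of the allocation and I/O pattern Khalid describes: a hugetlbfs
segment obtained with shmget(SHM_HUGETLB), then a direct-I/O aio read into
it via libaio. This is an illustration, not orion itself; the device path
"/dev/sdX", the 2MB hugepage size, and the trimmed error handling are all
placeholder assumptions.

/*
 * Sketch only: shmget(SHM_HUGETLB) + O_DIRECT libaio read, the pattern
 * that hits the get_page()/put_page() hot path discussed above.
 */
#define _GNU_SOURCE		/* for O_DIRECT */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <libaio.h>

#define SEG_SIZE   (2UL * 1024 * 1024)	/* one 2MB hugepage (assumed size) */
#define BLOCK_SIZE (64UL * 1024)	/* 64K, one of the benchmarked sizes */

int main(void)
{
	/* Hugetlbfs-backed segment; requires hugepages reserved via
	 * /proc/sys/vm/nr_hugepages and suitable privileges. */
	int shmid = shmget(IPC_PRIVATE, SEG_SIZE,
			   SHM_HUGETLB | IPC_CREAT | 0600);
	if (shmid < 0) { perror("shmget"); return 1; }

	void *buf = shmat(shmid, NULL, 0);
	if (buf == (void *)-1) { perror("shmat"); return 1; }

	/* O_DIRECT makes the kernel pin the hugetlbfs pages with
	 * get_page()/put_page() for every request. "/dev/sdX" is a
	 * placeholder device. */
	int fd = open("/dev/sdX", O_RDONLY | O_DIRECT);
	if (fd < 0) { perror("open"); return 1; }

	io_context_t ctx = 0;
	if (io_setup(1, &ctx) < 0) { fprintf(stderr, "io_setup failed\n"); return 1; }

	struct iocb cb, *cbs[1] = { &cb };
	io_prep_pread(&cb, fd, buf, BLOCK_SIZE, 0);
	if (io_submit(ctx, 1, cbs) != 1) { fprintf(stderr, "io_submit failed\n"); return 1; }

	struct io_event ev;
	io_getevents(ctx, 1, 1, &ev, NULL);
	printf("read %ld bytes\n", (long)ev.res);

	io_destroy(ctx);
	close(fd);
	shmdt(buf);
	shmctl(shmid, IPC_RMID, NULL);
	return 0;
}

Built with something like "gcc -O2 sketch.c -laio" (libaio headers
required) and pointed at a real block device, each submitted read takes
and drops a page reference on the hugetlbfs pages, which is why the
get_page()/put_page() overhead dominated the profiles quoted above.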
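As for the shape of the fix itself: since hugetlbfs pages can never be
split by __split_huge_page_refcount(), a hugetlbfs test up front lets
put_page() skip the compound_lock() serialization that THP pages need.
The sketch below is illustrative only, not the verbatim upstream diff;
put_compound_page_sketch() and put_compound_page_locked() are hypothetical
names, and the tail-page refcount bookkeeping the real patch also handles
is glossed over here.

/*
 * Illustrative sketch of the fast-path idea from commit 7cb2ef56e6a8b,
 * not the actual patch: hugetlbfs compound pages cannot be split, so no
 * locking against __split_huge_page_refcount() is needed for them.
 */
#include <linux/mm.h>
#include <linux/hugetlb.h>

static void put_compound_page_sketch(struct page *page)
{
	struct page *page_head = compound_head(page);

	if (PageHuge(page_head)) {
		/*
		 * Fast path: a hugetlbfs page. No split can race with us,
		 * so a plain atomic put on the head page suffices; the
		 * hugetlbfs destructor frees the page on the last put.
		 */
		if (put_page_testzero(page_head)) {
			compound_page_dtor *dtor;

			dtor = get_compound_page_dtor(page_head);
			(*dtor)(page_head);
		}
		return;
	}

	/*
	 * Slow path: a THP page may be split underneath us, so the
	 * original compound_lock_irqsave()-protected sequence is still
	 * required. (put_compound_page_locked() is a hypothetical
	 * stand-in for that sequence.)
	 */
	put_compound_page_locked(page);
}

The upstream commit applies this test (via a new PageHeadHuge() helper) in
both __get_page_tail() and put_compound_page(); see the referenced commit
for the full diff.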