RE: [PATCH] [RFC] iomap: Use FUA for pure data O_DSYNC DIO writes

Hope you are feeling better.

1. How can we effect similar changes in ext4?
2. How do we get these optimizations, for both xfs and ext4, pushed into all common release builds?

We are seeing dramatic performance increases with the xfs fixes, roughly 2x to 4x TPCC throughput in the runs below. https://bugzilla.redhat.com/show_bug.cgi?id=1548524

4 socket, TPCC
9,108 – Old kernel
36,618 – Patched kernel

2 socket, TPCC
9,606 - Old kernel
18,840 - Patched kernel
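
As a quick check on the TPCC figures above, the speedups can be worked out directly. This is only arithmetic on the numbers quoted in this message; the names `tpcc` and `speedup` are made up for illustration.

```python
# TPCC throughput copied from the message above: (old kernel, patched kernel).
tpcc = {
    "4 socket": (9108, 36618),
    "2 socket": (9606, 18840),
}


def speedup(old: int, new: int) -> float:
    """Ratio of patched-kernel throughput to old-kernel throughput."""
    return new / old


# Round to two decimals for reporting.
speedups = {name: round(speedup(old, new), 2) for name, (old, new) in tpcc.items()}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x")
```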

--------------------------------------------------------------------------------------------------
-- Times in milliseconds
-- wt = SQL writethrough setting
-- awt = SQL alternatewritethrough setting
-- -T3979 = SQL opens with FUA behavior
-- -T3982 = SQL opens without FUA and uses fsync (forced flush) behavior
--------------------------------------------------------------------------------------------------
--
--  Linux patched kernel for XFS FUA opts
--  SQL Server 2017 CU6
--  blktrace active
--  CentOS 7 - 4 procs, 8GB RAM running on Hyper-V
--  Win10 host, 8 procs, 32GB RAM
--  Samsung SSD drive
--  Clean SQL Server restarts
--------------------------------------------------------------------------------------------------
--  PATCHED KERNEL 4.16.x
--  Configuration                     Create DB  Inserts  Checkpoint  Options
--  --------------------------------  ---------  -------  ----------  -----------------------------------
--  Default (-T3982 = forced flush)   3787       29814    47          O_DIRECT and fsync from SQL
--  -T3979 and wt=1 awt=1             4224       25880    30          O_DIRECT and specialized fdatasync (best without FUA is ~10x slower than FUA)
--
--  -T3979 and wt=0                   2860       2570     13          O_DIRECT
--  -T3979 and wt=1 awt=0             4204       2573     17          O_DIRECT | O_DSYNC - inserts and checkpoint on allocated space approach O_DIRECT-only times  <--- !!! DESIRED BEHAVIOR !!!
--  -T3979 and wt=1 awt=0 NO STAMP    4127       2823     17          O_DIRECT | O_DSYNC - issues internal flushes for first writes; create is faster but inserts are slower
--
--  UNPATCHED KERNEL 3.x
--  Configuration                     Create DB  Inserts  Checkpoint  Options
--  --------------------------------  ---------  -------  ----------  -----------------------------------
--  Default (-T3982 = forced flush)   3097       29700    26          O_DIRECT and fsync from SQL
--  -T3979 and wt=1 awt=1             4210       25800    27          O_DIRECT and specialized fdatasync
--
--  -T3979 and wt=0                   2880       2134     6           O_DIRECT
--  -T3979 and wt=1 awt=0             4443       25290    42          O_DIRECT | O_DSYNC
--  -T3979 and wt=1 awt=0 NO STAMP    4120       25680    40          O_DIRECT | O_DSYNC
--
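
The Options column above contrasts two ways of making writes durable: O_DSYNC at open time (every write is stable on return) versus plain writes followed by a single fdatasync at an integrity boundary. The following is a minimal user-space sketch of those two patterns, not SQL Server's actual code. The helper names (`aligned_block`, `write_dsync_each`, `write_then_flush`) and the 4096-byte `BLOCK` size are assumptions for illustration, and it assumes Linux. Whether the kernel services the O_DSYNC writes with FUA or with a separate cache flush is not visible from this code.

```python
import mmap
import os
import tempfile

BLOCK = 4096  # assumed I/O size; O_DIRECT needs block-aligned lengths and offsets


def aligned_block(fill: bytes) -> mmap.mmap:
    """Return a page-aligned BLOCK-sized buffer filled with the single byte `fill`."""
    buf = mmap.mmap(-1, BLOCK)  # anonymous mappings are page-aligned
    buf.write(fill * BLOCK)
    return buf


def write_dsync_each(path: str, blocks, extra_flags: int = 0) -> None:
    """O_DSYNC at open time: each write is durable before pwrite returns."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_DSYNC | extra_flags, 0o600)
    try:
        for i, buf in enumerate(blocks):
            os.pwrite(fd, buf, i * BLOCK)
    finally:
        os.close(fd)


def write_then_flush(path: str, blocks, extra_flags: int = 0) -> None:
    """Write, write, write, then one fdatasync at the integrity boundary."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | extra_flags, 0o600)
    try:
        for i, buf in enumerate(blocks):
            os.pwrite(fd, buf, i * BLOCK)
        os.fdatasync(fd)  # single flush covering all the writes above
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        data = [aligned_block(b"a"), aligned_block(b"b")]
        # extra_flags=0 so the demo runs on any filesystem; see the note below.
        write_dsync_each(os.path.join(d, "dsync.dat"), data)
        write_then_flush(os.path.join(d, "flush.dat"), data)
        print(sorted(os.listdir(d)))
```

To mirror the table's configurations more closely, pass `extra_flags=os.O_DIRECT` on a filesystem that supports direct I/O; the buffers here are page-aligned for that reason.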




-----Original Message-----
From: Robert Dorr 
Sent: Thursday, March 22, 2018 7:38 AM
To: Jan Kara <jack@xxxxxxx>
Cc: Christoph Hellwig <hch@xxxxxx>; Dave Chinner <david@xxxxxxxxxxxxx>; Dan Williams <dan.j.williams@xxxxxxxxx>; linux-xfs@xxxxxxxxxxxxxxx; linux-fsdevel <linux-fsdevel@xxxxxxxxxxxxxxx>; Theodore Ts'o <tytso@xxxxxxx>; Matthew Wilcox <mawilcox@xxxxxxxxxxxxx>; Scott Konersmann <scottkon@xxxxxxxxxxxxx>; Slava Oks <slavao@xxxxxxxxxxxxx>; Jasraj Dange <jasrajd@xxxxxxxxxxxxx>; Michael Nelson <micn@xxxxxxxxxxxxx>
Subject: RE: [PATCH] [RFC] iomap: Use FUA for pure data O_DSYNC DIO writes

Sorry to hear that.  Get well soon.

Thanks for the response.


-----Original Message-----
From: Jan Kara <jack@xxxxxxx>
Sent: Thursday, March 22, 2018 9:36 AM
To: Robert Dorr <rdorr@xxxxxxxxxxxxx>
Cc: Jan Kara <jack@xxxxxxx>; Christoph Hellwig <hch@xxxxxx>; Dave Chinner <david@xxxxxxxxxxxxx>; Dan Williams <dan.j.williams@xxxxxxxxx>; linux-xfs@xxxxxxxxxxxxxxx; linux-fsdevel <linux-fsdevel@xxxxxxxxxxxxxxx>; Theodore Ts'o <tytso@xxxxxxx>; Matthew Wilcox <mawilcox@xxxxxxxxxxxxx>; Scott Konersmann <scottkon@xxxxxxxxxxxxx>; Slava Oks <slavao@xxxxxxxxxxxxx>; Jasraj Dange <jasrajd@xxxxxxxxxxxxx>; Michael Nelson <micn@xxxxxxxxxxxxx>
Subject: Re: [PATCH] [RFC] iomap: Use FUA for pure data O_DSYNC DIO writes

On Mon 19-03-18 16:14:16, Robert Dorr wrote:
> Awesome news on the spin.   
> 
> Are you going to be able to make changes to EXT4 to accommodate 
> REQ_FUA without generic_write_sync to improve performance, as was done for xfs?

I'm currently on sick leave, so I'm slow to reply. Sorry. I don't see a reason why ext4 could not have the same optimization as XFS for that direct IO case. However, I'm not sure when I will get to implementing it.

								Honza

> -----Original Message-----
> From: Jan Kara <jack@xxxxxxx>
> Sent: Monday, March 19, 2018 11:07 AM
> To: Robert Dorr <rdorr@xxxxxxxxxxxxx>
> Cc: Christoph Hellwig <hch@xxxxxx>; Dave Chinner 
> <david@xxxxxxxxxxxxx>; Dan Williams <dan.j.williams@xxxxxxxxx>; 
> linux-xfs@xxxxxxxxxxxxxxx; linux-fsdevel 
> <linux-fsdevel@xxxxxxxxxxxxxxx>; Jan Kara <jack@xxxxxxx>; Theodore 
> Ts'o <tytso@xxxxxxx>; Matthew Wilcox <mawilcox@xxxxxxxxxxxxx>; Scott 
> Konersmann <scottkon@xxxxxxxxxxxxx>; Slava Oks <slavao@xxxxxxxxxxxxx>; 
> Jasraj Dange <jasrajd@xxxxxxxxxxxxx>; Michael Nelson 
> <micn@xxxxxxxxxxxxx>
> Subject: Re: [PATCH] [RFC] iomap: Use FUA for pure data O_DSYNC DIO 
> writes
> 
> On Tue 13-03-18 18:52:57, Robert Dorr wrote:
> > I tried RWF_DSYNC and xfs seems to honor it properly, but it causes
> > ext4 to spin on a single CPU and hang the system. 😊
> 
> Yeah, that's a bug in the generic direct IO code that I've just recently 
> fixed (d9c10e5b8863 "direct-io: Fix sleep in atomic due to sync AIO" in 
> 4.16-rc4).
> 
> 								Honza
> 
> > 
> > 
> > -----Original Message-----
> > From: Christoph Hellwig <hch@xxxxxx>
> > Sent: Tuesday, March 13, 2018 11:13 AM
> > To: Robert Dorr <rdorr@xxxxxxxxxxxxx>
> > Cc: Dave Chinner <david@xxxxxxxxxxxxx>; Dan Williams 
> > <dan.j.williams@xxxxxxxxx>; linux-xfs@xxxxxxxxxxxxxxx; Christoph 
> > Hellwig <hch@xxxxxx>; linux-fsdevel <linux-fsdevel@xxxxxxxxxxxxxxx>; 
> > Jan Kara <jack@xxxxxxx>; Theodore Ts'o <tytso@xxxxxxx>; Matthew 
> > Wilcox <mawilcox@xxxxxxxxxxxxx>; Scott Konersmann 
> > <scottkon@xxxxxxxxxxxxx>; Slava Oks <slavao@xxxxxxxxxxxxx>; Jasraj 
> > Dange <jasrajd@xxxxxxxxxxxxx>; Michael Nelson <micn@xxxxxxxxxxxxx>
> > Subject: Re: [PATCH] [RFC] iomap: Use FUA for pure data O_DSYNC DIO 
> > writes
> > 
> > > I think the answer is that the iomap* logic makes sure to issue generic_write_sync for O_DSYNC W1 after W1 is completed by hardware, and then waits for completion of the flush request (REQ_PREFLUSH) before W1 is returned to the AIO completion ring. That prevents io_getevents from processing W1 before the flush occurs and completes. I just need confirmation from the experts on this code that this is the expected behavior.
> > 
> > Yes, that is the case.
> > 
> > > For SQL Server, using O_DIRECT | O_DSYNC on current kernels carries a heavy performance cost. Instead we enable a mode for SQL that opens with O_DIRECT only and issues fsync/fdatasync when we are hardening log files or checkpointing data files. This replaces the write, flush, write, flush pattern with write, write, write, ... then flush, as we only issue flush requests when required to maintain the data integrity boundaries of SQL Server. The performance is significantly better than a device flush for each write, as you can imagine.
> > > 
> > > Testing shows the FUA enhancement is better than the write, flush pattern. For SQL Server we want to dynamically open with O_DIRECT | O_DSYNC when REQ_FUA can be properly used, and open with O_DIRECT and leverage SQL Server's alternate flush scheme when running on an older kernel or a system that does not support FUA (SATA, IDE, ...).
> > 
> > There is no really good way to figure this out except for benchmarking, given that the use of FUA is an implementation detail.
> > 
> > Btw, another feature that might be interesting to you is the RWF_DSYNC flag to the pwritev2 syscall, which allows applying O_DSYNC semantics on a per-I/O basis instead of at open time.
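
The per-I/O RWF_DSYNC flag mentioned above can be sketched from user space as follows. This is a hedged illustration, assuming Linux and Python 3.7 or later, where `os.pwritev` wraps pwritev2 when flags are passed. The helper name `durable_pwrite` and its fallback to pwrite plus fdatasync are assumptions for this example, not something proposed in the thread.

```python
import errno
import os
import tempfile


def durable_pwrite(fd: int, data: bytes, offset: int) -> int:
    """Write `data` at `offset` with O_DSYNC semantics for this one call only."""
    flag = getattr(os, "RWF_DSYNC", None)  # missing on platforms without pwritev2 flags
    if flag is not None:
        try:
            return os.pwritev(fd, [data], offset, flag)
        except OSError as exc:
            # Only fall back when the kernel or filesystem rejects the flag.
            if exc.errno not in (errno.ENOSYS, errno.EOPNOTSUPP):
                raise
    # Assumed fallback: plain positional write plus an explicit data flush.
    written = os.pwrite(fd, data, offset)
    os.fdatasync(fd)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        fd = os.open(os.path.join(d, "log.dat"), os.O_WRONLY | os.O_CREAT, 0o600)
        try:
            durable_pwrite(fd, b"commit record", 0)  # durable on return
            os.pwrite(fd, b"scratch", 4096)          # ordinary write, no sync
        finally:
            os.close(fd)
```

The file is opened without O_DSYNC, so only the writes that go through `durable_pwrite` pay the synchronization cost. A production version would also need to handle short writes.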
> --
> Jan Kara <jack@xxxxxxxx>
> SUSE Labs, CR
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR



