Re: XFS IO multiplication problem on centos/rhel 6 using hp p420i raid controllers

Dennis Kaarsemaker <dennis.kaarsemaker@xxxxxxxxxxx> · Fri, 08 Mar 2013 10:09:08 +0100

On Fri, 2013-03-08 at 10:00 +1100, Dave Chinner wrote:
> On Thu, Mar 07, 2013 at 11:12:08AM +0100, Dennis Kaarsemaker wrote:
> > On Thu, 2013-03-07 at 14:57 +1100, Dave Chinner wrote:
> > > On Wed, Mar 06, 2013 at 02:53:12PM +0100, Dennis Kaarsemaker wrote:
> ....
> > > > #<----CPU[HYPER]-----><----------Disks-----------><----------Network---------->
> > > > #cpu sys inter  ctxsw KBRead  Reads KBWrit Writes   KBIn  PktIn  KBOut  PktOut 
> > > >    1   0  1636   4219     16      1   2336    313    184    195     12     133 
> > > >    1   0  1654   2804     64      3   2919    432    391    352     20     208 
> > > > 
> > > > [root@bc291bprdb-01 ~]# collectl
> > > > #<----CPU[HYPER]-----><----------Disks-----------><----------Network---------->
> > > > #cpu sys inter  ctxsw KBRead  Reads KBWrit Writes   KBIn  PktIn  KBOut  PktOut 
> > > >    1   0  2220   3691    332     13  39992    331    112    122      6      92 
> > > >    0   0  1354   2708      0      0  39836    335    103    125      9      99 
> > > >    0   0  1563   3023    120      6  44036    369    399    317     13     188 
> > > > 
> > > > Notice the KBWrit difference. These are two identical hp gen 8 machines,
> > > > doing the same thing (replicating the same mysql schema). The one
> > > > writing ten times as many bytes in the same amount of transactions is
> > > > running centos 6 (and was running rhel 6).
> > > 
> > > So what is the problem? Is it writing too much on the centos
> > > 6 machine? Either way, this doesn't sound like a filesystem problem
> > > - the size and amount of data writes is entirely determined by the
> > > application.
> > 
> > For performing the same amount of work (processing the same mysql
> > transactions, the same amount of IO transactions resulting from them),
> > the 'broken' case writes ten-ish times as many bytes.
> 
> Thanks for clarifying.
> 
> > > > /dev/mapper/sysvm-mysqlVol /mysql/bp xfs rw,relatime,attr2,delaylog,allocsize=1024k,logbsize=256k,sunit=512,swidth=1536,noquota 0 0
> > > 
> > > What is the reason for using allocsize, sunit/swidth? Are you using
> > > them on other machines?
> > 
> > xfs autodetects them from the hpsa driver. They seem to be correct for
> > the raid layout (256 strips, 3 drives per mirror pool) and I don't seem
> > to be able to override them.
> 
> That's fine, they're set correctly. I'd forgotten that the numbers
> are emitted in /proc/mounts even when they are not specified as
> mount options.
> 
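
As a sanity check on the geometry in the mount line quoted above, here is a
small sketch that converts sunit/swidth into sizes. It assumes the values in
/proc/mounts are in 512-byte units, as the XFS mount options are documented;
the function name is only illustrative.

```python
# Assumption: sunit/swidth in /proc/mounts are counted in 512-byte units.
SECTOR_BYTES = 512


def stripe_geometry(sunit, swidth):
    """Return (stripe unit in KiB, stripe width in KiB, data disks per stripe)."""
    unit_kib = sunit * SECTOR_BYTES // 1024
    width_kib = swidth * SECTOR_BYTES // 1024
    data_disks = swidth // sunit
    return unit_kib, width_kib, data_disks


# Values taken from the mount line: sunit=512,swidth=1536
unit_kib, width_kib, data_disks = stripe_geometry(512, 1536)
print(f"stripe unit {unit_kib} KiB, stripe width {width_kib} KiB, "
      f"{data_disks} data disks")
```

Under that assumption this gives a 256 KiB stripe unit across 3 data disks,
which matches the 256 strips and 3 drives per mirror pool mentioned above.
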
> > > And if you remove the allocsize mount option, does the behaviour on
> > > centos6.3 change? What happens if you set allocsize=4k?
> > 
> > The allocsize parameter has no effect. It was put in place to correct a
> > monitoring issue: due to mysql's access patterns, using the default
> > large allocsize on rhel 6 makes our monitoring report the filesystem as
> > much fuller than it actually is.
> 
> Which is due to speculative EOF preallocation, and so it is only set
> on the CentOS box that is showing the larger write behaviour? Have
> you tried setting it to 4k? If not, please do - EOF preallocation for
> sparse extending writes can result in extra zeroing occurring, and
> so if it is anything related to the filesystem, this is the likely
> culprit. Setting it to 4k sets it back to the default value used
> on older versions of Linux....

I've set it to 4k and see no change, though I haven't rebuilt the files yet
with this setting (doing that as we speak; it takes 90 minutes). I'm also
wondering how this could cause the increase in bytes out as reported by
vmstat. Should zeroing do that?
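
To put a number on the difference, here is a rough sketch that computes the
average KB per write from the collectl samples quoted above. The sample
labels and the helper name are only illustrative, and a handful of
one-second samples is a small basis for any firm conclusion.

```python
# (KBWrit, Writes) pairs copied from the collectl output quoted above.
low_samples = [(2336, 313), (2919, 432)]                   # first machine
high_samples = [(39992, 331), (39836, 335), (44036, 369)]  # bc291bprdb-01


def avg_kb_per_write(samples):
    """Total KB written divided by total write operations."""
    total_kb = sum(kb for kb, _ in samples)
    total_writes = sum(writes for _, writes in samples)
    return total_kb / total_writes


low_avg = avg_kb_per_write(low_samples)
high_avg = avg_kb_per_write(high_samples)

print(f"low:   {low_avg:.1f} KB per write")
print(f"high:  {high_avg:.1f} KB per write")
print(f"ratio: {high_avg / low_avg:.1f}x")
```

The write counts are similar on both machines, so the extra volume shows up
as a much larger average size per write rather than as more writes.
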

-- 
Dennis Kaarsemaker, Systems Architect
Booking.com
Herengracht 597, 1017 CE Amsterdam
Tel external +31 (0) 20 715 3409
Tel internal (7207) 3409
