Jeff Darcy wrote a nice piece on his HekaFS blog, 'the importance of staying sequential', which is essentially about the contention for disk heads between data I/O and journal I/O:
<http://hekafs.org/index.php/2012/11/the-importance-of-staying-sequential/>
(Also, congrats on the Linux Journal article on the Glupy Python/Gluster approach.)

We've been experimenting with SSDs on ZFS (using the SSDs for the ZIL (journal)), and while it has provided a bit of a boost, it has not been dramatic. Ditto XFS. However, we did not stress it at all with heavy loads in a gluster environment, and I'm now thinking that that is where you would see the improvement (see Jeff's graph on how the number of threads/load affects IOPS).

Is anyone running a gluster system with the underlying XFS writing its journal to SSDs? If so, any improvement? I would have expected to hear about this as a recommended architecture for gluster if it had performed MUCH better, but ...? We're about to combine two clusters and may just go ahead and build a /scratch system this way to test the approach.

hjm

On Monday, November 05, 2012 07:58:22 AM Jonathan Lefman wrote:
> I take it back. Things deteriorated pretty quickly after I began dumping
> data onto my volume from multiple clients. Initially my transfer rates were
> okay, not fast, but livable. However, after about an hour of copying several
> terabytes from 3-4 client machines, the transfer rates often dropped to
> KB/s. Sometimes I would see a couple-second burst of good transfer rates.
>
> Anyone have ideas on how to address this effectively? I'm at a loss.
>
> -Jon
>
> On Nov 2, 2012 1:21 PM, "Jonathan Lefman" <jonathan.lefman at essess.com> wrote:
> > I should have also said that my volume is working well now and all is
> > well.
> >
> > -Jon
> >
> > On Fri, Nov 2, 2012 at 1:21 PM, Jonathan Lefman
> > <jonathan.lefman at essess.com> wrote:
> >> Thank you, Brian. I'm happy to hear that this behavior is not typical. I
> >> am now using xfs on all of my drives. I also wiped out the entire
> >> /etc/glusterd directory for good measure. I bet there was residual
> >> information from a previous attempt at a gluster volume that caused the
> >> problems. Or moving from ext4 to xfs is an amazing fix, but I think
> >> that is less likely.
> >>
> >> I appreciate your time responding to me.
> >>
> >> -Jon
> >>
> >> On Nov 2, 2012 4:44 AM, "Brian Candler" <B.Candler at pobox.com> wrote:
> >>> On Thu, Nov 01, 2012 at 08:03:21PM -0400, Jonathan Lefman wrote:
> >>> > Soon after loading up about 100 MB of small files (about 300 KB
> >>> > each), the drive usage is at 1.1T.
> >>>
> >>> That is very odd. What do you get if you run du and df on the individual
> >>> bricks themselves? 100 MB is only ~330 files of 300 KB each.
> >>>
> >>> Did you specify any special options to mkfs.ext4? Maybe -I 512 would
> >>> help, as the xattrs are more likely to sit within the inodes themselves.
> >>>
> >>> If you start everything from scratch, it would be interesting to see df
> >>> stats when the filesystem is empty. It may be that a huge amount of space
> >>> has been allocated to inodes. If you expect most of your files to be >16KB,
> >>> then you could add -i 16384 to mkfs.ext4 to reduce the space reserved for
> >>> inodes. But using xfs would be better, as it doesn't reserve any space for
> >>> inodes; it allocates them dynamically.
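For concreteness, the mkfs tuning Brian describes above would look roughly like the commands below; /dev/sdb1 is just a placeholder brick device, and this is a sketch of the options being discussed, not a tested recommendation:

    # ext4: 512-byte inodes so gluster's xattrs can sit inside the inode,
    # plus one inode per 16KB of space to shrink the inode-table reservation
    mkfs.ext4 -I 512 -i 16384 /dev/sdb1

    # xfs allocates inodes dynamically; 512-byte inodes are the commonly
    # suggested size for gluster bricks
    mkfs.xfs -i size=512 /dev/sdb1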
> >>> Ignore the comment that glusterfs is "not designed for handling large
> >>> counts of small files" - 300 KB is not small.
> >>>
> >>> Regards,
> >>>
> >>> Brian.

--
Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
[m/c 2225] / 92697
Google Voice Multiplexer: (949) 478-4487
415 South Circle View Dr, Irvine, CA, 92697 [shipping]
MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
--
Passive-Aggressive Supporter of The Canada Party:
<http://www.americabutbetter.com/>
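P.S. For anyone who wants to try the journal-on-SSD layout I describe at the top, this is roughly the shape of it. The device, pool, and mount-point names are placeholders and this is an untested sketch, not a recipe:

    # XFS brick with its log on a separate SSD partition (hypothetical devices)
    mkfs.xfs -i size=512 -l logdev=/dev/ssd1,size=128m /dev/sdb1
    mount -o logdev=/dev/ssd1,noatime /dev/sdb1 /bricks/brick1

    # ZFS equivalent: add the SSD as a separate log (ZIL/SLOG) device
    # to an existing pool ("tank" is a placeholder pool name)
    zpool add tank log /dev/ssd2

Note that with an external XFS log you have to pass logdev= on every mount, and losing the log device leaves the filesystem unmountable, so mirroring that SSD is worth considering.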