Re: Sudden, dramatic performance drops with Glusterfs


 



Hi Strahil,

Thanks for the reply. See below.

Also, as an aside, I tested by installing a single CentOS 7 machine with the JBOD, installed gluster and ZFSonLinux as recommended at:
 
https://staged-gluster-docs.readthedocs.io/en/release3.7.0beta1/Administrator%20Guide/Gluster%20On%20ZFS/

I then created a gluster volume consisting of one brick backed by a local ZFS raidz2, copied about 4TB of data to it, and I'm seeing the same issue.

The biggest part of the issue is with things like "ls" and "find". If I read a single file, or write a single file, it works great. But if I run rsync (which does a lot of listing, writing, renaming, etc.) it is slow as garbage. E.g. a find command that finishes in 30 seconds when run directly on the underlying ZFS directory takes about an hour through gluster.
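For what it's worth, that ls/find pattern usually points at per-file metadata round trips through the FUSE client, and Gluster's client-side metadata caching is the usual first knob. A hedged sketch (option names as documented for Gluster's small-file tuning; please verify each against `gluster volume set help` on your 5.10 install before applying):

```shell
# Sketch: enable client-side metadata caching to cut per-stat round trips.
# Check availability first with: gluster volume set help
gluster volume set homes features.cache-invalidation on
gluster volume set homes features.cache-invalidation-timeout 600
gluster volume set homes performance.stat-prefetch on
gluster volume set homes performance.cache-invalidation on
gluster volume set homes performance.md-cache-timeout 600
gluster volume set homes network.inode-lru-limit 200000
# parallel-readdir can speed up listings of large directories
gluster volume set homes performance.parallel-readdir on
```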

 
Strahil wrote on 08-Nov-19 05:39:

Hi Michael,

What is your 'gluster volume info <VOL>' showing?

I've been playing with the install (since it's a fresh machine) so I can't give you verbatim output. However, it was showing two bricks, one on each server, started, and apparently healthy.

How much is your zpool full? Usually when it gets too full, ZFS performance drops seriously.

The zpool is only at about 30% usage. It's a new server setup.
We have about 10TB of data on a 30TB volume (made up of two 30TB ZFS raidz2 bricks, each residing on a different server, connected via a dedicated 10Gb Ethernet link).

Try to rsync a file directly to one of the bricks, then to the other brick (don't forget to remove the files after that, as gluster will not know about them).
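That direct-to-brick test might look like the following (paths taken from the volume-create line later in this thread; adjust to your layout, and remove the test files afterward since gluster will not know about them):

```shell
# Copy straight to the brick path on each server, bypassing the FUSE mount:
rsync -av --progress bigfile.dat server1:/zpool-homes/homes/throwaway.dat
rsync -av --progress bigfile.dat server2:/zpool-homes/homes/throwaway.dat

# Then through the gluster mount for comparison:
rsync -av --progress bigfile.dat /glusterfs/homes/throwaway-fuse.dat

# Clean up the brick-side copies (gluster doesn't track them):
ssh server1 rm /zpool-homes/homes/throwaway.dat
ssh server2 rm /zpool-homes/homes/throwaway.dat
```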

If I rsync manually, or scp a file directly to the zpool bricks (outside of gluster), I get 30-100MBytes/s (depending on what I'm copying).
If I rsync THROUGH gluster (via the glusterfs mounts) I get 1 - 5MB/s

What are your mounting options ? Usually 'noatime,nodiratime' are a good start.

I'll try these. Currently using ...
(mounting TO serverA) serverA:/homes /glusterfs/homes    glusterfs defaults,_netdev 0 0
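If it helps, folding Strahil's suggestion into that fstab line might look like the sketch below. Note that on the brick side, ZFS handles atime as a dataset property rather than a mount option (the `zpool-homes` dataset name is taken from the volume-create line in this thread; `xattr=sa` is the setting recommended in the Gluster-on-ZFS guide linked above):

```shell
# Gluster FUSE mount with atime updates disabled (fstab line):
# serverA:/homes  /glusterfs/homes  glusterfs  defaults,_netdev,noatime  0 0

# Brick side: atime is a ZFS dataset property, not an fstab option:
zfs set atime=off zpool-homes
zfs set xattr=sa zpool-homes   # recommended for gluster-on-ZFS bricks
```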

Are you using ZFS provided by Ubuntu packages or directly from the ZoL project?

ZFS provided by the Ubuntu 18.04 repo:
  libzfs2linux/bionic-updates,now 0.7.5-1ubuntu16.6 amd64 [installed,automatic]
  zfs-dkms/bionic-updates,bionic-updates,now 0.7.5-1ubuntu16.6 all [installed]
  zfs-zed/bionic-updates,now 0.7.5-1ubuntu16.6 amd64 [installed,automatic]
  zfsutils-linux/bionic-updates,now 0.7.5-1ubuntu16.6 amd64 [installed]

Gluster provided by "add-apt-repository ppa:gluster/glusterfs-5":
  glusterfs 5.10
  Repository revision: git://git.gluster.org/glusterfs.git

 

Best Regards,
Strahil Nikolov

On Nov 6, 2019 12:50, Michael Rightmire <Michael.Rightmire@xxxxxxx> wrote:
Hello list!

I'm new to Glusterfs in general. We have chosen to use it as our distributed file system on a new set of HA file servers.

The setup is:
2 SUPERMICRO SuperStorage Server 6049PE1CR36L with 24 x 4TB spinning disks and NVMe for cache and SLOG
HBA not RAID card
Ubuntu 18.04 server (on both systems)
ZFS filestorage
Glusterfs 5.10

Step one was to install Ubuntu, ZFS, and gluster. This all went without issue.
We have 3 identical ZFS raidz2 pools on both servers.
We have three mirrored (replica 2) glusterfs volumes, one attached to each raidz on each server. I.e.

And mounted the gluster volumes as (for example) "/glusterfs/homes -> /zpool/homes". I.e.
gluster volume create homes replica 2 transport tcp server1:/zpool-homes/homes server2:/zpool-homes/homes force
(on server1) server1:/homes     44729413504 16032705152 28696708352  36% /glusterfs/homes

The problem is, the performance has deteriorated terribly.

We needed to copy all of our data from the old server to the new glusterfs volumes (appx. 60TB).
We decided to do this with multiple rsync commands (around 400 simultaneous rsyncs).
The copy went well for the first 4 days, with an average across all rsyncs of 150-200 MBytes per second.
Then, suddenly, on the fourth day, it dropped to about 50 MBytes/s.
Then, by the end of the day, down to ~5MBytes/s (five).
I've stopped the rsyncs, and I can still copy an individual file across to the glusterfs shared directory at 100MB/s.
But actions such as "ls -la" or "find" take forever!

Are there obvious flaws in my setup to correct?
How can I better troubleshoot this?

Thanks!
--

Mike

 


--

Mike

 

Karlsruher Institut für Technologie (KIT)

Institut für Anthropomatik und Robotik (IAR)

Hochperformante Humanoide Technologien (H2T)

 

Michael Rightmire 

B.Sci, HPUXCA, MCSE, MCP, VDB, ISCB

Systems IT/Development

 

Adenauerring 2, Building 50.20, Room 022

76131 Karlsruhe

 

Phone: +49 721 608-45032

Fax:     +49 721 608-44077

 

E-Mail:    Michael.Rightmire@xxxxxxx

http://www.humanoids.kit.edu/

http://h2t.anthropomatik.kit.edu

 

KIT – The Research University in the Helmholtz Association

KIT has been certified as a family-friendly university since 2010

 

________

Community Meeting Calendar:

APAC Schedule -
Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/118564314

NA/EMEA Schedule -
Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/118564314

Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users

