Re: Very slow roaming profiles on top of glusterfs

Hi Diego,

I think it's the overhead of fstat() calls. Gluster keeps its metadata on the bricks themselves, and this has to be looked up for every file access. For big files this is not an issue as it only happens once, but when accessing lots of small files this overhead rapidly builds up; the smaller the files, the worse it gets. Profiles do have hundreds of very small files!
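
A rough way to confirm this on the mount would be to count the metadata syscalls the traversal makes (just a sketch, using the path from Diego's example below):

  # summary of syscall counts made by the traversal; strace prints the table to stderr
  strace -c find /export/home/jgibbs/.winprofile.V2 -type f > /dev/null

If the lstat/newfstatat count comes out at roughly one per file, every file is paying at least one network round trip on the Gluster mount, which is where the minutes go.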

I was looking to use GlusterFS for generic file sharing as well, but I noticed the same issue while testing backups from a GlusterFS volume. On one volume (scanned 4-bit greyscale images and small PDFs), backups were taking over 16 hours, whereas on a traditional filesystem they completed in just over 1 hour.

It may be worth trying out one of the distributed filesystems that use a separate in-memory metadata server. I've tried LizardFS and MooseFS and they are both much faster than GlusterFS for small files, although large-file sequential performance is not as good (but still plenty for a Samba server).
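
A crude way to compare candidates on the small-file case is a timed create-and-read loop against each mount point (sketch only; /mnt/test is a placeholder path):

  # create 10,000 4KB files, then stat and read them all back
  mkdir -p /mnt/test/smallfiles && cd /mnt/test/smallfiles
  time for i in $(seq 1 10000); do head -c 4096 /dev/urandom > "f$i"; done
  time find . -type f -exec cat {} + > /dev/null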

Alex

On 14/09/15 13:21, Diego Remolina wrote:
Bump...

Does anybody have any clues as to how I can identify the cause of the slowness?

Diego

On Wed, Sep 9, 2015 at 7:42 PM, Diego Remolina <dijuremo@xxxxxxxxx> wrote:
Hi,

I am running two glusterfs servers as replicas, with a 3rd server that provides quorum. Since gluster was introduced, we have had an issue where Windows roaming profiles are extremely slow. The initial setup was done on 3.6.x; since 3.7.x has small-file performance improvements, I upgraded to 3.7.3, but that has not helped.

It seems that for some reason gluster is very slow when dealing with lots of small files. I am not sure how to troubleshoot this through Samba itself, but I have come up with other tests that produce rather disconcerting results, as shown below.

If I run directly on the brick:
[root@ysmha01 /]# time ( find /bricks/hdds/brick/home/jgibbs/.winprofile.V2 -type f > /dev/null )
real 0m3.683s
user 0m0.042s
sys 0m0.154s

Now running on the gluster volume mounted via fuse:
[root@ysmha01 /]# mount | grep export
10.0.1.6:/export on /export type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,allow_other,max_read=131072)

[root@ysmha01 /]# time ( find /export/home/jgibbs/.winprofile.V2 -type f > /dev/null )
real 0m57.812s
user 0m0.118s
sys 0m0.374s
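
Gluster's built-in profiler can show which file operations (LOOKUP, STAT, OPENDIR, etc.) dominate during a run like this; roughly the following, though the exact syntax is worth checking against the gluster CLI docs:

  gluster volume profile export start
  time ( find /export/home/jgibbs/.winprofile.V2 -type f > /dev/null )
  gluster volume profile export info
  gluster volume profile export stop

The "info" output lists per-brick latency and call counts for each operation, which should make it clear whether the time is going into per-file lookups.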

In general, the time to run the command on this particular user's profile can be up to 2 minutes. If I run the command on the brick first, the time on the mounted gluster volume is lower, as in the example above. I assume some caching is preserved.

This particular user has 13,216 files in his roaming profile, which
adds up to about 452MB of data.

The server performance over Samba for copying big files (both read and write) is great; I can almost max out the gigabit connections on the desktops.

Reading from samba share on the server and writing to local drive:
111MB/s (Copying a 650MB iso file)
Reading from local drive and writing to server samba share: 94MB/s
(Copying a 3.2GB ISO file)

The servers are connected to the network with 10Gbit adapters and use separate adapters for each role: one 10Gbit adapter is used for services, and the other for backend storage communication.

The servers have hardware RAID controllers; the Samba shares sit on top of an Areca ARC-1882 controller, with a volume made of twelve 2TB drives in RAID 6.

If you can provide any steps to better troubleshoot this problem and fix the issue, I would really appreciate it.

Diego

Further details about the machines below:

[root@ysmha01 /]# cat /etc/redhat-release
CentOS Linux release 7.1.1503 (Core)

[root@ysmha01 /]# gluster volume info export
Volume Name: export
Type: Replicate
Volume ID: b4353b3f-6ef6-4813-819a-8e85e5a95cff
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.0.1.7:/bricks/hdds/brick
Brick2: 10.0.1.6:/bricks/hdds/brick
Options Reconfigured:
performance.io-cache: on
performance.io-thread-count: 64
nfs.disable: on
cluster.server-quorum-type: server
performance.cache-size: 1024MB
server.allow-insecure: on
cluster.server-quorum-ratio: 51%
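
A few tunables are commonly suggested for metadata-heavy workloads on 3.7; these are only candidates to experiment with, not settings taken from this setup, so verify each against the 3.7 documentation before applying:

  gluster volume set export cluster.lookup-optimize on
  gluster volume set export performance.stat-prefetch on
  gluster volume set export performance.md-cache-timeout 10
  gluster volume set export performance.readdir-ahead on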

Each server has dual Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz with
32GB of memory.


_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users


