Hi,
I want to report back on the performance issues I've been having so far with
GlusterFS mainline 2.5, patch 690, and fuse-2.7.2glfs8.
I'm setting up a mail system that runs entirely under Xen 3.2.0; every
"actual" piece of the mail system is a Xen virtual machine. The virtual
machines accessing GlusterFS are 6 Dovecot servers and 4 Postfix servers.
There are also 6 storage nodes, each exporting its own disk to the Gluster
filesystem. Two of those nodes export 2 disks each: one for the GlusterFS
storage and the other for the namespace.
These are the conf files:
**** nodes with namespace ****

# POSIX storage backend on the shared disk
volume esp
  type storage/posix
  option directory /mnt/compartit
end-volume

# POSIX locking on top of the storage
volume espa
  type features/posix-locks
  subvolumes esp
end-volume

# I/O threads in front of the locked storage
volume espai
  type performance/io-threads
  option thread-count 15
  option cache-size 512MB
  subvolumes espa
end-volume

# namespace backend (only these two nodes have it)
volume nm
  type storage/posix
  option directory /mnt/namespace
end-volume

# export both volumes over TCP, open to any IP
volume ultim
  type protocol/server
  subvolumes espai nm
  option transport-type tcp/server
  option auth.ip.espai.allow *
  option auth.ip.nm.allow *
end-volume

******************************
**** nodes without namespace ****

volume esp
  type storage/posix
  option directory /mnt/compartit
end-volume

volume espa
  type features/posix-locks
  subvolumes esp
end-volume

volume espai
  type performance/io-threads
  option thread-count 15
  option cache-size 512MB
  subvolumes espa
end-volume

# same as above, but only the storage volume is exported
volume ultim
  type protocol/server
  subvolumes espai
  option transport-type tcp/server
  option auth.ip.espai.allow *
end-volume

******************************
**** clients ****

# one client connection per storage node
volume espai1
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.1.204
  option remote-subvolume espai
end-volume

volume espai2
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.1.205
  option remote-subvolume espai
end-volume

volume espai3
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.1.206
  option remote-subvolume espai
end-volume

volume espai4
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.1.207
  option remote-subvolume espai
end-volume

volume espai5
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.1.213
  option remote-subvolume espai
end-volume

volume espai6
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.1.214
  option remote-subvolume espai
end-volume

# connections to the two namespace nodes
volume namespace1
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.1.204
  option remote-subvolume nm
end-volume

volume namespace2
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.1.205
  option remote-subvolume nm
end-volume

# mirror the storage nodes in pairs (AFR)
volume grup1
  type cluster/afr
  subvolumes espai1 espai2
end-volume

volume grup2
  type cluster/afr
  subvolumes espai3 espai4
end-volume

volume grup3
  type cluster/afr
  subvolumes espai5 espai6
end-volume

# mirrored namespace
volume nm
  type cluster/afr
  subvolumes namespace1 namespace2
end-volume

# unify the three mirrored groups, round-robin scheduling
volume ultim
  type cluster/unify
  subvolumes grup1 grup2 grup3
  option scheduler rr
  option namespace nm
end-volume

******************************
With earlier patches the whole system used to hang, with many different
error messages. Right now it has been up for days without a single hang,
but I'm facing serious performance issues.
Simply running an "ls" can take around 3 seconds to show anything when the
system is under load. It doesn't happen at all when there's no activity,
so I don't think it has anything to do with Xen. Note that "under load"
here can mean as little as 3 mails arriving per second. I'm monitoring
everything, and no virtual machine is using more than roughly 20% CPU.
At first I had the log level on both nodes and clients set to DEBUG, but
it is now just WARNING, and I have restarted everything many times.
It was suggested that I use "type performance/io-threads" on the node side,
and it actually helped: before that, "ls" took not 3 seconds but 5 or more.
I've tried several different values for "thread-count" and "cache-size".
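For reference, one of the node-side variants I tried looked roughly like
this (the thread-count and cache-size values below are just illustrative
examples of the kind of settings I experimented with, not a recommendation):

volume espai
  type performance/io-threads
  # illustrative values only; I tried several settings here
  option thread-count 8
  option cache-size 64MB
  subvolumes espa
end-volume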
The system is supposed to handle a large amount of traffic, far more than
3 mails per second.
What do you think about the whole setup? Should I keep using a namespace?
Should I use dedicated nodes for the namespace? Should I use different
values for io-threads?
One last thing... I'm using ReiserFS on the "storage devices" that the
nodes export. Should I be using XFS or something else?
The logs don't show any kind of error now, so I don't have a clue about
what is failing.
I would appreciate any ideas you could give.
Thank you.