Hi,

We are looking into replacing our current storage solution and are evaluating gluster for this purpose. Our current solution uses a SAN with two servers attached that serve samba and NFS 4. Clients connect to those servers using NFS or SMB. All users' home directories live on this server.

I would like to have some insight into who else is using gluster for home directories for about 500 users and what performance they get out of the solution.

* Which connectivity method are you using on the clients (gluster native, nfs, smb)?
* Which volume options do you have configured for your gluster volume?
* What hardware are you using?
* Are you using snapshots and/or quota? If so, any numbers on the performance impact?

The solution I had in mind for our setup is multiple servers/bricks with a replica 3 arbiter 1 volume, where each server is also running nfs-ganesha and samba in HA. Clients would connect to one of the nfs servers (dns round robin). In this case the nfs servers would be the gluster clients. Gluster traffic would go over a dedicated 10G network with jumbo frames.

I'm currently testing gluster (3.12, now 3.13) on older machines[1] and have created a replica 3 arbiter 1 volume 2x(2+1). I seem to run into all sorts of (performance) problems. I must be doing something wrong, but I've tried all sorts of benchmarks and nothing seems to make my setup live up to what I would expect from this hardware.

* I understand that gluster only starts to work well when multiple clients are connecting in parallel, but I did expect the single client performance to be better.

* Unpacking the linux-4.15.7.tar.xz file on the brick XFS filesystem followed by a sync takes about 1 minute. Doing the same on the gluster volume using the fuse client (client is one of the brick servers) takes over 9 minutes, and neither disk nor cpu nor network is anywhere near saturated. Doing the same over NFS-ganesha (client is a workstation connected through gbit) takes even longer (more than 30min!?).
I understand that unpacking a lot of small files may be the worst workload for a distributed filesystem, but when I look at the sizes of the files in our users' home directories, more than 90% are smaller than 1MB.

* A file copy of a 300GB file over NFS 4 (nfs-ganesha) starts fast (90MB/s) and then drops to 20MB/s. When I look at the servers during the copy, I don't see where the bottleneck is, as cpu, disk and network are not maxing out on any of the bricks. When the same client copies the file to our current NFS storage, it is limited by the gbit network connection of the client.

* I had the 'cluster.optimize-lookup' option enabled but ran into all sorts of issues where ls shows either the wrong files (the content of a different directory), or claims a directory does not exist when mkdir says it already exists... I currently have the following options set:

server.outstanding-rpc-limit: 256
client.event-threads: 4
performance.io-thread-count: 16
performance.parallel-readdir: on
server.event-threads: 4
performance.cache-size: 2GB
performance.rda-cache-limit: 128MB
performance.write-behind-window-size: 8MB
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
network.inode-lru-limit: 500000
performance.nl-cache-timeout: 600
performance.nl-cache: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
transport.address-family: inet
nfs.disable: on
cluster.enable-shared-storage: enable

The brick servers have 2 dual-core cpus, so I've set the client and server event threads to 4.

* When using nfs-ganesha I run into bugs that make me wonder who is using nfs-ganesha with gluster and why they are not hitting these bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1543996
https://bugzilla.redhat.com/show_bug.cgi?id=1405147

* nfs-ganesha does not have the 'async' option that kernel nfs has.
I can understand why they don't want to implement this feature, but I do wonder how others are increasing their nfs-ganesha performance. I've put some SSDs in each brick server and have them configured as lvmcache for the bricks. This setup only increases throughput once the data is on the ssd, not for just-written data.

Regards,
Rik

[1] 4 servers with 2 1Gbit nics (one for the client traffic, one for s2s traffic with jumbo frames enabled). Each server has two disks (bricks).

[2] ioping from the nfs client shows the following latencies:
min/avg/max/mdev = 695.2 us / 2.17 ms / 7.05 ms / 1.92 ms

ping rtt from client to nfs-ganesha server:
rtt min/avg/max/mdev = 0.106/1.551/6.195/2.098 ms

ioping on the volume fuse mounted from a brick:
min/avg/max/mdev = 557.0 us / 824.4 us / 2.68 ms / 421.9 us

ioping on the brick xfs filesystem:
min/avg/max/mdev = 275.2 us / 515.2 us / 12.4 ms / 1.21 ms

Are these normal numbers?
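
For anyone who wants to repeat the unpack comparison described in this mail (extract a tarball, then sync), the test could be scripted roughly as below. This is only a minimal Python sketch and not the commands that produced the timings above; the function name `timed_unpack` and the example paths are hypothetical, and `os.sync()` assumes a Unix-like system.

```python
import os
import sys
import tarfile
import time


def timed_unpack(tarball, target_dir):
    """Extract tarball into target_dir, sync, and return elapsed seconds."""
    os.makedirs(target_dir, exist_ok=True)
    start = time.monotonic()
    # tarfile auto-detects the compression (xz, gz, ...) in the default mode.
    with tarfile.open(tarball) as archive:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions: use the safe "data" extraction filter.
            archive.extractall(target_dir, filter="data")
        else:
            archive.extractall(target_dir)
    # Flush dirty pages so buffered writes are included in the timing.
    os.sync()
    return time.monotonic() - start


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python unpack_test.py linux-4.15.7.tar.xz /mnt/gluster/test
    print(f"{timed_unpack(sys.argv[1], sys.argv[2]):.1f} s")
```

Running it once with a target directory on the brick XFS filesystem and once on the fuse or NFS mount should give directly comparable numbers, provided each run starts with an empty target directory.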