On May 25, 2020 5:49:00 GMT+03:00, Olivier <Olivier.Nicole@xxxxxxxxxxxx> wrote:
>Strahil Nikolov <hunter86_bg@xxxxxxxxx> writes:
>
>> On May 23, 2020 7:29:23 AM GMT+03:00, Olivier <Olivier.Nicole@xxxxxxxxxxxx> wrote:
>>>Hi,
>>>
>>>I have been struggling with NFS Ganesha: one gluster node with ganesha
>>>serving only one client could not handle the load when dealing with
>>>thousands of small files. Legacy gluster NFS works flawlessly with 5 or 6
>>>clients.
>>>
>>>But the documentation for gNFS is scarce; I could not find where to
>>>configure the various authorizations, so any pointer is greatly welcome.
>>>
>>>Best regards,
>>>
>>>Olivier
>>
>> Hi Olivier,
>>
>> Can you hint me why you are using gluster with a single node in the TSP
>> serving only 1 client? Usually, this is not a typical gluster workload.
>
>Hi Strahil,
>
>Of course I have more than one node; the other nodes are supporting the
>bricks and the data. I am using a node with no data to solve this issue
>with NFS. But in my comparison between gNFS and Ganesha, I was using the
>same configuration, with one node with no brick accessing the other
>nodes for the data. So the only change between what is working and what
>was not is the NFS server. Besides, I have been using NFS for over 15
>years and know that, given my data and type of activity, one single NFS
>server should be able to serve 5 to 10 clients without a problem; that
>is why I suspected Ganesha from the beginning.

You are not comparing apples to apples. Pure NFS has been used in UNIXes
since before the modern OSes; Linux has long been using pure (kernel) NFS
and the kernel has been optimized for that, while Ganesha is new tech and
requires some tuning.

You haven't mentioned what kind of issues you see - searching a directory,
accessing a lot of files for read, writing a lot of small files, etc.
Usually a negative lookup (searching for/accessing a nonexistent object -
file, dir, etc.) causes a serious performance degradation.

>If I cannot configure gNFS, I think I could glusterfs_mount the volume
>and use the native NFS server of Linux, but that would add overhead and
>leave some features behind; that is why my focus is primarily on
>configuring gNFS.
>
>>
>> Also can you specify:
>> - Brick block device type and details (raid type, lvm, vdo, etc.)
>
>All nodes are VMware virtual machines, the RAID being at VMware level.

Yeah, that's not very descriptive. For a write-intensive, small-file
workload the optimal RAID mode is RAID 10 with at least 12 disks per node.
What is the I/O scheduler? Are you using thin LVM or thick? How many
snapshots do you have? Are you using striping at the LVM level (if you use
local storage, then most probably no striping)? What is the PE size in
the VG?

>> - xfs_info of the brick

What kind of FS are you using? You need to be sure that the inode size is
at least 512 bytes (1024 for Swift) in order to be supported.

>> - mount options for the brick
>
>Bricks are not mounted

It is not good to share the OS and the Gluster bricks' VMDK. You can
benefit from mount options like 'noatime,nodiratime,nobarrier,inode64'.
Nobarrier requires storage with a battery-backed write cache.

>> - SELINUX/APPARMOR status
>> - sysctl tunables (including tuned profile)
>
>All systems are vanilla Ubuntu with no tuning.

I have done some tests, and you can benefit from the RHGS random-IO tuned
profile.
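Roughly like this (just a sketch - I'm assuming the profile name is
rhgs-random-io, which is what the redhat-storage-server package ships;
check 'tuned-adm list' on your node, as the exact name may differ, and
throughput-performance is a reasonable fallback if the RHGS profiles are
not installed):

# Ubuntu: install tuned if it is not there yet
apt-get install tuned
# see which profiles are actually available on this node
tuned-adm list
# switch to the random-IO profile and confirm it took effect
tuned-adm profile rhgs-random-io
tuned-adm active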
The latest source rpm can be found at:
ftp://ftp.redhat.com/redhat/linux/enterprise/7Server/en/RHS/SRPMS/redhat-storage-server-3.5.0.0-1.el7rhgs.src.rpm

On top of that, you need to modify it to disable LRO, as it is
automatically enabled for VMXNET NICs. LRO increases bandwidth at the cost
of latency, and low latency is crucial for looking up thousands of
files/directories.

>> - gluster volume information and status
>
>sudo gluster volume info gv0
>
>Volume Name: gv0
>Type: Distributed-Replicate
>Volume ID: cc664830-1dd0-4dd4-9f1c-493578297e79
>Status: Started
>Snapshot Count: 0
>Number of Bricks: 2 x 2 = 4
>Transport-type: tcp
>Bricks:
>Brick1: gluster3000:/gluster1/br
>Brick2: gluster5000:/gluster/br
>Brick3: gluster3000:/gluster2/br
>Brick4: gluster2000:/gluster/br
>Options Reconfigured:
>features.quota-deem-statfs: on
>features.inode-quota: on
>features.quota: on
>transport.address-family: inet
>nfs.disable: off
>features.cache-invalidation: on
>
>on@gluster3:~$ sudo gluster volume status gv0
>Status of volume: gv0
>Gluster process                              TCP Port  RDMA Port  Online  Pid
>------------------------------------------------------------------------------
>Brick gluster3000:/gluster1/br               49152     0          Y       1473
>Brick gluster5000:/gluster/br                49152     0          Y       724
>Brick gluster3000:/gluster2/br               49153     0          Y       1549
>Brick gluster2000:/gluster/br                49152     0          Y       723
>Self-heal Daemon on localhost                N/A       N/A        Y       1571
>NFS Server on localhost                      N/A       N/A        N       N/A
>Quota Daemon on localhost                    N/A       N/A        Y       1560
>Self-heal Daemon on gluster2000.cs.ait.ac.th N/A       N/A        Y       835
>NFS Server on gluster2000.cs.ait.ac.th       N/A       N/A        N       N/A
>Quota Daemon on gluster2000.cs.ait.ac.th     N/A       N/A        Y       735
>Self-heal Daemon on gluster5000.cs.ait.ac.th N/A       N/A        Y       829
>NFS Server on gluster5000.cs.ait.ac.th       N/A       N/A        N       N/A
>Quota Daemon on gluster5000.cs.ait.ac.th     N/A       N/A        Y       736
>Self-heal Daemon on fbsd3500                 N/A       N/A        Y       2584
>NFS Server on fbsd3500                       2049      0          Y       2671
>Quota Daemon on fbsd3500                     N/A       N/A        Y       2571
>
>Task Status of Volume gv0
>------------------------------------------------------------------------------
>Task                 : Rebalance
>ID                   : 53e7c649-27f0-4da0-90dc-af59f937d01f
>Status               : completed

You don't have any tunings in the volume, despite the predefined ones in
/var/lib/glusterd/groups. Both metadata-cache and nl-cache bring some
performance when you have a small-file workload. You have to try them and
check the results; use a real-world workload for testing, as synthetic
benchmarks do not always show the real truth. In order to reset (revert) a
setting, you can use 'gluster volume reset gv0 <setting>'.
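For example (a rough sketch - check what the group files contain on your
glusterfs version first, as the exact options differ between releases):

# the predefined option groups shipped with glusterfs
ls /var/lib/glusterd/groups
cat /var/lib/glusterd/groups/metadata-cache
# apply a whole group of options to the volume in one go
gluster volume set gv0 group metadata-cache
gluster volume set gv0 group nl-cache
# check what is set now, and revert a single option if it hurts
gluster volume get gv0 all | grep -E 'cache|lookup'
gluster volume reset gv0 <setting>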
>> - ganesha settings
>
>MDCACHE
>{
>Attr_Expiration_Time = 600;
>Entries_HWMark = 50000;
>LRU_Run_Interval = 90;
>FD_HWMark_Percent = 60;
>FD_LWMark_Percent = 20;
>FD_Limit_Percent = 90;
>}
>EXPORT
>{
> Export_Id = 2;
> etc.
>}
>
>> - Network settings + MTU
>
>MTU 1500 (I think it is my switch that never worked with jumbo
>frames). I have a dedicated VLAN for NFS and gluster and a VLAN for
>users connection.

Verify that there is no fragmentation between the TSP nodes and between
the NFS (Ganesha) server and the cluster. For example, if the MTU is 1500,
then use a payload size of 1500 - 28 (ICMP + IP headers) = 1472:

ping -M do -s 1472 -c 4 -I <interface> <other gluster node>

Even the dumbest gigabit switches support jumbo frames of 9000 (anything
above that requires expensive hardware), so I would recommend verifying
whether jumbo frames are possible, at least between the TSP nodes and
maybe the NFS server.

>I hope that helps.
>
>Best regards,
>
>Olivier
>
>>
>> Best Regards,
>> Strahil Nikolov
>>

As you can see, you are getting further into the deep, and we haven't
covered the storage stack yet, nor any Ganesha settings :)

Good luck!

Best Regards,
Strahil Nikolov
________

Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users