I forgot to mention that you need to verify/set the VMware machines for
a high-performance/low-latency workload.

On 25 May 2020 17:13:52 GMT+03:00, Strahil Nikolov <hunter86_bg@xxxxxxxxx> wrote:
>
>
>On 25 May 2020 5:49:00 GMT+03:00, Olivier <Olivier.Nicole@xxxxxxxxxxxx> wrote:
>>Strahil Nikolov <hunter86_bg@xxxxxxxxx> writes:
>>
>>> On May 23, 2020 7:29:23 AM GMT+03:00, Olivier <Olivier.Nicole@xxxxxxxxxxxx> wrote:
>>>>Hi,
>>>>
>>>>I have been struggling with NFS Ganesha: one gluster node with ganesha
>>>>serving only one client could not handle the load when dealing with
>>>>thousands of small files. Legacy gluster NFS works flawlessly with 5
>>>>or 6 clients.
>>>>
>>>>But the documentation for gNFS is scarce; I could not find where to
>>>>configure the various authorizations, so any pointer is greatly
>>>>welcome.
>>>>
>>>>Best regards,
>>>>
>>>>Olivier
>>>
>>> Hi Olivier,
>>>
>>> Can you hint me why you are using gluster with a single node in the
>>> TSP serving only 1 client?
>>> Usually, this is not a typical gluster workload.
>>
>>Hi Strahil,
>>
>>Of course I have more than one node; the other nodes are holding the
>>bricks and the data. I am using a node with no data to work around this
>>issue with NFS. But in my comparison between gNFS and Ganesha, I was
>>using the same configuration, with one node with no brick accessing the
>>other nodes for the data. So the only change between what is working
>>and what was not is the NFS server. Besides, I have been using NFS for
>>over 15 years and know that, given my data and type of activity, one
>>single NFS server should be able to serve 5 to 10 clients without a
>>problem; that is why I suspected Ganesha from the beginning.
>
>You are not comparing apples to apples. Kernel NFS has been used in
>UNIX systems since long before modern OSes; Linux has used it for a
>long time and the kernel has been optimized for it, while Ganesha is
>new tech and requires some tuning.
>
>You haven't mentioned what kind of issues you see - searching a
>directory, reading a lot of files, writing a lot of small files, etc.
>
>Usually a negative lookup (searching for/accessing a non-existent
>object - file, dir, etc.) causes a serious performance degradation.
>
>>If I cannot configure gNFS, I think I could mount the volume with the
>>glusterfs FUSE client and use the native NFS server of Linux, but that
>>would add overhead and leave some features behind; that is why my focus
>>is primarily on configuring gNFS.
>>
>>>
>>> Also can you specify:
>>> - Brick block device type and details (raid type, lvm, vdo, etc.)
>>
>>All nodes are VMware virtual machines, the RAID being at VMware level
>
>Yeah, that's not very descriptive.
>For a write-intensive, small-file workload the optimal RAID mode is
>RAID10 with at least 12 disks per node.
>What is the I/O scheduler? Are you using thin or thick LVM? How many
>snapshots do you have?
>Are you using striping at the LVM level (if you use local storage,
>then most probably no striping)?
>What is the PE size of the VG?
>
>>> - xfs_info of the brick
>
>What kind of FS are you using? You need to be sure that the inode size
>is at least 512 bytes (1024 for Swift) in order to be supported.
>
>>> - mount options for the brick
>>
>>Bricks are not mounted
>
>It is not good to share the same VMDK between the OS and the Gluster
>bricks. You can benefit from options like
>'noatime,nodiratime,nobarrier,inode64'. Note that nobarrier requires
>storage with a battery-backed write cache.
>
>>> - SELINUX/APPARMOR status
>>> - sysctl tunables (including tuned profile)
>>
>>All systems are vanilla Ubuntu with no tuning.
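>
>As a rough sketch (the device name /dev/sdb and the mount point are
>just examples here, adjust to your layout), checking the inode size of
>an existing XFS brick and preparing a dedicated brick with the mount
>options above could look like this:
>
>  # verify the existing brick filesystem uses at least 512-byte inodes
>  xfs_info /gluster1/br | grep isize
>
>  # format a dedicated brick device and mount it with the suggested options
>  mkfs.xfs -f -i size=512 /dev/sdb
>  echo '/dev/sdb /gluster1 xfs noatime,nodiratime,nobarrier,inode64 0 2' >> /etc/fstab
>  mount /gluster1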
>
>I have done some tests and you can benefit from the RHGS random-IO
>tuned profile. The latest source rpm can be found at:
>ftp://ftp.redhat.com/redhat/linux/enterprise/7Server/en/RHS/SRPMS/redhat-storage-server-3.5.0.0-1.el7rhgs.src.rpm
>
>On top of that you need to modify it to disable LRO, as LRO is
>automatically enabled for VMXNET NICs. LRO increases bandwidth but at
>the cost of latency, and low latency is crucial when looking up
>thousands of files/directories.
>
>>> - gluster volume information and status
>>
>>sudo gluster volume info gv0
>>
>>Volume Name: gv0
>>Type: Distributed-Replicate
>>Volume ID: cc664830-1dd0-4dd4-9f1c-493578297e79
>>Status: Started
>>Snapshot Count: 0
>>Number of Bricks: 2 x 2 = 4
>>Transport-type: tcp
>>Bricks:
>>Brick1: gluster3000:/gluster1/br
>>Brick2: gluster5000:/gluster/br
>>Brick3: gluster3000:/gluster2/br
>>Brick4: gluster2000:/gluster/br
>>Options Reconfigured:
>>features.quota-deem-statfs: on
>>features.inode-quota: on
>>features.quota: on
>>transport.address-family: inet
>>nfs.disable: off
>>features.cache-invalidation: on
>>
>>on@gluster3:~$ sudo gluster volume status gv0
>>Status of volume: gv0
>>Gluster process                               TCP Port  RDMA Port  Online  Pid
>>------------------------------------------------------------------------------
>>Brick gluster3000:/gluster1/br                49152     0          Y       1473
>>Brick gluster5000:/gluster/br                 49152     0          Y       724
>>Brick gluster3000:/gluster2/br                49153     0          Y       1549
>>Brick gluster2000:/gluster/br                 49152     0          Y       723
>>Self-heal Daemon on localhost                 N/A       N/A        Y       1571
>>NFS Server on localhost                       N/A       N/A        N       N/A
>>Quota Daemon on localhost                     N/A       N/A        Y       1560
>>Self-heal Daemon on gluster2000.cs.ait.ac.th  N/A       N/A        Y       835
>>NFS Server on gluster2000.cs.ait.ac.th        N/A       N/A        N       N/A
>>Quota Daemon on gluster2000.cs.ait.ac.th      N/A       N/A        Y       735
>>Self-heal Daemon on gluster5000.cs.ait.ac.th  N/A       N/A        Y       829
>>NFS Server on gluster5000.cs.ait.ac.th        N/A       N/A        N       N/A
>>Quota Daemon on gluster5000.cs.ait.ac.th      N/A       N/A        Y       736
>>Self-heal Daemon on fbsd3500                  N/A       N/A        Y       2584
>>NFS Server on fbsd3500                        2049      0          Y       2671
>>Quota Daemon on fbsd3500                      N/A       N/A        Y       2571
>>
>>Task Status of Volume gv0
>>------------------------------------------------------------------------------
>>Task                 : Rebalance
>>ID                   : 53e7c649-27f0-4da0-90dc-af59f937d01f
>>Status               : completed
>
>
>You don't have any tunings on the volume, despite the predefined groups
>in /var/lib/glusterd/groups.
>Both metadata-cache and nl-cache bring some performance gains for a
>small-file workload. You have to try them and check the results.
>Use a real-world workload for testing, as synthetic benchmarks do not
>always show the whole truth.
>In order to reset (revert) a setting you can use 'gluster volume reset
>gv0 <setting>'.
>
>
>>> - ganesha settings
>>
>>MDCACHE
>>{
>>Attr_Expiration_Time = 600;
>>Entries_HWMark = 50000;
>>LRU_Run_Interval = 90;
>>FD_HWMark_Percent = 60;
>>FD_LWMark_Percent = 20;
>>FD_Limit_Percent = 90;
>>}
>>
>>EXPORT
>>{
>>  Export_Id = 2;
>>  etc.
>>}
>>
>>> - Network settings + MTU
>>
>>MTU 1500 (I think it is my switch that never worked with jumbo
>>frames). I have a dedicated VLAN for NFS and gluster and a VLAN for
>>user connections.
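>
>Coming back to the tuned profile and LRO I mentioned earlier, a minimal
>sketch (the interface name ens192 is just an example, and the profile
>name assumes the rebuilt redhat-storage-server tuned profiles are
>installed):
>
>  tuned-adm profile rhgs-random-io
>  # check whether LRO is currently enabled on the VMXNET3 NIC
>  ethtool -k ens192 | grep large-receive-offload
>  # turn it off (the modified tuned profile can also do this on boot)
>  ethtool -K ens192 lro off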
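>
>And for the volume tunings mentioned above, a sketch of enabling the
>predefined groups and reverting a single option (check that the group
>files actually exist under /var/lib/glusterd/groups on your gluster
>version before applying):
>
>  gluster volume set gv0 group metadata-cache
>  gluster volume set gv0 group nl-cache
>  # revert one of the settings if it does not help
>  gluster volume reset gv0 network.inode-lru-limit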
>
>Verify that there is no fragmentation between the TSP nodes and between
>the NFS (Ganesha) node and the cluster. For example, if the MTU is
>1500, use a payload size of 1500 - 28 (ICMP + IP headers) = 1472:
>
>ping -M do -s 1472 -c 4 -I <interface> <other gluster node>
>
>Even basic gigabit switches support jumbo frames of 9000 bytes
>(anything above that requires expensive hardware), so I would recommend
>that you verify whether jumbo frames are possible, at least between the
>TSP nodes and maybe the NFS node.
>
>>I hope that helps.
>>
>>Best regards,
>>
>>Olivier
>>
>>>
>>> Best Regards,
>>> Strahil Nikolov
>>>
>
>
>As you can see, you are getting deeper and deeper, and we haven't even
>covered the storage stack yet, nor any Ganesha settings :)
>
>Good luck!
>
>Best Regards,
>Strahil Nikolov
________



Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users