On May 16, 2020 12:41:09 PM GMT+03:00, "Patrick Bégou" <Patrick.Begou@xxxxxxxxxxxxxxxxxxxx> wrote:
>Hi Barbara,
>
>Thanks for all these suggestions. Yes, jumbo frames are activated and I
>have only two 10Gb Ethernet switches between the server and the client,
>connected with single-mode fiber.
>I saw yesterday that the client showing the problem did not have the
>right MTU (1500 instead of 9000). I don't know why. I changed the MTU to
>9000 yesterday and I'm looking at the logs now to see if the problem
>occurs again.
>
>I will try to increase the number of nfs daemons in a few days, to check
>each setup change one after the other. Because of covid19, I'm working
>from home, so I have to be really careful when changing the setup of the
>servers.
>
>On a cluster node I tried to set "rsize=1048576,wsize=1048576,vers=4,tcp"
>(I cannot have a larger value for rsize/wsize) but a comparison with the
>mount using the default setup does not show significant improvements. I
>sent 20GB to the server, or 2x10GB (2 concurrent processes), with dd to
>be larger than the RAID controller cache but smaller than the server and
>client RAM. It was just a short test this morning.
>
>Patrick
>
>On 15/05/2020 at 15:32, Barbara Krašovec wrote:
>> The number of threads has nothing to do with the number of cores on
>> the machine. It depends on the I/O, network speed, type of workload,
>> etc.
>> We usually start with 32 threads and increase if necessary.
>>
>> You can check the statistics with:
>> watch 'cat /proc/net/rpc/nfsd | grep th'
>>
>> Or you can check on the client:
>> nfsstat -rc
>> Client rpc stats:
>> calls          retrans    authrefrsh
>> 1326777974     0          1326645701
>>
>> If you see a large number of retransmissions, you should increase the
>> number of threads.
>>
>> However, your problem could also be related to the filesystem or the
>> network.
>>
>> Do you have jumbo frames (if yes, you should have them on the clients
>> and the server)? You might think about disabling flow control on the
>> switch and on the network card. Are there a lot of dropped packets?
>>
>> For network tuning, check http://fasterdata.es.net/host-tuning/linux/
>>
>> Did you try to enable readahead (blockdev --setra) on the filesystem?
>>
>> On the client side, changing the mount options helps. The default
>> read/write block size is quite small; increase it (rsize, wsize), and
>> use noatime.
>>
>> Cheers,
>> Barbara
>>
>>> On 15 May 2020, at 09:26, Patrick Bégou
>>> <Patrick.Begou@xxxxxxxxxxxxxxxxxxxx> wrote:
>>>
>>> On 13/05/2020 at 15:36, Patrick Bégou wrote:
>>>> On 13/05/2020 at 07:32, Simon Matter via CentOS wrote:
>>>>>> On 12/05/2020 at 16:10, James Pearson wrote:
>>>>>>> Patrick Bégou wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I need some help with NFSv4 setup/tuning. I have a dedicated NFS
>>>>>>>> server (2 x E5-2620, 8 cores/16 threads each, 64GB RAM, 1x10Gb
>>>>>>>> Ethernet and 16x 8TB HDD) used by two servers and a small
>>>>>>>> cluster (400 cores). All the servers are running CentOS 7, the
>>>>>>>> cluster is running CentOS 6.
>>>>>>>>
>>>>>>>> From time to time on the server I get:
>>>>>>>>
>>>>>>>>     kernel: NFSD: client xxx.xxx.xxx.xxx testing state ID with
>>>>>>>>     incorrect client ID
>>>>>>>>
>>>>>>>> And the client xxx.xxx.xxx.xxx freezes with:
>>>>>>>>
>>>>>>>>     kernel: nfs: server xxxxx.legi.grenoble-inp.fr not responding,
>>>>>>>>     still trying
>>>>>>>>     kernel: nfs: server xxxxx.legi.grenoble-inp.fr OK
>>>>>>>>     kernel: nfs: server xxxxx.legi.grenoble-inp.fr not responding,
>>>>>>>>     still trying
>>>>>>>>     kernel: nfs: server xxxxx.legi.grenoble-inp.fr OK
>>>>>>>>
>>>>>>>> There is a discussion about this on the Red Hat 7 support site,
>>>>>>>> but it is only open to subscribers. Other searches with Google do
>>>>>>>> not provide useful information.
>>>>>>>>
>>>>>>>> Do you have an idea how to solve these freezes?
>>>>>>>>
>>>>>>>> More generally, I would be really interested in some
>>>>>>>> advice/tutorials on improving NFS performance in this dedicated
>>>>>>>> context. There are so many [different] things about tuning NFS
>>>>>>>> available on the web that I'm a little bit lost (the opposite of
>>>>>>>> the previous question). So if someone has "the tutorial"... ;-)
>>>>>>> How many nfsd threads are you running on the server? - the current
>>>>>>> count will be in /proc/fs/nfsd/threads
>>>>>>>
>>>>>>> James Pearson
>>>>>> Hi James,
>>>>>>
>>>>>> Thanks for your answer. I've configured 24 threads (for 16 hardware
>>>>>> cores / 32 threads on the NFS server with these processors).
>>>>>>
>>>>>> But it seems that there are also buffer settings to modify when
>>>>>> increasing the number of threads... That has not been done.
>>>>>>
>>>>>> Load average on the server is below 1...
>>>>> I'd be very careful with higher thread numbers than physical cores.
>>>>> NFS threads and so-called CPU hyper/simultaneous threads are quite
>>>>> different things and it can hurt performance if not configured
>>>>> correctly.
>>>>>
>>>> So you suggest limiting the setup to 16 daemons? I'll try this
>>>> evening.
>>>>
>>> Setting 16 daemons (the number of physical cores) does not solve the
>>> problem. Moreover, I saw an (old) document provided by Dell on
>>> optimizing NFS server performance in an HPC context, and they suggest
>>> using... 128 daemons on a dedicated PowerEdge server. :-\
>>>
>>> I saw that it is always the same client showing the problem (a large
>>> fat node), so maybe I should investigate on the client side rather
>>> than on the server side.
>>>
>>> Patrick
>>>

Hi,

Why don't you let the client negotiate the version itself? pNFS requires
at minimum v4.1 and can bring extra performance.

P.S.: According to the man pages, 'vers' "is an alternative to the
nfsvers option. It is included for compatibility with other operating
systems." I was always using 'nfsvers' :).

Best Regards,
Strahil Nikolov

_______________________________________________
CentOS mailing list
CentOS@xxxxxxxxxx
https://lists.centos.org/mailman/listinfo/centos
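
For reference, a minimal sketch of the server-side checks discussed in
the thread (nfsd thread count, thread statistics, client
retransmissions), assuming a CentOS 7 NFS server; the thread count of 32
below is only an illustrative value, not a figure taken from this thread:

  # Current nfsd thread count (the value James asks about)
  cat /proc/fs/nfsd/threads

  # Thread-usage statistics; the 'th' line starts with the thread count
  watch 'grep th /proc/net/rpc/nfsd'

  # RPC retransmissions seen by a client; a large retrans count relative
  # to calls suggests too few server threads or a lossy network path
  nfsstat -rc

  # Raise the thread count at runtime for a quick test...
  rpc.nfsd 32

  # ...and make the change persistent via RPCNFSDCOUNT in
  # /etc/sysconfig/nfs
  grep RPCNFSDCOUNT /etc/sysconfig/nfs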
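
Likewise, a minimal sketch of the client-side and storage tuning Barbara
and Strahil mention (version negotiation, larger rsize/wsize, noatime,
jumbo-frame verification, readahead); nfsserver:/export, /mnt/data and
/dev/sdX are placeholder names, not hosts or devices from this thread:

  # Omitting nfsvers lets the client negotiate the highest version both
  # sides support (4.1 or later is needed for pNFS)
  mount -t nfs -o rsize=1048576,wsize=1048576,noatime nfsserver:/export /mnt/data

  # Show the options and NFS version that were actually negotiated
  nfsstat -m

  # Check that jumbo frames survive the whole path: 9000-byte MTU minus
  # 28 bytes of IP/ICMP headers, with fragmentation forbidden
  ping -M do -s 8972 nfsserver

  # Readahead on the server's RAID volume, in 512-byte sectors (8 MiB here)
  blockdev --getra /dev/sdX
  blockdev --setra 16384 /dev/sdX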