Re: Adventures in NFS re-exporting

----- On 12 Nov, 2020, at 13:57, bfields bfields@xxxxxxxxxxxx wrote:
> On Thu, Nov 12, 2020 at 01:01:24PM +0000, Daire Byrne wrote:
>> 
>> Having just completed a bunch of fresh cloud rendering with v5.9.1 and Trond's
>> NFSv3 lookupp emulation patches, I can now revise my original list of issues
>> that others will likely experience if they ever try to do this craziness:
>> 
>> 1) Don't re-export NFSv4.0 unless you set vfs_cache_pressure=0, otherwise you
>> will see random input/output errors on your clients when things are dropped
>> out of the cache. In the end we gave up on using NFSv4.0 with our Netapps
>> because the 7-mode implementation seemed a bit flaky with modern Linux
>> clients (Linux NFSv4.2 servers on the other hand have been rock solid). We
>> now use NFSv3 with Trond's lookupp emulation patches instead.
> 
> So,
> 
>		NFSv4.2			  NFSv4.2
>	client --------> re-export server -------> original server
> 
> works as long as both servers are recent Linux, but when the original
> server is Netapp, you need the protocol used in both places to be v3, is
> that right?

Well, yes, NFSv4.2 all the way through works well for us; it's re-exporting an NFSv4.0 server (Linux OR Netapp) that still shows the input/output errors when dropping caches. With the lookupp emulation patches, every other possible combination now seems to work without ESTALE or input/output errors.
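For anyone trying to reproduce this: the symptoms are easiest to trigger by forcing cache reclaim on the re-export server, and the workaround from point 1 is a single sysctl. A sketch (the sysctl.d filename is just an example):

```shell
# On the re-export server: reproduce the input/output errors by forcing
# dentry/inode reclaim while clients are actively working:
echo 2 > /proc/sys/vm/drop_caches   # drop reclaimable dentries and inodes

# Workaround: tell the kernel to avoid reclaiming dentries/inodes under
# memory pressure, so cached filehandle mappings stay alive.
sysctl -w vm.vfs_cache_pressure=0

# Make it persistent across reboots (example filename):
echo 'vm.vfs_cache_pressure = 0' > /etc/sysctl.d/90-nfs-reexport.conf
```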

So this is still not working when dropping caches on the re-export server:

		NFSv3/4.x			  NFSv4.0
	client --------> re-export server -------> original server

The bit specific to the Netapp is simply that our 7-mode release only supports NFSv4.0, so I can't actually test NFSv4.1/4.2 against a more modern Netapp firmware release. I therefore have to mount the Netapp with NFSv3, and can then happily re-export that using NFSv4.x or NFSv3 (provided the filehandles fit in 63 bytes).
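For reference, the plumbing for that working combination looks roughly like this (hostnames and paths are made up; the explicit fsid= in the export is needed because an NFS mount has no stable device number for nfsd to derive filehandles from):

```shell
# On the re-export server: mount the 7-mode Netapp with NFSv3
mount -t nfs -o vers=3 netapp:/vol/prod /srv/reexport/prod

# /etc/exports on the re-export server: re-export that NFS mount.
# fsid= is mandatory when the exported directory is itself an NFS mount.
#   /srv/reexport/prod  *(rw,no_subtree_check,fsid=1000)

# On the eventual clients: mount from the re-export server with NFSv4.2
mount -t nfs -o vers=4.2 reexport-server:/srv/reexport/prod /mnt/prod
```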

>> 2) In order to better utilise the re-export server's client cache when
>> re-exporting an NFSv3 server (using either NFSv3 or NFSv4), we still need to
>> use the horrible inode_peek_iversion_raw hack to maintain good metadata
>> performance for large numbers of clients. Otherwise each re-export server's
>> clients can cause invalidation of the re-export server client cache. Once you
>> have hundreds of clients they all combine to constantly invalidate the cache
>> resulting in an order of magnitude slower metadata performance. If you are
>> re-exporting an NFSv4.x server (with either NFSv3 or NFSv4.x) this hack is not
>> required.
> 
> Have we figured out why that's required, or found a longer-term
> solution?  (Apologies, the memory of the earlier conversation is
> fading....)

There was some discussion about NFS4_CHANGE_TYPE_IS_MONOTONIC_INCR allowing for the hack/optimisation, but I guess that only applies when re-exporting NFSv4 to the eventual clients. Presumably it would not help when re-exporting an NFSv3 server to NFSv3 clients? I lack the deeper understanding to say anything more than that.

In our case we re-export everything to the clients using NFSv4.2, whether the originating server is NFSv3 (e.g. our Netapp) or NFSv4.2 (our RHEL7 storage servers).

With NFSv4.2 as the originating server, we found that either this hack/optimisation was not required, or the rate of invalidation of the re-export server's client cache was low enough not to cause significant performance problems when many clients requested the same metadata.

Daire


