I looked and both the Rename and Close are done on slotid 0, so the
client does do them in that order.

Also, I should mention that this bug may not be what causes crispyduck's
problem. (It could result in an accumulation of Opens on the server, I
think?)

rick
________________________________________
From: Rick Macklem <rmacklem@xxxxxxxxxxx>
Sent: Thursday, April 21, 2022 7:52 PM
To: J. Bruce Fields; crispyduck@xxxxxxxxxx
Cc: Chuck Lever III; Linux NFS Mailing List
Subject: Re: Problems with NFS4.1 on ESXi

J. Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
[stuff snipped]
> On Thu, Apr 21, 2022 at 12:40:49PM -0400, bfields wrote:
> >
> > Stale filehandles aren't normal, and suggest some bug or
> > misconfiguration on the server side, either in NFS or the exported
> > filesystem.
>
> Actually, I should take that back: if one client removes files while a
> second client is using them, it'd be normal for applications on that
> second client to see ESTALE.
I took a look at crispyduck's packet trace and here's what I saw:

Packet#
 48  Lookup of test-ovf.vmx
 49  NFS_OK  FH is 0x7c9ce14b (the hash)
...
 51  Open Claim_FH for 0x7c9ce14b
 52  NFS_OK  Open Stateid 0x35be
...
138  Rename test-ovf.vmx~ to test-ovf.vmx
139  NFS_OK
...
141  Close with PutFH 0x7c9ce14b
142  NFS4ERR_STALE for the PutFH

So it seems that the Rename deletes the file (it renames another file
to the same name "test-ovf.vmx"). The subsequent Close's PutFH then
fails, because the file that the FH refers to has been deleted.

Looks like yet another ESXi client bug to me? (I've seen assorted other
ones, but not this one. I have no idea how this might work on a FreeBSD
server. I can only assume the RPC sequence ends up different for FreeBSD
for some reason? Maybe the Close gets processed before the Rename? I
didn't look at the Sequence args for these RPCs to see if they use
different slots.)
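For reference, here is a minimal user-space sketch of that ordering. The
file names mirror the trace, but the program is only illustrative: it
shows the rename-over-an-open-file pattern, not the protocol-level CLOSE
failure itself. Create test-ovf.vmx and test-ovf.vmx~ in a scratch
directory before running it.

/*
 * Sketch of the sequence seen in the trace: OPEN the target, RENAME
 * another file over it while it is still open, then CLOSE last.
 */
#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>

int main(void)
{
    /* Like packets 48-52: look up and open test-ovf.vmx. */
    int fd = open("test-ovf.vmx", O_RDWR);
    if (fd < 0) {
        perror("open test-ovf.vmx");
        return 1;
    }

    /* Like packet 138: rename test-ovf.vmx~ over the open file,
     * which removes the object fd still refers to. */
    if (rename("test-ovf.vmx~", "test-ovf.vmx") < 0)
        perror("rename");

    /* Locally the descriptor keeps working after the rename... */
    struct stat sb;
    if (fstat(fd, &sb) == 0)
        printf("old file: ino=%ju nlink=%ju\n",
               (uintmax_t)sb.st_ino, (uintmax_t)sb.st_nlink);

    /* ...and, like packets 141/142, the close comes last.  close(2)
     * itself normally succeeds here; the NFS4ERR_STALE in the trace is
     * at the protocol level, on the PutFH the ESXi client sends ahead
     * of its CLOSE for the now-deleted file's handle. */
    if (close(fd) < 0)
        perror("close");

    return 0;
}

Whether operations on the old file's handle keep working after the
Rename is where servers differ: here the Linux server answered the ESXi
client's PutFH for that handle with NFS4ERR_STALE, while the same
workload reportedly succeeds against a FreeBSD server.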
> So it might be interesting to know what actually happens when VM
> templates are imported.
If you look at the packet trace, it is somewhat weird, like most things
for this client. It does a Lookup of the same file name over and over
again, for example.

> I suppose you could also try NFSv4.0 or try varying kernel versions to
> try to narrow down the problem.
I think it only does NFSv4.1. I've tried to contact the VMware engineers,
but never had any luck. I wish they'd show up at a bakeathon, but...

> No easy ideas off the top of my head, sorry.
I once posted a list of problems I had found with ESXi 6.5 to a FreeBSD
mailing list and someone who worked for VMware cut/pasted it into their
problem database. They responded to him with "might be fixed in a future
release" and, indeed, they were fixed in ESXi 6.7, so if you can get this
to them, they might fix it?

rick

> Figuring out more than that would require more
> investigation.
>
> --b.
>
> > Br,
> > Andi
> >
> > From: Chuck Lever III <chuck.lever@xxxxxxxxxx>
> > Sent: Thursday, April 21, 2022 16:58
> > To: Andreas Nagy <crispyduck@xxxxxxxxxx>
> > Cc: Linux NFS Mailing List <linux-nfs@xxxxxxxxxxxxxxx>
> > Subject: Re: Problems with NFS4.1 on ESXi
> >
> > Hi Andreas-
> >
> > > On Apr 21, 2022, at 12:55 AM, Andreas Nagy <crispyduck@xxxxxxxxxx> wrote:
> > >
> > > Hi,
> > >
> > > I hope this mailing list is the right place to discuss some problems with nfs4.1.
> >
> > Well, yes and no. This is an upstream developer mailing list,
> > not really for user support.
> >
> > You seem to be asking about products that are currently supported,
> > and I'm not sure if the Debian kernel is stock upstream 5.13 or
> > something else. ZFS is not an upstream Linux filesystem and the
> > ESXi NFS client is something we have little to no experience with.
> >
> > I recommend contacting the support desk for your products. If
> > they find a specific problem with the Linux NFS server's
> > implementation of the NFSv4.1 protocol, then come back here.
> >
> > > Switching from a FreeBSD host as NFS server to a Proxmox environment also serving NFS, I see some strange issues in combination with VMware ESXi.
> > >
> > > After first thinking it worked fine, I started to realize that there are problems with ESXi datastores on NFS4.1 when trying to import VMs (OVF).
> > >
> > > Importing ESXi OVF VM templates fails nearly every time with an ESXi error message "postNFCData failed: Not Found". With NFS3 it is working fine.
> > >
> > > NFS server is running on a Proxmox host:
> > >
> > > root@sepp-sto-01:~# hostnamectl
> > >    Static hostname: sepp-sto-01
> > >          Icon name: computer-server
> > >            Chassis: server
> > >         Machine ID: 028da2386e514db19a3793d876fadf12
> > >            Boot ID: c5130c8524c64bc38994f6cdd170d9fd
> > >   Operating System: Debian GNU/Linux 11 (bullseye)
> > >             Kernel: Linux 5.13.19-4-pve
> > >       Architecture: x86-64
> > >
> > > The file system is ZFS, but I also tried it with others and the behaviour is the same.
> > >
> > > ESXi version 7.2U3
> > >
> > > ESXi vmkernel.log:
> > > 2022-04-19T17:46:38.933Z cpu0:262261)cswitch: L2Sec_EnforcePortCompliance:209: [nsx@6876 comp="nsx-esx" subcomp="vswitch"]client vmk1 requested promiscuous mode on port 0x4000010, disallowed by vswitch policy
> > > 2022-04-19T17:46:40.897Z cpu10:266351 opID=936118c3)World: 12075: VC opID esxui-d6ab-f678 maps to vmkernel opID 936118c3
> > > 2022-04-19T17:46:40.897Z cpu10:266351 opID=936118c3)WARNING: NFS41: NFS41FileDoCloseFile:3128: file handle close on obj 0x4303fce02850 failed: Stale file handle
> > > 2022-04-19T17:46:40.897Z cpu10:266351 opID=936118c3)WARNING: NFS41: NFS41FileOpCloseFile:3718: NFS41FileCloseFile failed: Stale file handle
> > > 2022-04-19T17:46:41.164Z cpu4:266351 opID=936118c3)WARNING: NFS41: NFS41FileDoCloseFile:3128: file handle close on obj 0x4303fcdaa000 failed: Stale file handle
> > > 2022-04-19T17:46:41.164Z cpu4:266351 opID=936118c3)WARNING: NFS41: NFS41FileOpCloseFile:3718: NFS41FileCloseFile failed: Stale file handle
> > > 2022-04-19T17:47:25.166Z cpu18:262376)ScsiVmas: 1074: Inquiry for VPD page 00 to device mpx.vmhba32:C0:T0:L0 failed with error Not supported
> > > 2022-04-19T17:47:25.167Z cpu18:262375)StorageDevice: 7059: End path evaluation for device mpx.vmhba32:C0:T0:L0
> > > 2022-04-19T17:47:30.645Z cpu4:264565 opID=9529ace7)World: 12075: VC opID esxui-6787-f694 maps to vmkernel opID 9529ace7
> > > 2022-04-19T17:47:30.645Z cpu4:264565 opID=9529ace7)VmMemXfer: vm 264565: 2465: Evicting VM with path:/vmfs/volumes/9f10677f-697882ed-0000-000000000000/test-ovf/test-ovf.vmx
> > > 2022-04-19T17:47:30.645Z cpu4:264565 opID=9529ace7)VmMemXfer: 209: Creating crypto hash
> > > 2022-04-19T17:47:30.645Z cpu4:264565 opID=9529ace7)VmMemXfer: vm 264565: 2479: Could not find MemXferFS region for /vmfs/volumes/9f10677f-697882ed-0000-000000000000/test-ovf/test-ovf.vmx
> > > 2022-04-19T17:47:30.693Z cpu4:264565 opID=9529ace7)VmMemXfer: vm 264565: 2465: Evicting VM with path:/vmfs/volumes/9f10677f-697882ed-0000-000000000000/test-ovf/test-ovf.vmx
> > > 2022-04-19T17:47:30.693Z cpu4:264565 opID=9529ace7)VmMemXfer: 209: Creating crypto hash
> > > 2022-04-19T17:47:30.693Z cpu4:264565 opID=9529ace7)VmMemXfer: vm 264565: 2479: Could not find MemXferFS region for /vmfs/volumes/9f10677f-697882ed-0000-000000000000/test-ovf/test-ovf.vmx
> > >
> > > A tcpdump taken on the ESXi host with a filter on the NFS server IP is attached here:
> > > https://easyupload.io/xvtpt1
> > >
> > > I tried to analyze it, but have no idea what exactly the problem is. Maybe it is some issue with the VMware implementation?
> > > It would be nice if someone with better NFS knowledge could have a look at the traces.
> > >
> > > Best regards,
> > > cd
> >
> > --
> > Chuck Lever
> >