Re: Problems with NFS4.1 on ESXi

On Fri, Apr 22, 2022 at 8:59 AM J. Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
>
> On Thu, Apr 21, 2022 at 12:40:49PM -0400, bfields wrote:
> > On Thu, Apr 21, 2022 at 03:30:19PM +0000, crispyduck@xxxxxxxxxx wrote:
> > > Thanks. From the VMware side nobody will help here, as this is not supported. They support NFS4.1, but officially only from some storage vendors.
> > >
> > > I had it running in the past on FreeBSD, where I also had some problems in the beginning (RECLAIM_COMPLETE). Rick Macklem helped to figure out the problem and fixed it with some patches that should now be part of FreeBSD.
> > >
> > > I plan to use it with ZFS, but also tested it on ext4, with exact same behavior.
> > >
> > > NFS3 works fine, NFS4.1 seems to work fine, except the described problems.
> > >
> > > The reason for NFS4.1 is session trunking, which gives really awesome speeds when using multiple NICs/subnets, comparable to iSCSI.
> > > A NFS4.1 based storage for ESXi and other hypervisors.
> > >
> > > The test is also done without session trunking.
> > >
> > > This needs NFS expertise; I have no idea where else I could ask for someone to have a look at the traces.
> >
> > Stale filehandles aren't normal, and suggest some bug or
> > misconfiguration on the server side, either in NFS or the exported
> > filesystem.
>
> Actually, I should take that back: if one client removes files while a
> second client is using them, it'd be normal for applications on that
> second client to see ESTALE.

I looked at the traces and they looked OK to me. The ESTALE was from
the vmware client sending a RENAME onto a file that was opened
previously and then sending a CLOSE on that filehandle which resulted
in ESTALE. So something like this:
OPEN (foobar)
RENAME (something else, foobar)
CLOSE (foobar) leads to ESTALE
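
To make that sequence concrete, here is a toy model in Python. It is
only a sketch, not the Linux nfsd code and not taken from the traces:
ToyServer and its methods are hypothetical names, and it assumes the
filehandle is just an inode number and that the server drops the
replaced file as soon as something is renamed over it.

```python
import errno


class ToyServer:
    """Toy model: filehandles are inode numbers (an assumption)."""

    def __init__(self):
        self.names = {}       # file name -> inode number
        self.inodes = set()   # inode numbers that still exist
        self.next_ino = 1

    def create(self, name):
        ino = self.next_ino
        self.next_ino += 1
        self.names[name] = ino
        self.inodes.add(ino)
        return ino

    def open(self, name):
        # The "filehandle" handed to the client is the inode number.
        return self.names[name]

    def rename(self, src, dst):
        # Renaming onto an existing name replaces it; in this model the
        # old target inode is dropped immediately.
        old = self.names.get(dst)
        self.names[dst] = self.names.pop(src)
        if old is not None:
            self.inodes.discard(old)

    def close(self, fh):
        # A handle whose inode is gone can no longer be resolved.
        return 0 if fh in self.inodes else errno.ESTALE


server = ToyServer()
server.create("foobar")
server.create("something_else")

fh = server.open("foobar")                 # OPEN (foobar)
server.rename("something_else", "foobar")  # RENAME (something else, foobar)
result = server.close(fh)                  # CLOSE on the old filehandle

print(errno.errorcode[result])
```

In this model the old handle fails only because the file it referred to
was replaced; a handle obtained by opening foobar after the RENAME
closes without error.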

I agree with Chuck's suggestion which was to ask vmware support.

> So it might be interesting to know what actually happens when VM
> templates are imported.
>
> I suppose you could also try NFSv4.0 or try varying kernel versions to
> try to narrow down the problem.
>
> No easy ideas off the top of my head, sorry.
>
> --b.
>
> > Figuring out more than that would require more
> > investigation.
> >
> > --b.
> >
> > >
> > > Br,
> > > Andi
> > >
> > >
> > >
> > >
> > >
> > >
> > > From: Chuck Lever III <chuck.lever@xxxxxxxxxx>
> > > Sent: Thursday, April 21, 2022 16:58
> > > To: Andreas Nagy <crispyduck@xxxxxxxxxx>
> > > Cc: Linux NFS Mailing List <linux-nfs@xxxxxxxxxxxxxxx>
> > > Subject: Re: Problems with NFS4.1 on ESXi
> > >
> > > Hi Andreas-
> > >
> > > > On Apr 21, 2022, at 12:55 AM, Andreas Nagy <crispyduck@xxxxxxxxxx> wrote:
> > > >
> > > > Hi,
> > > >
> > > > I hope this mailing list is the right place to discuss some problems with nfs4.1.
> > >
> > > Well, yes and no. This is an upstream developer mailing list,
> > > not really for user support.
> > >
> > > You seem to be asking about products that are currently supported,
> > > and I'm not sure if the Debian kernel is stock upstream 5.13 or
> > > something else. ZFS is not an upstream Linux filesystem and the
> > > ESXi NFS client is something we have little to no experience with.
> > >
> > > I recommend contacting the support desk for your products. If
> > > they find a specific problem with the Linux NFS server's
> > > implementation of the NFSv4.1 protocol, then come back here.
> > >
> > >
> > > > After switching from a FreeBSD host as NFS server to a Proxmox environment also serving NFS, I see some strange issues in combination with VMware ESXi.
> > > >
> > > > After first thinking it works fine, I started to realize that there are problems with ESXi datastores on NFS4.1 when trying to import VMs (OVF).
> > > >
> > > > Importing ESXi OVF VM templates fails nearly every time with an ESXi error message "postNFCData failed: Not Found". With NFS3 it is working fine.
> > > >
> > > > NFS server is running on a Proxmox host:
> > > >
> > > >  root@sepp-sto-01:~# hostnamectl
> > > >  Static hostname: sepp-sto-01
> > > >  Icon name: computer-server
> > > >  Chassis: server
> > > >  Machine ID: 028da2386e514db19a3793d876fadf12
> > > >  Boot ID: c5130c8524c64bc38994f6cdd170d9fd
> > > >  Operating System: Debian GNU/Linux 11 (bullseye)
> > > >  Kernel: Linux 5.13.19-4-pve
> > > >  Architecture: x86-64
> > > >
> > > >
> > > > The file system is ZFS, but I also tried it with others and the behaviour is the same.
> > > >
> > > >
> > > > ESXi version 7.2U3
> > > >
> > > > ESXi vmkernel.log:
> > > > 2022-04-19T17:46:38.933Z cpu0:262261)cswitch: L2Sec_EnforcePortCompliance:209: [nsx@6876 comp="nsx-esx" subcomp="vswitch"]client vmk1 requested promiscuous mode on port 0x4000010, disallowed by vswitch policy
> > > > 2022-04-19T17:46:40.897Z cpu10:266351 opID=936118c3)World: 12075: VC opID esxui-d6ab-f678 maps to vmkernel opID 936118c3
> > > > 2022-04-19T17:46:40.897Z cpu10:266351 opID=936118c3)WARNING: NFS41: NFS41FileDoCloseFile:3128: file handle close on obj 0x4303fce02850 failed: Stale file handle
> > > > 2022-04-19T17:46:40.897Z cpu10:266351 opID=936118c3)WARNING: NFS41: NFS41FileOpCloseFile:3718: NFS41FileCloseFile failed: Stale file handle
> > > > 2022-04-19T17:46:41.164Z cpu4:266351 opID=936118c3)WARNING: NFS41: NFS41FileDoCloseFile:3128: file handle close on obj 0x4303fcdaa000 failed: Stale file handle
> > > > 2022-04-19T17:46:41.164Z cpu4:266351 opID=936118c3)WARNING: NFS41: NFS41FileOpCloseFile:3718: NFS41FileCloseFile failed: Stale file handle
> > > > 2022-04-19T17:47:25.166Z cpu18:262376)ScsiVmas: 1074: Inquiry for VPD page 00 to device mpx.vmhba32:C0:T0:L0 failed with error Not supported
> > > > 2022-04-19T17:47:25.167Z cpu18:262375)StorageDevice: 7059: End path evaluation for device mpx.vmhba32:C0:T0:L0
> > > > 2022-04-19T17:47:30.645Z cpu4:264565 opID=9529ace7)World: 12075: VC opID esxui-6787-f694 maps to vmkernel opID 9529ace7
> > > > 2022-04-19T17:47:30.645Z cpu4:264565 opID=9529ace7)VmMemXfer: vm 264565: 2465: Evicting VM with path:/vmfs/volumes/9f10677f-697882ed-0000-000000000000/test-ovf/test-ovf.vmx
> > > > 2022-04-19T17:47:30.645Z cpu4:264565 opID=9529ace7)VmMemXfer: 209: Creating crypto hash
> > > > 2022-04-19T17:47:30.645Z cpu4:264565 opID=9529ace7)VmMemXfer: vm 264565: 2479: Could not find MemXferFS region for /vmfs/volumes/9f10677f-697882ed-0000-000000000000/test-ovf/test-ovf.vmx
> > > > 2022-04-19T17:47:30.693Z cpu4:264565 opID=9529ace7)VmMemXfer: vm 264565: 2465: Evicting VM with path:/vmfs/volumes/9f10677f-697882ed-0000-000000000000/test-ovf/test-ovf.vmx
> > > > 2022-04-19T17:47:30.693Z cpu4:264565 opID=9529ace7)VmMemXfer: 209: Creating crypto hash
> > > > 2022-04-19T17:47:30.693Z cpu4:264565 opID=9529ace7)VmMemXfer: vm 264565: 2479: Could not find MemXferFS region for /vmfs/volumes/9f10677f-697882ed-0000-000000000000/test-ovf/test-ovf.vmx
> > > >
> > > > A tcpdump taken on the ESXi host, filtered on the NFS server IP, is available here:
> > > > https://easyupload.io/xvtpt1
> > > >
> > > > I tried to analyze it, but have no idea what exactly the problem is. Maybe it is some issue with the VMware implementation?
> > > > It would be nice if someone with better NFS knowledge could have a look at the traces.
> > > >
> > > > Best regards,
> > > > cd
> > >
> > > --
> > > Chuck Lever
> > >


