Hi Benjamin,

Thanks for your reply. I suspected so but wanted to confirm.

I tried the block/scsi layout as well, but that isn't working either.
Based on an old thread on this list on "how to setup pnfs for block
layout", I set up an iscsi target on the DS and the initiators on the
MDS and client. All machines are running a 4.15 kernel.

The client is an iscsi initiator but does not mount the device.
The MDS is also an initiator, with an xfs filesystem mounted on the device.
This mount point is exported as an nfs share from the MDS.
The client mounts the exported MDS share.

By default LAYOUT_SCSI is used, but the GETDEVICEINFO call keeps failing
with NFS4ERR_INVAL. As a result, all reads/writes from the client are
routed to the MDS instead of the DS.

Is anything wrong with my setup? More details below.

Thanks,
jrk

----------------
On MDS
----------------
# cat /proc/partitions
major minor  #blocks  name

  11        0    1048575 sr0
 252        0   67108864 vda
 252        1   66107392 vda1
 252        2          1 vda2
 252        5     998400 vda5
   8        0    1048576 sda
   8        1    1047552 sda1   <-- iscsi device

root@ubuntu1804:/# mount | grep xfs
/dev/sda1 on /sudosrv type xfs (rw,relatime,attr2,inode64,noquota)

root@ubuntu1804:/# cat /etc/exports
/sudosrv *(rw,sync,fsid=0,no_subtree_check,no_root_squash,pnfs)

------------------
On the client
------------------
root@ubuntu18_04_2:# cat /proc/partitions
major minor  #blocks  name

  11        0    1048575 sr0
 252        0   67108864 vda
 252        1   66107392 vda1
 252        2          1 vda2
 252        5     998400 vda5
   8        0    1048576 sda
   8        1    1047552 sda1

#mount
192.168.122.92:/ on /mnt type nfs4 (rw,relatime,vers=4.1,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.122.83,local_lock=none,addr=192.168.122.92)

#cat /proc/self/mountstats
device 192.168.122.92:/ mounted on /mnt with fstype nfs4 statvers=1.1
  opts: rw,vers=4.1,rsize=524288,wsize=524288,namlen=255,acregmin=3,acregmax=60,acdirmin=30,acdirmax=60,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.122.83,local_lock=none
  age: 2290
  impl_id: name='',domain='',date='0,0'
  caps: caps=0x3ffff,wtmult=512,dtsize=32768,bsize=0,namlen=255
  nfsv4: bm0=0xfdffbfff,bm1=0x40f9be3e,bm2=0x803,acl=0x3,sessions,pnfs=LAYOUT_SCSI
  sec: flavor=1,pseudoflavor=1

Packet capture between the MDS and client
<snip>
119 2019-03-06 22:55:07.868442 192.168.122.83 → 192.168.122.92 NFS 286 V4 Call LAYOUTGET
121 2019-03-06 22:55:07.876419 192.168.122.92 → 192.168.122.83 NFS 266 V4 Reply (Call In 119) LAYOUTGET
122 2019-03-06 22:55:07.876950 192.168.122.83 → 192.168.122.92 NFS 234 V4 Call GETDEVINFO
123 2019-03-06 22:55:07.877218 192.168.122.92 → 192.168.122.83 NFS 158 V4 Reply (Call In 122) GETDEVINFO Status: NFS4ERR_INVAL
124 2019-03-06 22:55:07.877445 192.168.122.83 → 192.168.122.92 NFS 282 V4 Call LAYOUTRETURN
126 2019-03-06 22:55:07.877539 192.168.122.83 → 192.168.122.92 NFS 1478 V4 Call WRITE StateID: 0xa6a8 Offset: 0 Len: 4096
128 2019-03-06 22:55:07.877746 192.168.122.92 → 192.168.122.83 NFS 170 V4 Reply (Call In 124) LAYOUTRETURN
129 2019-03-06 22:55:07.901481 192.168.122.92 → 192.168.122.83 NFS 246 V4 Reply (Call In 126) WRITE
131 2019-03-06 22:55:07.901702 192.168.122.83 → 192.168.122.92 NFS 266 V4 Call CLOSE StateID: 0xa305
132 2019-03-06 22:55:07.901902 192.168.122.92 → 192.168.122.83 NFS 246 V4 Reply (Call In 131) CLOSE
</snip>

#tshark -V -tad -n -r /tmp/iscsi-m-xfs.lcap frame.number == 122
Network File System, Ops(2): SEQUENCE, GETDEVINFO
    [Program Version: 4]
    [V4 Procedure: COMPOUND (1)]
    Tag: <EMPTY>
        length: 0
        contents: <EMPTY>
    minorversion: 1
    Operations (count: 2): SEQUENCE, GETDEVINFO
        Opcode: SEQUENCE (53)
            sessionid: 82bf805cbcdce1ef0c00000000000000
            seqid: 0x00000004
            slot id: 1
            high slot id: 1
            cache this?: No
        Opcode: GETDEVINFO (47)
            device ID: 01000000000000000000000000000000
            layout type: LAYOUT4_SCSI (5)
            maxcount: 527904
            notification bitmap: 6
    [Main Opcode: GETDEVINFO (47)]

#tshark -V -tad -n -r /tmp/iscsi-m-xfs.lcap frame.number == 123
        Opcode: GETDEVINFO (47)
            Status: NFS4ERR_INVAL (22)

On the client, nfs-blkmap is
running and blocklayout driver is loaded

root@ubuntu18_04_2:~# systemctl status nfs-blkmap
* nfs-blkmap.service - pNFS block layout mapping daemon
   Loaded: loaded (/lib/systemd/system/nfs-blkmap.service; disabled; vendor preset: enabled)
   Active: active (running) since Wed 2019-03-06 22:41:54 PST; 1h 0min ago
 Main PID: 485 (blkmapd)
    Tasks: 1 (limit: 4695)
   CGroup: /system.slice/nfs-blkmap.service
           --485 /usr/sbin/blkmapd

Mar 06 22:41:54 ubuntu18_04_2 blkmapd[485]: open pipe file /run/rpc_pipefs/nfs/blocklayout failed: No such file or directory
Mar 06 22:41:54 ubuntu18_04_2 systemd[1]: Starting pNFS block layout mapping daemon...
Mar 06 22:41:54 ubuntu18_04_2 systemd[1]: nfs-blkmap.service: Can't open PID file /run/blkmapd.pid (yet?) after start: No such file or di
Mar 06 22:41:54 ubuntu18_04_2 systemd[1]: Started pNFS block layout mapping daemon.
Mar 06 22:52:54 ubuntu18_04_2 blkmapd[485]: blocklayout pipe file created

On Wed, 6 Mar 2019 at 21:31, Benjamin Coddington <bcodding@xxxxxxxxxx> wrote:
>
> Hi jrk,
>
> The upstream linux knfsd server currently only supports a very simple
> flexfiles layout where the MDS and the DS are the same server, so
> there's no way (yet) to configure knfsd to give out flexfiles layouts
> that point to other DS servers.
>
> See commit 9b9960a0ca4773e21c4b153ed355583946346b25 in the linux git
> repo for the work that implements this simple server.
>
> Ben
>
> On 6 Mar 2019, at 5:39, Kanika wrote:
>
> > Hi,
> > I am trying to deploy pnfs using the mainstream-ed NFSv4 server/client
> > available with 4.15 kernels. I have been able to setup the MDS and
> > client using the flexfile layout. I can't find any documentation on
> > how to inform MDS about the data servers.
> >
> > The setup I have so far:
> > Metadata server and data server are running Ubuntu 18.04
> > Client is Centos 7.5
> >
> > MDS (intended), ip: 192.168.122.92
> > root@ubuntu1804:~/#cat /etc/exports
> > /sudosrv *(rw,sync,fsid=0,no_subtree_check,no_root_squash,pnfs) <-- PNFS set
> > /sudosrv/share1 *(rw,sync,fsid=1,no_subtree_check,no_root_squash)
> >
> > Intended DS, ip: 192.168.122.83
> > root@ubuntu18_04_2:~#cat /etc/exports
> > /srv/homes *(rw,sync,fsid=4,no_root_squash,no_subtree_check)
> >
> > Client, ip:192.168.122.5 (Centos 7.5)
> > Mount the pseudo root filesystem exported by MDS
> >
> > #mount -t nfs -o v4.2 -o rw 192.168.122.92:/ /mnt/
> > #cat /proc/self/mountstats
> >
> > device nfsd mounted on /proc/fs/nfsd with fstype nfsd
> > device 192.168.122.92:/ mounted on /mnt with fstype nfs4 statvers=1.1
> >   opts: rw,vers=4.2,rsize=524288,wsize=524288,namlen=255,acregmin=3,acregmax=60,acdirmin=30,acdirmax=60,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.122.5,local_lock=none
> >   age: 127
> >   impl_id: name='',domain='',date='0,0'
> >   caps: caps=0x1fbffdf,wtmult=512,dtsize=32768,bsize=0,namlen=255
> >   nfsv4: bm0=0xfdffbfff,bm1=0x40f9be3e,bm2=0x20803,acl=0x3,sessions,pnfs=LAYOUT_FLEX_FILES <--- flex file layout
> >   sec: flavor=1,pseudoflavor=1
> >
> > I tried 2 ways to link MDS and DS
> > 1. Use the "refer" while exporting the DS share from MDS. This works
> > as referrals are expected to work but not like "pnfs".
> > 2. Mount the DS on MDS in a subdir of /sudosrv/share1. All IO
> > operations go only to the MDS as if it were a local share. No exchange
> > between the client and the DS.
> >
> > How can the DS be known to an MDS for a local filesystem like ext4?
> > Any help will be appreciated.
> >
> > Thanks,
> > jrk
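
For anyone reproducing this, here is a minimal sketch for checking which pNFS layout driver a client negotiated, by reading the pnfs= field that /proc/self/mountstats prints on the nfsv4: line (LAYOUT_SCSI and LAYOUT_FLEX_FILES in the outputs in this thread). The helper name pnfs_layouts and the embedded sample text are hypothetical, and the parsing assumes the mountstats layout shown in this thread. It is not part of nfs-utils.

```python
import re

# Assumption: mountstats entries start with a "device ... mounted on ..." line,
# as in the outputs quoted in this thread.
DEVICE_RE = re.compile(r"^device \S+ mounted on (\S+) with fstype (\S+)")
# Capture everything after "pnfs=" up to the next comma or end of line.
PNFS_RE = re.compile(r"pnfs=([^,\n]+)")


def pnfs_layouts(mountstats_text):
    """Return {mount_point: pnfs value} for every nfs4 mount in the text."""
    layouts = {}
    mount_point = None
    for raw in mountstats_text.splitlines():
        line = raw.strip()
        dev = DEVICE_RE.match(line)
        if dev:
            # Only track NFSv4 mounts; other filesystems reset the context.
            mount_point = dev.group(1) if dev.group(2).startswith("nfs4") else None
            continue
        if mount_point is None:
            continue
        pnfs = PNFS_RE.search(line)
        if pnfs:
            layouts[mount_point] = pnfs.group(1).strip()
    return layouts


# Hypothetical sample, trimmed from the client output earlier in this mail.
SAMPLE = """\
device nfsd mounted on /proc/fs/nfsd with fstype nfsd
device 192.168.122.92:/ mounted on /mnt with fstype nfs4 statvers=1.1
  opts: rw,vers=4.1,rsize=524288,wsize=524288,hard,proto=tcp
  nfsv4: bm0=0xfdffbfff,bm1=0x40f9be3e,bm2=0x803,acl=0x3,sessions,pnfs=LAYOUT_SCSI
  sec: flavor=1,pseudoflavor=1
"""

print(pnfs_layouts(SAMPLE))
```

On a live client the same function can be fed the contents of /proc/self/mountstats instead of SAMPLE. Note that it only reports the negotiated layout type. It says nothing about whether GETDEVICEINFO succeeds, so I/O can still fall back to the MDS as in the capture above.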