On Wed, May 20, 2015 at 05:27:26PM +0100, Benjamin ESTRABAUD wrote:
> On 15/05/15 20:20, J. Bruce Fields wrote:
> >On Fri, May 15, 2015 at 10:44:13AM -0700, Benjamin ESTRABAUD wrote:
> >>I've been using pNFS for a while now, and I am very pleased
> >>with its overall stability and performance.
> >>
> >>A pNFS MDS server was set up with SAN storage in the backend (a RAID0
> >>built on top of multiple LUNs). Clients were given access to the same
> >>RAID0 using the same LUNs on the same SAN.
> >>
> >>However, I've been noticing a small issue with it that prevents me
> >>from using pNFS to its full potential: if I run non-direct IOs (for
> >>instance "dd" without the "oflag=direct" option), IOs run excessively
> >>slowly (3-4MB/sec) and the dd process hangs until forcefully
> >>terminated.
> 
> Sorry for the late reply, I was unavailable for the past few days. I
> have now had time to look at the problem further.
> 
> >And that's reproducible every time?
> >
> >Thanks for the detailed report. Quick questions:
> 
> It is, and here is what is happening, in more detail:
> 
> On the client, "/mnt/pnfs1" is the pNFS mount point. We use NFS v4.1.
> 
> * Running dd with bs=512 and no "direct" set on the client:
> 
> dd if=/dev/zero of=/mnt/pnfs1/testfile bs=512 count=100000000
> 
> => Here we get variable performance; dd's average is 100MB/sec, and
> we can see all the IOs going to the SAN block device. nfsstat
> confirms that no IOs are going through the NFS server (no "writes"
> are recorded, only "layoutcommit"). Performance is maybe low, but at
> this block size we don't really care.
> 
> * Running dd with bs=512 and "direct" set:
> 
> dd if=/dev/zero of=/mnt/pnfs1/testfile bs=512 count=100000000 oflag=direct
> 
> => Here, funnily enough, all the IOs are sent over NFS. The
> "nfsstat" command shows writes increasing, and the SAN block device
> activity on the client is idle. The performance is about 13MB/sec,
> but again that is expected with such a small IO size. The only
> unexpected part is that the small 512-byte IOs are not going through
> the iSCSI SAN.
> 
> * Running dd with bs=1M and no "direct" set on the client:
> 
> dd if=/dev/zero of=/mnt/pnfs1/testfile bs=1M count=100000000
> 
> => Here the IOs "work" and go through the SAN (no "write" counter
> increasing in "nfsstat", and I can see disk statistics on the block
> device on the client increasing). However, the speed at which the IOs
> go through is really slow (the actual speed recorded on the SAN
> device fluctuates a lot, from 3MB/sec to a lot more). Overall dd is
> not really happy, and "Ctrl-C"ing it takes a long time; on the
> last try it actually caused a kernel panic (see
> http://imgur.com/YpXjvQ3 - sorry about the picture format, I did not
> have dmesg output capture set up and only had access to the VGA
> console). When "dd" finally comes around and terminates, the average
> speed is 200MB/sec.
> Again the SAN block device shows IOs being submitted and "nfsstat"
> shows no "writes" but a few "layoutcommits", showing that the writes
> are not going through the "regular" NFS server.
> 
> * Running dd with bs=1M and no "direct" set on the client:

I think you meant to leave out the "no" there?

> dd if=/dev/zero of=/mnt/pnfs1/testfile bs=1M count=100000000 oflag=direct
> 
> => Here the IOs work much faster (almost twice as fast as with
> "direct" set, or 350+MB/sec) and dd is much more responsive (can
> "Ctrl-C" it almost instantly).
> Again the SAN block device shows IOs being submitted and "nfsstat"
> shows no "writes" but a few "layoutcommits", showing that the writes
> are not going through the "regular" NFS server.
> 
> This shows that somehow running with "oflag=direct" causes
> instability and lower performance, at least on this version.

And I think you mean "running without", not "running with"?

Assuming those are just typos, unless I'm missing something.

--b.

> Both clients are running Linux 4.1.0-rc2 on CentOS 7.0, and the
> server is running Linux 4.1.0-rc2 on CentOS 7.1.
> 
> >Can you get network captures and figure out (for example) whether the
> >slow writes are going over iSCSI or NFS, and if they're returning errors
> >in either case?
> 
> I'm going to do that now (try to locate errors). However, "nfsstat"
> does indicate that the slower writes are going through iSCSI.
> 
> >>The same behaviour can be observed laying out an IO file
> >>with FIO for instance, or using some applications which do not use the
> >>O_DIRECT flag. When using direct IO I can observe lots of iSCSI
> >>traffic, at extremely good performance (the same performance as the
> >>SAN gets on "raw" block devices).
> >>
> >>All the systems are running CentOS 7.0 with a custom kernel 4.1-rc2
> >>(pNFS enabled), apart from the storage nodes, which are running a
> >>custom minimal Linux distro with kernel 3.18.
> >>
> >>The SAN is all 40G Mellanox Ethernet, and we are not using the OFED
> >>driver anywhere (everything is only "standard" upstream Linux).
> 
> >What's the non-SAN network (that the NFS traffic goes over)?
> 
> The NFS traffic actually also goes through the same SAN: both the
> iSCSI LUNs and the NFS server are accessible over the same 40G/sec
> Ethernet fabric.
> 
> Regards,
> Ben.
> 
> >--b.
> 
> >>Would anybody have any ideas where this issue could be coming from?
> >>
> >>Regards, Ben - MPSTOR.
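
For reference, here is a minimal sketch of one way to gather the counters and captures discussed above, i.e. to confirm whether the slow buffered writes go out over iSCSI or fall back to plain NFS, and whether either path returns errors. The interface name (eth2), the SAN block device (sdb), the capture paths and the dd size are placeholders/assumptions, not values taken from this thread:

# Snapshot the NFS client op counts (writes vs. layoutcommits) before the run.
nfsstat -c -4 > /tmp/nfsstat.before

# Capture NFS (port 2049) and iSCSI (port 3260) traffic on the 40G interface
# while the buffered dd runs; "eth2" is a placeholder for the real interface.
tcpdump -i eth2 -s 256 -w /tmp/pnfs-buffered.pcap 'port 2049 or port 3260' &

# Reproduce the slow buffered-write case.
dd if=/dev/zero of=/mnt/pnfs1/testfile bs=1M count=10000

# In another terminal: watch whether the writes land on the SAN LUN
# ("sdb" is a placeholder for the iSCSI block device backing the layout).
iostat -xm sdb 1

# Snapshot the counters again and compare, then stop the capture and
# inspect the pcap (e.g. in wireshark) for requests returning errors.
nfsstat -c -4 > /tmp/nfsstat.after
diff /tmp/nfsstat.before /tmp/nfsstat.after
kill %1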