Re: Issue running buffered writes to a pNFS (NFS 4.1 backed by SAN) filesystem.

On 20/05/15 20:40, J. Bruce Fields wrote:
On Wed, May 20, 2015 at 05:27:26PM +0100, Benjamin ESTRABAUD wrote:
On 15/05/15 20:20, J. Bruce Fields wrote:
On Fri, May 15, 2015 at 10:44:13AM -0700, Benjamin ESTRABAUD wrote:
I've been using pNFS for a little while now, and I am very pleased
with its overall stability and performance.

A pNFS MDS server was set up with SAN storage in the backend (a RAID0
built on top of multiple LUNs). Clients were given access to the same
RAID0 using the same LUNs on the same SAN.
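
For reference, the export on the MDS is along these lines; the path here is
just a placeholder, and the relevant bit is the "pnfs" export option so that
layouts are offered to the clients:

/srv/pnfs1  *(rw,sync,no_subtree_check,pnfs)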

However, I've been noticing a small issue with it that prevents me
from using pNFS to its full potential: if I run non-direct IOs (for
instance "dd" without the "oflag=direct" option), IOs run excessively
slowly (3-4MB/sec) and the dd process hangs until forcefully
terminated.

Sorry for the late reply, I was unavailable for the past few days. I
had time to look at the problem further.

And that's reproducible every time?


Hi Bruce,

Thanks for the detailed report.  Quick questions:

It is, and here is what is happening in more detail:

On the client, "/mnt/pnfs1" is the pNFS mount point. We use NFS v4.1.
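
The mount itself is along these lines (the server name and export path here
are placeholders):

mount -t nfs -o vers=4.1 mds:/srv/pnfs1 /mnt/pnfs1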

* Running dd with bs=512 and no "direct" set on the client:

dd if=/dev/zero of=/mnt/pnfs1/testfile bs=512 count=100000000

=> Here we get variable performance; dd's average is 100MB/sec, and
we can see all the IOs going to the SAN block device. nfsstat
confirms that no IOs are going through the NFS server (no "writes"
are recorded, only "layoutcommit"s). Performance may be low, but at
this block size we don't really care.

* Running dd with bs=512 and "direct" set on the client:

dd if=/dev/zero of=/mnt/pnfs1/testfile bs=512 count=100000000 oflag=direct

=> Here, funnily enough, all the IOs are sent over NFS. The
"nfsstat" command shows writes increasing, and the SAN block device
activity on the client is idle. The performance is about 13MB/sec,
but again that is expected with such a small IO size. The only
unexpected thing is that small 512-byte IOs are not going through
the iSCSI SAN.

* Running dd with bs=1M and no "direct" set on the client:

dd if=/dev/zero of=/mnt/pnfs1/testfile bs=1M count=100000000

=> Here the IOs "work" and go through the SAN (no "write" counter
increasing in "nfsstat" and I can see disk statistics on the block
device on the client increasing). However the speed at which the IOs
go through is really slow (the actual speed recorded on the SAN
device fluctuates a lot, from 3MB/sec to a lot more). Overall dd is
not really happy and "Ctrl-C"ing it takes a long time, and in the
last try actually caused a kernel panic (see
http://imgur.com/YpXjvQ3 sorry about the picture format, did not
have the dmesg output capturing and had access to the VGA only).
When "dd" finally comes around and terminates, the average speed is
200MB/sec.
Again the SAN block device shows IOs being submitted and "nfsstat"
shows no "writes" but a few "layoutcommits", showing that the writes
are not going through the "regular" NFS server.


* Running dd with bs=1M and no "direct" set on the client:

I think you meant to leave out the "no" there?

Exactly, that's what I meant; sorry, I was confused.

dd if=/dev/zero of=/mnt/pnfs1/testfile bs=1M count=100000000 oflag=direct

=> Here the IOs work much faster (almost twice as fast as the
buffered run, at 350+MB/sec) and dd is much more responsive (I can
"Ctrl-C" it almost instantly). Again, the SAN block device shows IOs
being submitted and "nfsstat" shows no "writes" but a few
"layoutcommits", showing that the writes are not going through the
"regular" NFS server.

This shows that somehow running with "oflag=direct" causes
instability and lower performance, at least on this version.

And I think you mean "running without", not "running with"?

Assuming those are just typos, unless I'm missing something.

Also right, I meant that without oflag=direct I get lower performance. Well, actually, as my later mail shows, it does so only for a specific file size. I'm going to run more tests to narrow it down.
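
For the next round I will probably compare the two cases along these lines, so that the buffered number includes the flush out to the SAN before dd reports a rate (sizes here are just examples):

dd if=/dev/zero of=/mnt/pnfs1/testfile bs=1M count=10000 conv=fsync
dd if=/dev/zero of=/mnt/pnfs1/testfile bs=1M count=10000 oflag=direct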

In the meantime I tried looking into network traces but couldn't capture anything useful, as Wireshark was losing input. I'm running Wireshark remotely, with the tcpdump input coming over a slow SSH session, so maybe I'll capture a few seconds' worth of output to a file, scp it back to me and use that instead.
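
Something like this is what I have in mind, capturing the NFS traffic to a local file and copying it back afterwards (the interface name is a placeholder):

tcpdump -i eth0 -s 0 -w /tmp/pnfs-nfs.pcap port 2049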

Ben.

--b.


Both clients are running Linux 4.1.0-rc2 on CentOS 7.0 and the
server is running Linux 4.1.0-rc2 on CentOS 7.1.

Can you get network captures and figure out (for example), whether the
slow writes are going over iSCSI or NFS, and if they're returning errors
in either case?

I'm going to do that now (try to locate errors). However, "nfsstat"
does indicate that the slower writes are going through iSCSI.
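
For what it is worth, this is roughly how I am telling the two paths apart while dd runs: watching the client-side NFSv4 "write" and "layoutcommit" counters, plus the block-device statistics ("sdX" stands in for whatever the SAN LUN shows up as on the client):

nfsstat -c
iostat -xm sdX 1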

The same behaviour can be observed when laying out an IO file
with FIO, for instance, or when using applications which do not use the
O_DIRECT flag. When using direct IO I can observe lots of iSCSI
traffic, at extremely good performance (the same performance as the SAN
gets on "raw" block devices).

All the systems are running CentOS 7.0 with a custom 4.1-rc2 kernel
(pNFS enabled), apart from the storage nodes, which are running a custom
minimal Linux distro with kernel 3.18.

The SAN is all 40G Mellanox Ethernet, and we are not using the OFED
driver anywhere (Everything is only "standard" upstream Linux).

What's the non-SAN network (that the NFS traffic goes over)?

The NFS traffic actually goes over the same SAN: both the
iSCSI LUNs and the NFS server are reachable over the same 40G/sec
Ethernet fabric.

Regards,
Ben.

--b.


Would anybody have any ideas where this issue could be coming from?

Regards,
Ben - MPSTOR.