RE: NFSv4 vs NFSv3 with MPICH2-1.4

Just a quick follow-up... I was wondering if anyone has had a chance to take a
look at the tcpdump I sent to a few of you last week.

If anyone else on the list wants to take a look, please let me know, and I will
send you the link privately.

Thanks,
Greg

Quoting Gregory Magoon <gmagoon@xxxxxxx>:

Thanks all for the feedback and sorry for the delay... one of our HDDs failed on
Saturday, so I had to take care of that.

Because I don't want to interrupt a working system, it will not be convenient
for me to try the "no delegations" option that has been suggested.
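
(For reference, I believe the suggested "no delegations" change was something
along these lines on the server, before nfsd is started; the sysctl knob is
from memory and I have not tried it on this setup:)

    # on the NFS server, before starting nfsd; knfsd hands out NFSv4
    # delegations via file leases, so turning leases off should disable them
    echo 0 > /proc/sys/fs/leases-enable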

I was, however, able to get hold of a temporarily free node (returned to the
NFSv4 configuration for the occasion) to capture the TCP traffic. I captured a
short (< 1 sec) snapshot during (I believe) the allred3 mpich2 test and have
privately sent you a link to the file. Hopefully the issue will be obvious from
it (e.g., you will immediately see that I am doing something I shouldn't be
doing). If a longer capture started before the tests would be useful, I can
get that too.

I had posted to the mpich mailing list before coming here (
http://lists.mcs.anl.gov/pipermail/mpich-discuss/2011-July/010432.html ), but
unfortunately they weren't able to provide any insights.

Thanks again,
Greg


Quoting "Loewe, Bill" <bloewe@xxxxxxxxxxx>:

Hi Greg,

IOR is independent of MPICH2, but does require MPI for process coordination. By default, IOR will use the "-a POSIX" option for standard POSIX I/O -- open(), write(), close(), etc.

In addition, IOR can use the MPI-IO library calls (MPI_File_open(), etc.) to perform I/O.
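
As a concrete (hypothetical) example, a standalone IOR run over the NFS mount
might look roughly like this -- the mount point, process count, and sizes are
just placeholders:

    # plain POSIX open()/write()/close() path
    mpiexec -n 4 ./IOR -a POSIX -w -r -t 1m -b 64m -o /mnt/nfstest/ior.testfile

    # same run, but through the MPI-IO (ROMIO) path
    mpiexec -n 4 ./IOR -a MPIIO -w -r -t 1m -b 64m -o /mnt/nfstest/ior.testfile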

For the MPICH2 build process, "make tests" exercises this MPI-IO (ROMIO)
interface, which uses an ADIO (Abstract-Device Interface for I/O) layer. ADIO
can interface to different file systems (e.g., NFS, PanFS, PVFS2, Lustre).
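
If I remember correctly, ROMIO normally picks the ADIO driver by detecting the
file system type at run time (the set of drivers is fixed when ROMIO is
configured), but you can also force a particular driver with a prefix on the
file name, for example:

    # hypothetical path; the "nfs:" prefix asks ROMIO for its NFS ADIO driver
    # instead of relying on autodetection
    mpiexec -n 4 ./IOR -a MPIIO -w -r -o nfs:/mnt/nfstest/ior.testfile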

The errors you're encountering in "make tests" for MPICH2 do not appear to be
in the I/O testing, however; they seem to be an issue with the launcher for the
tests in general. I agree with Boaz that it may make sense to follow up with
the MPICH developers on this. Their main page
(http://www.mcs.anl.gov/research/projects/mpich2/) has a Support pulldown with
an FAQ and a mailing list. They may be able to help resolve this for you.

Thanks,

--Bill.

-----Original Message-----
From: Harrosh, Boaz
Sent: Friday, July 29, 2011 8:20 PM
To: Gregory Magoon
Cc: Trond Myklebust; linux-nfs@xxxxxxxxxxxxxxx; J. Bruce Fields; Loewe, Bill
Subject: Re: NFSv4 vs NFSv3 with MPICH2-1.4

On 07/28/2011 04:15 PM, Gregory Magoon wrote:
Unfortunately, I'm not familiar enough with MPICH2 to have an idea about
significant changes between version 1.3 and 1.4, but other evidence suggests
that the version is not the issue and that I would have the same problem with
v1.3.

I'm using the MPICH2 test suite invoked by "make testing" (see below for
initial output).

I'm using the nfs-kernel-server and nfs-common Ubuntu packages (natty
release).


You have not answered the most important question:
Also are you using the builtin nfs-client driver or the POSIX interface?

Which I'll assume means you don't know, so I'll try to elaborate. Just for
background, I've never used "make tests" before; all I've used is IOR & mdtest.

Now if you print the usage string for IOR you get this option:

	-a S  api --  API for I/O [POSIX|MPIIO|HDF5|NCMPI]

I'm not familiar with the code, but my understanding is that only "-a POSIX"
will actually use the regular kernel VFS interface for reading and writing
files. The other options have different drivers for different protocols. I do
not know first hand, but I once heard at a conference that -a MPIIO has a
special NFS driver that uses better NFS semantics and avoids the POSIX
semantics, which are bad for big-cluster performance. All this is speculation
and rumor on my part, and you will need to consult the mpich guys.
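
One crude way to check whether the kernel NFS client is in the I/O path at all
is to compare the client-side op counters before and after the failing test,
something like:

    # snapshot of client RPC/NFS call counts (kernel NFS client only)
    nfsstat -c
    # ... run the failing test here ...
    nfsstat -c   # read/write counters should jump if the kernel client did the I/O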

Now I can imagine that "make tests" would try all possible combinations of
"-a S", so you'll need to dig out which test is failing and whether it is
really using the kernel NFS driver at that point. (I bet that if you do a
tcpdump like Bruce said, the guys here will be able to tell whether this is
Linux NFS or not.)
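
Something along these lines on the client should give them enough to go on --
the interface name and server hostname below are placeholders:

    # capture full frames of traffic to/from the NFS server during the test
    tcpdump -i eth0 -s 0 -w nfs-test.pcap host nfsserver and port 2049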

I'm CCing Bill Loewe, who might know much more than me about this subject. And
please do speak with the MPICH people (but keep us in the loop; it is
interesting to know).

Thanks
Boaz

Thanks,
Greg

user@node01:~/Molpro/src/mpich2-1.4$ make testing
(cd test && make testing)
make[1]: Entering directory `/home/user/Molpro/src/mpich2-1.4/test'
(NOXMLCLOSE=YES && export NOXMLCLOSE && cd mpi && make testing)
make[2]: Entering directory `/home/user/Molpro/src/mpich2-1.4/test/mpi'
./runtests -srcdir=. -tests=testlist \
-mpiexec=/home/user/Molpro/src/mpich2-install/bin/mpiexec \
                    -xmlfile=summary.xml
Looking in ./testlist
Processing directory attr
Looking in ./attr/testlist
Processing directory coll
Looking in ./coll/testlist
Unexpected output in allred: [mpiexec@node01] APPLICATION TIMED OUT
Unexpected output in allred: [proxy:0:0@node01] HYD_pmcd_pmip_control_cmd_cb
(./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed
Unexpected output in allred: [proxy:0:0@node01] HYDT_dmxu_poll_wait_for_event
(./tools/demux/demux_poll.c:77): callback returned error status
Unexpected output in allred: [proxy:0:0@node01] main
(./pm/pmiserv/pmip.c:226):
demux engine error waiting for event
Unexpected output in allred: [mpiexec@node01] HYDT_bscu_wait_for_completion
(./tools/bootstrap/utils/bscu_wait.c:70): one of the processes terminated
badly; aborting
Unexpected output in allred: [mpiexec@node01] HYDT_bsci_wait_for_completion
(./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for
completion
Unexpected output in allred: [mpiexec@node01] HYD_pmci_wait_for_completion
(./pm/pmiserv/pmiserv_pmci.c:189): launcher returned error waiting for
completion
Unexpected output in allred: [mpiexec@node01] main (./ui/mpich/mpiexec.c:397):
process manager error waiting for completion
Program allred exited without No Errors


Hi Gregory

We are using MPICH2-1.3.1 and the IOR mpich test, as well as the mdtest test,
and have had no issues so far with NFSv4, NFSv4.1, and pNFS. In fact, this is
our standard performance test.

What tests are you using?
Do you know of any major changes between MPICH2-1.3.1 and MPICH2-1.4?
Also are you using the builtin nfs-client driver or the POSIX interface?

Boaz







--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


