[Bug 676879] Review Request: mpiexec - MPI job launcher that uses the PBS task interface directly

https://bugzilla.redhat.com/show_bug.cgi?id=676879

Doug Ledford <dledford@xxxxxxxxxx> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dledford@xxxxxxxxxx,
                   |                            |fenlason@xxxxxxxxxx

--- Comment #11 from Doug Ledford <dledford@xxxxxxxxxx> 2011-03-16 11:53:28 EDT ---
Sorry to take so long, but sometimes you're just busy.

So, here's what I see from reading up on the mpiexec website.

First, this doesn't work with lots of batch systems. It works with OpenPBS,
PBS Pro, and Torque, but those all come from the same OpenPBS lineage: OpenPBS
itself appears to be dead, PBS Pro is (I guess) some group taking OpenPBS and
delivering ongoing service and support around it, and Torque is an actively
developed fork of OpenPBS.

Second, it's not really intended to be multi-MPI friendly. I know the web page
talks about a lot of MPI packages, but in truth there are only a few MPI
families, with lots of forks within those families. There is the MPICH family,
which includes MPICH, MPICH2, MVAPICH, MVAPICH2, Intel MPI, etc. Then there is
the LAM family, which includes LAM/MPI and Open MPI. I don't know of any other
open source MPI families that are still alive. There might be other closed
source MPIs out there, but we don't care about those. In any case, the mpiexec
website basically says that LAM/Open MPI get job startup right on their own, so
there is no need to use mpiexec with them, and it recommends that you don't.
(Side note: this does not surprise me in the least. The MPICH family's
job-starting daemons have always been nothing more than a bunch of scripts
calling rsh, hardly what I would call robust or well designed; more like a
quick and dirty job to get things running in the early days that nobody ever
went back and did right later.)
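To make the family argument concrete, here is a minimal Python sketch that
encodes the grouping described above and answers whether a given
implementation is one mpiexec is meant to help. The table lists only the
implementations named in this comment; the lowercase keys and the functions
mpi_family and wants_pbs_mpiexec are hypothetical, for illustration only.

```python
# Hypothetical lookup table of the MPI families described in this comment.
# Only implementations named in the comment are listed.
MPI_FAMILIES = {
    "mpich": "mpich",
    "mpich2": "mpich",
    "mvapich": "mpich",
    "mvapich2": "mpich",
    "intel-mpi": "mpich",
    "lam": "lam",
    "openmpi": "lam",
}


def mpi_family(impl: str) -> str:
    """Return the family of an MPI implementation (KeyError if unknown)."""
    return MPI_FAMILIES[impl.lower()]


def wants_pbs_mpiexec(impl: str) -> bool:
    """True only for the mpich family.

    As summarized above, the mpiexec website says lam/openmpi start jobs
    correctly on their own, so mpiexec is only useful for the mpich family.
    """
    return mpi_family(impl) == "mpich"


if __name__ == "__main__":
    for name in ("mvapich2", "openmpi"):
        print(name, mpi_family(name), wants_pbs_mpiexec(name))
```

However many entries the table grows, the values collapse to two families,
and only one of them maps to True.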

So, for all the talk on the website about how many MPIs and how many PBS
variants this supports, it really only supports one MPI family and one PBS
family. Given that, this *absolutely* does not belong in the main system path
(/usr/bin). I know that's already been fixed, but I'm putting this here so
that nobody gets the idea in the future to undo that fix.

What's more, I'm not entirely certain that this will work transparently with
different MPICH-family MPIs from a single build. You have to specify the
default communication method, as well as a few other options, when running the
configure script. I haven't looked into it, but I know that MVAPICH and
MVAPICH2 differ enough from MPICH and MPICH2 that I'm not certain the same
build parameters will work for all of them. If they don't, then you might have
to build it more than once in the spec file with different options, creating
an mpiexec base package and then mpiexec-%{mpiname} subpackages that hold the
files specific to each MPI implementation.
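One way to picture the "build it more than once" option is as a per-MPI build
plan. The Python sketch below is only a model, not spec file syntax: the
MpiTarget class, the build_plan function, and the example prefixes such as
/usr/lib64/mpich2 are assumptions for illustration. The actual per-MPI
configure options (for example the default communication method) are left
empty because, as noted above, nobody has yet checked which ones differ.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class MpiTarget:
    """One MPI implementation to build mpiexec against (hypothetical model)."""
    name: str                   # subpackage suffix, i.e. %{mpiname}
    mpidir: str                 # that MPI's install prefix, i.e. %{mpidir}
    configure_args: tuple = ()  # per-MPI configure options, still to be determined


def build_plan(targets):
    """Return one build step per MPI: its own build directory and configure
    options, the mpiexec-<name> subpackage, and where the binary is installed."""
    plan = []
    for t in targets:
        plan.append({
            "builddir": f"build-{t.name}",  # keep each build's objects separate
            "configure_args": list(t.configure_args),
            "subpackage": f"mpiexec-{t.name}",
            "install_to": str(PurePosixPath(t.mpidir) / "bin" / "mpiexec-pbs"),
        })
    return plan


if __name__ == "__main__":
    # Example prefixes are assumptions, not taken from the package under review.
    targets = [
        MpiTarget("mpich2", "/usr/lib64/mpich2"),
        MpiTarget("mvapich2", "/usr/lib64/mvapich2"),
    ]
    for step in build_plan(targets):
        print(step["subpackage"], "->", step["install_to"])
```

Anything identical across builds (documentation, for instance) would go in the
base mpiexec package; only the per-MPI binary differs between subpackages.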

As an example, if needed, you could create an mpiexec shell script and place it
in the directory your environment modules file puts on the PATH. This shell
script would then execute %{mpidir}/bin/mpiexec-pbs. For each MPI you intend to
support, you would install the mpiexec binary this build process produces as
%{mpidir}/bin/mpiexec-pbs and put those files into an mpiexec subpackage
specific to that MPI implementation. That way, if different MPIs need
different options, it can be done. However, as I haven't tried to use this
program, I don't know if this is even necessary. Before the package goes into
Fedora as is, though, this needs to be tested. It's much easier to fix this
before it hits the repos than after.
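The wrapper idea can be sketched as follows. The comment describes a shell
script; this is the same exec-and-forward behavior written in Python, as a
minimal sketch under one assumption: the wrapper is installed in
%{mpidir}/bin, so mpiexec-pbs sits next to it. The function name
launcher_argv is hypothetical.

```python
import os
import sys
from pathlib import Path

# The real PBS-aware binary, installed next to this wrapper as
# %{mpidir}/bin/mpiexec-pbs (layout assumed from the suggestion above).
REAL_BINARY = "mpiexec-pbs"


def launcher_argv(wrapper_path, args):
    """Return (program, argv) for the real launcher.

    Assumes the wrapper lives in %{mpidir}/bin, so the real binary is a
    sibling of the wrapper file.
    """
    target = Path(wrapper_path).resolve().parent / REAL_BINARY
    return str(target), [str(target), *args]


def main():
    target, argv = launcher_argv(__file__, sys.argv[1:])
    if not os.access(target, os.X_OK):
        sys.exit(f"mpiexec wrapper: {target} not found or not executable")
    # Replace this process, like a shell 'exec', so signals and the exit
    # status come straight from mpiexec-pbs.
    os.execv(target, argv)


if __name__ == "__main__":
    main()
```

Because the wrapper finds mpiexec-pbs relative to its own location, the same
file could ship unchanged in every mpiexec-%{mpiname} subpackage.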

_______________________________________________
package-review mailing list
package-review@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/package-review

