On Wed, 2009-01-07 at 16:51 -0800, Toshio Kuratomi wrote:
> Doug Ledford wrote:
> > On Wed, 2009-01-07 at 14:48 -0800, Toshio Kuratomi wrote:
> >> It depends on how you interpret the FHS, I suppose.  In the old
> >> packages, the config files are in /etc, the arch independent data (help
> >> files) are in a subdir of /usr/share/openmpi/, and most of the
> >> arch-specific files are under /usr/lib/openmpi/.  This satisfies the
> >> overarching goal of the FHS, separation of sharable and unsharable data.
> >> It also satisfies the goal of separating arch specific and arch
> >> independent files.
> >>
> >> The question is whether the binaries can go there or have to go in
> >> /usr/bin and whether the libraries can go there or must go directly in
> >> /usr/lib.  For the libraries, we often put private libraries in a
> >> subdirectory of /usr/lib.  These differ in that they're public
> >> libraries.  I lean towards this being okay.  The binaries being in a
> >> subdirectory of %{_libdir} doesn't have as much precedent.  Perhaps we
> >> need to make that usage explicit in the Guidelines just like %{_libexecdir}?
> >>
> >> Looking at the new package I see that there are config files under
> >> %{_libdir}/openmpi.  I think these need to go in %{_sysconfdir} instead.
> >> This is more important than binaries and libraries for several reasons:
> >>
> >> 1) Having them in %{_libdir} breaks the sharable/unsharable boundary
> >
> > Not really, but that's due to typical usage of these specific files.  I
> > would tend to agree that files normally in /etc are something that are
> > intended to be edited on a per machine basis.  These files, even though
> > they are in %{_libdir}/%{mpidir}/etc, are not something that you would
> > edit on a per machine basis.  If anything, things like the
> > openmpi-default-hostfile would be edited on a per version basis (and
> > with this layout they have a per version etc directory to be contained
> > in).  This is because on a large cluster, you are likely to either allow
> > all the machines in the cluster to participate and would put all the
> > machines in the cluster in this config file, or you would have a segment
> > of the cluster that is dedicated to running this version of openmpi and
> > only those machines would be in this file.  Either way, for all the
> > machines you want running this version of openmpi by default, the file
> > would be the same (this assumes that a person might start the openmpi
> > job from any machine in the cluster that's part of the appropriate
> > group; you may have a control machine doing things instead, in which
> > case you really only have to edit the file on that one machine and all
> > the others will be passive clients and not care about the contents of
> > this file).
> >
> Okay.. but then you preclude the possibility of running multiple
> instances of one mpi version within a cluster.  It sounds like that's
> not typical in your experience but it doesn't sound like a necessary
> limitation.

No, nothing's precluded.  None of these files is essential to openmpi's
operation (well, the mpivars.* files are essential to mpi-selector
operation, but they are files that should never be edited by admins;
they are created during the build process and are static, I just stuck
them there), and every single one of them can be overridden by an
instance specific version of the file.

> > Now, the even more common scenario is that you have multiple different
> > MPI apps.
> > The admins typically would do a login per app so that the default
> > login environment for a given app is already pre-configured.  Amongst
> > that would be things like selection of the right mpi, and host files
> > specific to what machines that app is allowed to run on.  Those would
> > all be in the home directory for the login and wouldn't require
> > editing the system wide etc files in here.
>
> Despite the environment being somewhat different than normal, this kind
> of configuration is normal for any apps.
>
> >> 2) They are files edited by system admins and looked at by the user.
> >> They should be in a predictable place for this reason.
> >
> > In truth, they aren't edited much at all, and relying upon them is
> > frowned upon.  But, as I noted above, even if they are edited, they are
> > still generally shareable due to the nature of MPI clusters.
>
> This is true of other applications as well, though....
>
> So even if we don't care about people having multiple different openmpi
> instances within their cluster, this still doesn't answer what breaks by
> putting the config files in /etc.  Which is important because deviating
> breaks other sysadmin assumptions.

No, putting the files in /etc would actually break more sysadmin
assumptions than anything else.  OpenMPI and the mvapich stacks have been
installed under static prefix installations for far longer than we've
been interested in shipping them.  People totally new to the openmpi
realm/usage might have some assumptions broken, but people who have been
using openmpi and similar packages for years would have theirs broken by
our changes.

And it's not just the users.  The package itself has a configure option
to enable --prefix behavior by default, and if you read the man page
you'll see that there is a specific option for passing the --prefix to
the run time environment.  In fact, if you even just start the run time
environment using a full path such as /usr/local/bin/mpirun, it
automatically enables --prefix mode, sets the prefix to one directory up
from the one the binary is in, and then passes %{prefix}/bin in the path
and %{prefix}/lib in the ld library path to the remote nodes (there's a
rough example of this further down).

> For instance, if I was backing up
> all configuration files on these machines by backing up /etc, this would
> miss the openmpi configs.  If I was mounting the /usr filesystem
> read-only, this would prevent me from updating the config file on-the-fly.

True on both counts.  Of course, given clusters, these are totally
irrelevant points.  They don't back up individual /etc directories on any
of the cluster nodes.  All the nodes are set up so that they can be
installed by the cluster manager, and they can be reinstalled as needed.
Nodes are disposable.  Also, they *do* mount /usr read-only in lots of
clusters, and they couldn't care less that default files are in there.
If they need to edit them, they go to the cluster disk controller node
and edit the files where they aren't read-only, and then all the nodes
see the changes.  More commonly, the batch scheduler they use has its own
private data directory on the controlling node and it writes the
necessary files on the fly (or passes the options entirely on the command
line) based upon what nodes it intends to start the job on.  My point is
that these "we care about single node" issues simply do not exist in
clusters, and *can't* exist or they would make the cluster unmaintainable.
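To make the --prefix behavior concrete, here's a rough sketch (the
per-version install path below is only an example of a private prefix,
not necessarily what the actual package lays down; check the mpirun/orterun
man page for the exact semantics on your version):

    # Example per-version install rooted at its own prefix:
    #   /usr/lib64/openmpi/1.2.7-gcc/bin/mpirun
    #   /usr/lib64/openmpi/1.2.7-gcc/lib/libmpi.so
    #   /usr/lib64/openmpi/1.2.7-gcc/etc/openmpi-default-hostfile

    # Launching mpirun by its full path turns on --prefix mode by itself,
    # using the directory one level above bin/ as the prefix:
    /usr/lib64/openmpi/1.2.7-gcc/bin/mpirun -np 16 ./my_mpi_app

    # ...which behaves like an explicit:
    mpirun --prefix /usr/lib64/openmpi/1.2.7-gcc -np 16 ./my_mpi_app

    # Either way the remote daemons are started with $prefix/bin added to
    # PATH and $prefix/lib added to LD_LIBRARY_PATH on each node.

    # And per the hostfile discussion above, the packaged default hostfile
    # under $prefix/etc is only a fallback; an instance can override it:
    mpirun --hostfile ~/myapp/hosts -np 16 ./my_mpi_app

That's the sense in which the private-prefix layout is baked into the
runtime rather than being just a packaging choice.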
> >> As you noted, there are also some FHS regressions compared to the current
> >> package:
> >>
> >> - include files are under %{_libdir} instead of under %{_includedir} --
> >> If these are arch specific include files then this makes sense.  If not,
> >> they belong in %{_includedir}.  What things were broken by doing that?
> >
> > Two things here.  Remember that we allow simultaneous installs of
> > different versions of OpenMPI (you can't get it out of the yum channel
> > this way, and you can't do upgrades of OpenMPI or it wipes older
> > versions out, but you can download anything after openmpi-1.2.5, I
> > think, and install different copies of different versions, although that
> > does not include multiple releases of the same version, since I only use
> > n-v in the naming, not full n-v-r, so for instance you couldn't have
> > 1.2.7-5 and 1.2.7-6 installed, but you can have 1.1.8 and 1.2.7 installed
> > at the same time) in order to meet user requirements.  Differing versions
> > can have differing header files, so we can't just use %{_includedir}/%{name}
> > or they might conflict.  Putting the includes alongside the libs works
> > for just about any devel package that needs to use it because you can
> > just use --prefix to configure it to the right place.  Of course, the
> > gcc wrappers also know where the right include files are, so it
> > works with mpicc without doing anything.  The second reason is that for
> > fortran use in particular, the header file produced during the build is
> > different for different arches.
>
> The correct way to do this is by having the version in the includedir:
> %{_includedir}/openmpi-1.2.7/*.h
>
> > So aside from the multi-install issue,
> > there is an arch specific component to the headers that can't be worked
> > around due to limitations in the fortran language (or that's my
> > understanding, I haven't touched fortran since 1991 or so).
> >
> So is it only the fortran headers that are arch specific, or are all of
> them arch specific and only fortran doesn't have a way to work around that?

All the headers except the fortran one have the ability to do things like
#ifdef __i386__ so that a single header works on all arches.  The fortran
header can't, and must be specific to the arch it's referencing.  In the
past, what I tried to do was put the headers under %{_includedir}/openmpi
and then I created an arch specific dir and moved the arch specific header
into that.  Because of how openmpi's mpicc works, this then meant that I
had to run a sed script during the install on the files in
%{_datadir}/%{name}/help/*-wrapper.txt to edit the additional include
directory into the default include search list (I also had to edit in
-m %{mode} on multilib capable arches; a rough sketch of that edit is
below).  This is why the datadir help files had to be placed in an arch
specific location.

> arch specific headers do belong in a subdir of %{_libdir}.  But most of
> the time just that file goes into %{_libdir}.  If you take a look at
> glib-devel, for instance, you have:
> /usr/lib/glib/include/glibconfig.h
> /usr/include/glib-1.2
> /usr/include/glib-1.2/glib.h
> /usr/include/glib-1.2/gmodule.h

I actually think that doing the above is worse than what I'm doing with
all of the openmpi include files in one place.  And when I *did* have the
openmpi includes inside %{_includedir}, I *still* kept all the includes
there and just created bit size specific include dirs under the main
openmpi include dir.  I find either of those alternatives superior to the
junk above.
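For the curious, that %install-time fix-up would look roughly like the
sketch below.  The sed expressions are made up purely to show the shape
of the edit; the real keys inside the *-wrapper.txt help files differ,
and the include paths are hypothetical:

    # Hypothetical spec %install snippet: teach the mpicc/mpif77 wrappers
    # about an arch specific include subdir and the multilib word-size flag.
    # Patterns are illustrative only, not taken from the actual spec.
    sed -i \
        -e 's|-I%{_includedir}/openmpi|-I%{_includedir}/openmpi -I%{_includedir}/openmpi/%{mode}|' \
        -e 's|^\(compiler_args=.*\)$|\1 -m%{mode}|' \
        %{buildroot}%{_datadir}/%{name}/help/*-wrapper.txt

Because the edited help files then differ between 32-bit and 64-bit
builds, they can no longer live in a shared, arch independent datadir,
which is the point made above.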
> >> - man dirs are now under %{_libdir} instead of under %{_datadir}.  What
> >> broke by having these under %{_datadir}?
> >
> > Multiple installs
>
> This shouldn't be the case.  Once again, the correct solution to this
> problem is including the version in the directory name.
>
> > and also if we put it under datadir, then we have to
> > fiddle with manpath when we set up the environment.  With them where
> > they are, the presence of %{_libdir}/%{mpidir}/bin in the exec path is
> > enough for man to track down the right man page automatically.
> >
> And this should be something that environment modules takes care of.

Sorry, not convincing.  The openmpi package has unique requirements.  It
has assumptions about being in its own prefix coded into its actual
runtime operation.  And although openmpi might be able to do these
things, neither mvapich nor mvapich2 even allows installing its files in
anything other than its own private prefix.  So making all these changes
to openmpi wouldn't solve the issue on the other two, and would simply
serve to fragment how people handle the various MPI implementations,
taking us from one standard to two.

And all of this because the people who put out the FHS decided that if
you are an ISV then you can put code into /opt under a private prefix,
but if you are the OS vendor, then even if the code really *is* highly
optional and not shipped by default and really *should* be in /opt with
its own prefix, you can't do it.  My response to that is that the FHS
people were being dumbasses and should have left us with a bit more
flexibility to do the right thing depending on circumstances.  In fact,
if I were to do *anything* with the openmpi package, it would be to go
ahead and move it under /opt in defiance of the FHS, because that's where
it really needs to be.

-- 
Doug Ledford <dledford@xxxxxxxxxx>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford

Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband