Is this a bug, or does anyone have a suggestion on how to fix this error?
We have a job that runs across several compute nodes. Every process starts
except the last one, which runs on the master node and fails with a
"Text file busy" error on the program binary.
Our workaround is to run the program binary from an NFS partition mounted
via GbE, while still reading data files and doing MPI/IO on glusterfs.
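
For anyone who wants to try the same workaround, here is a minimal sketch of staging the binary off glusterfs before launch. The function name `stage_binary` and every path are hypothetical and not taken from our actual job script. It copies to a temporary name and then renames, so the final path is never open for writing while something tries to execute it (on Linux, "Text file busy" is ETXTBSY, which the kernel returns when a file is executed while open for writing, or written while being executed). Whether that is really what is happening inside glusterfs here is only a guess.

```shell
#!/bin/bash
# Hypothetical helper: copy a program binary from the glusterfs mount to
# another filesystem (e.g. NFS) and print the path of the staged copy.
# Usage: stage_binary <source-binary> <destination-directory>
stage_binary() {
    local src="$1" dest_dir="$2"
    local dest tmp
    dest="$dest_dir/$(basename "$src")"

    mkdir -p "$dest_dir" || return 1

    # Write to a temporary name first so the final path only ever refers
    # to a complete file that is no longer open for writing.
    tmp="$(mktemp "$dest_dir/.stage.XXXXXX")" || return 1

    cp "$src" "$tmp" && chmod 755 "$tmp" && mv -f "$tmp" "$dest" || {
        rm -f "$tmp"
        return 1
    }

    printf '%s\n' "$dest"
}
```

In the SGE job script this would be called once, before the parallel launch, for example `BIN="$(stage_binary /path/on/glusterfs/prog /path/on/nfs/bin)"`, and the launcher would then be pointed at `$BIN`. Both paths are placeholders.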
All nodes run CentOS 5.2. Glusterfs (ver 2.0.4) is mounted via
Infiniband, and the job runs under bash via SGE.
Our config files:
# client config
volume pf1
  type protocol/client
  option transport-type ib-verbs
  option remote-port 699X
  option remote-host XXX.XXX.XXX.XXX
  option remote-subvolume brick
end-volume
#
volume pf2
  type protocol/client
  option transport-type ib-verbs
  option remote-port 699X
  option remote-host XXX.XXX.XXX.XXX
  option remote-subvolume brick
end-volume
#
volume distribute
  type cluster/distribute
  subvolumes pf1 pf2
end-volume
#
# server config pf1
volume posix
  type storage/posix
  option directory /data
end-volume
#
volume locks
  type features/locks
  subvolumes posix
end-volume
#
volume brick
  type performance/io-threads
  option thread-count 2
  subvolumes locks
end-volume
#
volume server
  type protocol/server
  option transport-type ib-verbs
  option transport.ib-verbs.listen-port 699X
  option client-volume-filename /usr/etc/glusterfs/glusterfs-client.vol
  option auth.addr.brick.allow *
  subvolumes brick
end-volume
--
Dennis Michael
Manager, High Productivity Technical Computing
Stanford Center for Computational Earth and Environmental Science (CEES)
School of Earth Sciences
Stanford University
397 Panama Mall Mitchell Building room 415
http://cees.stanford.edu/
phone # (650) 723 2014