Hello!

2008/11/25 Fred Hucht <fred@xxxxxxxxxxxxxx>:
> Hello Harald!
>
> I didn't test Infiniband transport until now, as I don't want to interfere
> with the parallel applications which are running over Infiniband. Gigabit
> Ethernet throughput would be sufficient for us at the moment.
>
> Today "only" three nodes were affected, yesterday it were nine nodes. The
> problems only occur on nodes to which jobs are scheduled which use /scratch
> as working directory: We test the filesystem in normal operation, one user
> submits jobs to the queueing system which use /scratch/... as working
> directory. While some of his jobs run without problems, other jobs fail due
> to FS problems. No problems occur over the usual NFS home directory.

IMHO, the fact that everything else works rules out the "network
problem". Sorry for wasting your time.

> When I test the FS with, e.g., dd on all nodes in parallel, no problems
> occur.
>
> Which timeout shall I increase?

I had "transport-timeout" in the back of my mind, but the doc
(http://www.gluster.org/docs/index.php/GlusterFS_Translators_v1.3#client)
says that the default is already 30 seconds. I would not change anything
there without a request from the developers.

Harald Stürzebecher