I haven't seen any replies to this question, so I'll offer my suggestion.
I'm unaware of any way to release leaked memory other than rebooting the
system, although I don't have any experience with a clustered system such
as yours. I would think that the best solution is to fix the program
that's causing the memory leak. If it's written in house, you should
rewrite the code so that it releases the memory it allocates. If the
program is purchased, call the publisher and get them to fix it.
Most programmers are aware of memory leaks and are writing better
code these days. I hope this helps.
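
In case it is useful, here is a minimal sketch of how one might check,
before rebooting, how much of a node's memory is really in use as opposed
to sitting in buffers and page cache. It assumes Python is available on
the nodes and that /proc/meminfo reports the usual MemTotal, MemFree,
Buffers, Cached, SwapTotal and SwapFree fields in kB. The function names
are my own and not part of any existing tool.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> number."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only "Name: <integer> [kB]" lines; skip anything malformed.
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def memory_summary(fields):
    """Summarise memory and swap in use, in kB (assumed field names)."""
    # Buffers and Cached are mostly given back by the kernel on demand,
    # although Cached can also include tmpfs and shared memory.
    buffers_cache = fields.get("Buffers", 0) + fields.get("Cached", 0)
    used = fields["MemTotal"] - fields["MemFree"] - buffers_cache
    swap_used = fields.get("SwapTotal", 0) - fields.get("SwapFree", 0)
    return {
        "used_kb": used,
        "buffers_cache_kb": buffers_cache,
        "swap_used_kb": swap_used,
    }
```

On a node, you would pass the contents of /proc/meminfo to parse_meminfo
and the result to memory_summary. If used_kb stays high while no jobs are
running, that would suggest the memory is still being held by something
rather than just cached.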
Daneil Goodman wrote:
Hi There,
I have a Dell PowerEdge 1950 cluster running RHEL AS 4. The nodes are
interconnected through an InfiniBand switch so that users can run
parallel jobs. The cluster has a serious memory leak problem: a lot of
RAM and swap space remains in use after a job has finished. I usually
reboot the nodes to clean up after confirming that no jobs are running on
them. But recently I found that another user's job can sometimes be killed
when I reboot the nodes, even though the killed job is running on other
nodes. This weird issue bothers me a lot. Therefore, I would like to find
a way to release the leaked memory and clean up the swap space without
rebooting the system. Can someone give me a suggestion?
Thanks a lot,
Goodman
--
veritatis simplex oratio est
-Seneca
Andrew Bacchi
Systems Programmer
Rensselaer Polytechnic Institute
phone: 518.276.6415 fax: 518.276.2809
http://www.rpi.edu/~bacchi/