2 node CentOS 4.8 cluster on ESX 4 cluster (cluster across boxes)

[root@host ~]# uname -a
Linux hostname 2.6.9-89.0.19.ELlargesmp

2 GB RAM
2 vCPU
1 x 200 GB RDM - GFS1
VMware fencing

Member Status: Quorate

  Member Name                  Status
  ------ ----                  ------
  Host1                        Online, Local, rgmanager
  Host2                        Online, rgmanager

  Service Name   Owner (Last)   State
  ------- ----   ----- ------   -----
  www-http       host1          started
  www-nfs        host2          started
  vhostip-http   host2          started
  vhost-http     host2          started

[root@host ~]# rpm -qa | grep cman
cman-kernel-2.6.9-56.7.el4_8.10
cman-kernel-smp-2.6.9-56.7.el4_8.10
cman-devel-1.0.24-1
cman-kernel-largesmp-2.6.9-56.7.el4_8.10
cman-1.0.24-1
cman-kernheaders-2.6.9-56.7.el4_8.10

/var/log/messages:
Apr 26 18:45:32 tesla kernel: oom-killer: gfp_mask=0xd0
Apr 26 18:45:32 tesla kernel: Mem-info:
Apr 26 18:45:32 tesla kernel: Node 0 DMA per-cpu:
Apr 26 18:45:32 tesla kernel: cpu 0 hot: low 2, high 6, batch 1
Apr 26 18:45:32 tesla kernel: cpu 0 cold: low 0, high 2, batch 1
Apr 26 18:45:32 tesla kernel: cpu 1 hot: low 2, high 6, batch 1
Apr 26 18:45:32 tesla kernel: cpu 1 cold: low 0, high 2, batch 1
Apr 26 18:45:32 tesla kernel: Node 0 Normal per-cpu:
Apr 26 18:45:32 tesla kernel: cpu 0 hot: low 32, high 96, batch 16
Apr 26 18:45:32 tesla kernel: cpu 0 cold: low 0, high 32, batch 16
Apr 26 18:45:32 tesla kernel: cpu 1 hot: low 32, high 96, batch 16
Apr 26 18:45:32 tesla kernel: cpu 1 cold: low 0, high 32, batch 16
Apr 26 18:45:32 tesla kernel: Node 0 HighMem per-cpu: empty
Apr 26 18:45:32 tesla kernel:
Apr 26 18:45:32 tesla kernel: Free pages: 6352kB (0kB HighMem)
Apr 26 18:45:32 tesla kernel: Active:3245 inactive:3129 dirty:0 writeback:0 unstable:0 free:1588 slab:499421 mapped:4514 pagetables:914
Apr 26 18:45:32 tesla kernel: Node 0 DMA free:752kB min:44kB low:88kB high:132kB active:0kB inactive:0kB present:15996kB pages_scanned:0 all_unreclaimable? yes
Apr 26 18:45:32 tesla kernel: protections[]: 0 286000 286000
Apr 26 18:45:32 tesla kernel: Node 0 Normal free:5600kB min:5720kB low:11440kB high:17160kB active:12980kB inactive:12516kB present:2080704kB pages_scanned:20031 all_unreclaimable? yes
Apr 26 18:45:32 tesla kernel: protections[]: 0 0 0
Apr 26 18:45:32 tesla kernel: Node 0 HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Apr 26 18:45:32 tesla kernel: protections[]: 0 0 0
Apr 26 18:45:32 tesla kernel: Node 0 DMA: 4*4kB 4*8kB 2*16kB 3*32kB 3*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 752kB
Apr 26 18:45:32 tesla kernel: Node 0 Normal: 0*4kB 0*8kB 0*16kB 1*32kB 1*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 1*4096kB = 5600kB
Apr 26 18:45:32 tesla kernel: Node 0 HighMem: empty
Apr 26 18:45:32 tesla kernel: 6192 pagecache pages

Every 4 days the host2 system (running NFS service)
starts running the oom-killer, goes brain dead, and gets fenced. The http
processes are restarted every morning at 4:00 AM for log rotation, so I don't
think they are the problem.

Attempts to fix:

http://kbase.redhat.com/faq/docs/DOC-3993
http://kbase.redhat.com/faq/docs/DOC-7317

Release Found: Red Hat Enterprise Linux 4 Update 4

Symptom: The command top shows that a lot of memory is being cached and swap
is hardly being used.

Solution: On Red Hat Enterprise Linux 4 Update 4, a workaround for the
oom-killer killing random processes while there is still memory available is
to issue the following command:

This will cause page reclamation to happen sooner, thus providing more
'protection' for the zones.

Changes to Tesla:

Anybody have any ideas?

Thanks,
Eric
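The Mem-info dump in the log above can be decoded with a few lines of Python. This is a minimal sketch, not part of the original post: it assumes 4 kB pages, which the log itself supports (the Normal zone's active:12980kB is exactly Active:3245 x 4 kB), and the two log lines are pasted in by hand rather than read from /var/log/messages. The helper names `parse_counters` and `zone_summary` are hypothetical.

```python
import re

PAGE_KB = 4  # assumption: 4 kB pages, consistent with active:12980kB == Active:3245 * 4


def parse_counters(line):
    """Return {name: int} for every 'name:value' pair in the global Mem-info line (values are pages)."""
    return {k: int(v) for k, v in re.findall(r"(\w+):(\d+)", line)}


def zone_summary(line):
    """Parse a 'Node 0 <zone> free:...kB min:...kB ...' line into (zone, free_kb, min_kb)."""
    zone = re.search(r"Node \d+ (\w+) free:", line).group(1)
    free_kb = int(re.search(r"free:(\d+)kB", line).group(1))
    min_kb = int(re.search(r"min:(\d+)kB", line).group(1))
    return zone, free_kb, min_kb


# Lines copied from the oom-killer dump (timestamps and "tesla kernel:" prefix stripped).
counters_line = ("Active:3245 inactive:3129 dirty:0 writeback:0 unstable:0 "
                 "free:1588 slab:499421 mapped:4514 pagetables:914")
normal_line = ("Node 0 Normal free:5600kB min:5720kB low:11440kB high:17160kB "
               "active:12980kB inactive:12516kB present:2080704kB")

c = parse_counters(counters_line)
slab_mb = c["slab"] * PAGE_KB / 1024                    # kernel slab allocations
user_mb = (c["Active"] + c["inactive"]) * PAGE_KB / 1024  # LRU (page cache + anonymous) pages
zone, free_kb, min_kb = zone_summary(normal_line)

print(f"slab: {slab_mb:.0f} MB, active+inactive: {user_mb:.1f} MB")
# prints: slab: 1951 MB, active+inactive: 24.9 MB
print(f"{zone}: free {free_kb} kB vs min {min_kb} kB -> below min: {free_kb < min_kb}")
# prints: Normal: free 5600 kB vs min 5720 kB -> below min: True
```

With those numbers, slab holds roughly 1.9 GB of the roughly 2 GB Normal zone while active and inactive pages total about 25 MB, and the Normal zone's free memory sits below its min watermark. That pattern suggests kernel slab allocations, rather than the http processes, had exhausted memory, although it does not by itself show which subsystem owns the slab.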
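Because the fence recurs roughly every 4 days, one way to see which slab cache is growing is to snapshot /proc/slabinfo periodically (for example from cron) and compare the largest caches over time. The Python sketch below is a hypothetical helper, not something from the post or from Red Hat; it assumes the version 2.x /proc/slabinfo layout (`name active_objs num_objs objsize objperslab pagesperslab : tunables ... : slabdata active_slabs num_slabs sharedavail`) and 4 kB pages, and reading the file normally requires root.

```python
PAGE_KB = 4  # assumption: 4 kB pages, as the Mem-info dump implies


def slab_usage_kb(slabinfo_text):
    """Return {cache_name: kB} from /proc/slabinfo (version 2.x) text.

    Footprint is num_slabs * pagesperslab * PAGE_KB, i.e. the pages each cache
    holds, including partially used slabs.
    """
    usage = {}
    for line in slabinfo_text.splitlines():
        if line.startswith("slabinfo") or line.startswith("#") or not line.strip():
            continue  # skip the version banner, column header and blank lines
        fields = line.split()
        pages_per_slab = int(fields[5])
        # slabdata is followed by active_slabs, num_slabs, sharedavail
        num_slabs = int(fields[fields.index("slabdata") + 2])
        usage[fields[0]] = num_slabs * pages_per_slab * PAGE_KB
    return usage


def top_caches(slabinfo_text, n=5):
    """Return the n largest caches as (name, kB), biggest first."""
    usage = slab_usage_kb(slabinfo_text)
    return sorted(usage.items(), key=lambda item: item[1], reverse=True)[:n]


if __name__ == "__main__":
    try:
        with open("/proc/slabinfo") as f:
            for name, kb in top_caches(f.read()):
                print(f"{name:<24} {kb:>10} kB")
    except (OSError, ValueError) as err:  # missing file, no root, or unexpected format
        print(f"cannot read/parse /proc/slabinfo: {err}")
```

Counting whole slabs (num_slabs x pagesperslab) rather than num_objs x objsize is meant to approximate the pages behind the slab: counter in the oom-killer dump, since partially filled slabs still occupy full pages. Comparing two snapshots taken a day apart should show which cache accounts for the growth.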
-- Linux-cluster mailing list Linux-cluster@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/linux-cluster