Thank you for your postings. I tried changing the bdflush parameters, but it made no difference to the behaviour of Linux.
I have attached a file in which I simulated my problem. It shows the memory status after various actions, as well as the output of "ps -ef".
Could it be that this is "normal" behaviour for Linux?
Thanks a lot!
Marcel
> Marcel,
>
> Your problem sounds like the Cache Swap bug:
>
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=89226#c15
>
> Try doing the following:
>
> From the command line (as root) type:
>
> /sbin/sysctl -w vm.bdflush="30 500 0 0 2560 15360 60 20 0"
>
> Then edit your /etc/sysctl.conf file and add the following line:
>
> vm.bdflush="30 500 0 0 2560 15360 60 20 0"
We have been experiencing similar issues on our web server, whereby
sudden swap usage spikes basically bring the server to a halt. While
those helping to troubleshoot the problem have blamed high traffic, I
have felt that something else was going on, since most of the crashes
we have experienced have been random, often at off-peak times, and the
symptoms are nearly identical to those documented in the bug report.
Granted, our web traffic has been increasing steadily over the past few
months, but the beginning of our crash problems was quite sudden,
starting after we migrated from RH 7.1 to 7.3 and then updated the
kernel. It has been extremely difficult to pinpoint the issue since
virtually no errors are logged when this occurs, and there is little
to go on by way of a common denominator between the incidents.
A 'uname -a ; cat /proc/sys/vm/bdflush' returns:
Linux server1.myserver.net 2.4.20-20.7smp #1 SMP Mon Aug 18 14:46:14 EDT 2003 i686 unknown
30 500 0 0 500 3000 60 20 0
A couple of questions. First, all of the comments regarding this bug appear to reference RH9. Does the same apply to RH 7.3?
Yes, this is a *kernel* issue, so whatever version of RH you're using, if
you've updated the kernel, you can be susceptible. In fact, in comment #15,
another user mentions that they started seeing this issue in RH 7.3 after
they also updated their kernel.
Second, what exactly do these bdflush figures represent, and how does the recommended edit change the behavior of the system? Needless to say, I am desperate for a solution, but I am reluctant to make any changes without understanding the potential effect on the system as a whole.
As I understand it, the bdflush parameters deal with when the kernel flushes
cache to disk and how it deals with virtual memory. I haven't dug deeply
into it, as the options I detailed in my previous post have worked just
dandy on our web server.
Nevertheless, a google on "bdflush parameters" pulled up many hits,
including:
http://www.faqs.org/docs/securing/chap6sec68.html
https://listman.redhat.com/archives/ext3-users/2003-May/msg00035.html
https://listman.redhat.com/archives/ext3-users/2002-November/msg00073.html
and many others.
Ben
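To see exactly what the suggested line would change on the 2.4.20-20.7smp server, the two settings quoted in this thread can be compared position by position. The following is only a minimal Python sketch. The field names are an assumption, recalled from the 2.4-series kernel source rather than taken from this thread, so treat them as labels and check them against the documentation for your own kernel. The helper names are hypothetical.

```python
# Field names are an ASSUMPTION (recalled from the 2.4-series kernel source);
# they are not stated anywhere in this thread.
FIELDS = [
    "nfract", "ndirty", "unused", "unused", "interval",
    "age_buffer", "nfract_sync", "nfract_stop_bdflush", "unused",
]


def parse_bdflush(text):
    """Split a bdflush setting into a list of nine integers."""
    values = [int(v) for v in text.split()]
    if len(values) != len(FIELDS):
        raise ValueError("expected %d values, got %d" % (len(FIELDS), len(values)))
    return values


def diff_bdflush(current, proposed):
    """Return (position, field, old, new) for every value that differs."""
    old, new = parse_bdflush(current), parse_bdflush(proposed)
    return [(i + 1, FIELDS[i], o, n)
            for i, (o, n) in enumerate(zip(old, new)) if o != n]


# Both strings are copied from the messages above.
current = "30 500 0 0 500 3000 60 20 0"       # output of cat /proc/sys/vm/bdflush
proposed = "30 500 0 0 2560 15360 60 20 0"    # value in the suggested sysctl line

for pos, name, old, new in diff_bdflush(current, proposed):
    print("field %d (%s): %d -> %d" % (pos, name, old, new))
```

With the two strings from this thread, only the fifth and sixth values differ (500 -> 2560 and 3000 -> 15360); the other seven are identical.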
--------------------------------------------------------------------------
-before systemstart-
Memory - Control Center: 98% Physical, 9% Virtual
[root@lux2 /]# free
             total       used       free     shared    buffers     cached
Mem:        514692     502348      12344          0      25740     424204
-/+ buffers/cache:      52404     462288
Swap:       337328      28892     308436
--------------------------------------------------------------------------
-after systemstart-
Memory - Control Center: 23% Physical, 0% Virtual
[ep@lux2 ep]$ free
             total       used       free     shared    buffers     cached
Mem:        514692     107464     407228          0       4052      56328
-/+ buffers/cache:      47084     467608
Swap:       337328          0     337328
--------------------------------------------------------------------------
-copied 58MB data from CDROM to disk (5 rpm files)-
Memory - Control Center: 47% Physical, 0% Virtual
[root@lux2 ep]# free
             total       used       free     shared    buffers     cached
Mem:        514692     236948     277744          0       4332     178004
-/+ buffers/cache:      54612     460080
Swap:       337328          0     337328
--------------------------------------------------------------------------
-deleted the just copied files again-
Memory - Control Center: 35% Physical, 0% Virtual
[root@lux2 ep]# free
             total       used       free     shared    buffers     cached
Mem:        514692     179604     335088          0       4332     118764
-/+ buffers/cache:      56508     458184
Swap:       337328          0     337328
--------------------------------------------------------------------------
-created tarfile from "ep-homedir" (61MB)-
Memory - Control Center: 59% Physical, 0% Virtual
[root@lux2 /home]# free
             total       used       free     shared    buffers     cached
Mem:        514692     302604     212088          0       5032     241444
-/+ buffers/cache:      56128     458564
Swap:       337328          0     337328
--------------------------------------------------------------------------
-deleted the just created tarfile again-
Memory - Control Center: 48% Physical, 0% Virtual
[root@lux2 /home]# free
             total       used       free     shared    buffers     cached
Mem:        514692     244924     269768          0       5032     182056
-/+ buffers/cache:      57836     456856
Swap:       337328          0     337328
--------------------------------------------------------------------------
-starting OpenOffice-
Memory - Control Center: 59% Physical, 0% Virtual
[root@lux2 /home]# free
             total       used       free     shared    buffers     cached
Mem:        514692     300616     214076          0       5492     230492
-/+ buffers/cache:      64632     450060
Swap:       337328          0     337328
--------------------------------------------------------------------------
-closing OpenOffice-
Memory - Control Center: 58% Physical, 0% Virtual
[root@lux2 /home]# free
             total       used       free     shared    buffers     cached
Mem:        514692     295264     219428          0       5500     231128
-/+ buffers/cache:      58636     456056
Swap:       337328          0     337328
--------------------------------------------------------------------------
-starting OpenOffice again-
Memory - Control Center: 59% Physical, 0% Virtual
[root@lux2 /home]# free
             total       used       free     shared    buffers     cached
Mem:        514692     301420     213272          0       5500     231144
-/+ buffers/cache:      64776     449916
Swap:       337328          0     337328
--------------------------------------------------------------------------
-starting XWinNmr program-
Memory - Control Center: 62% Physical, 0% Virtual
[root@lux2 /home]# free
             total       used       free     shared    buffers     cached
Mem:        514692     314120     200572          0       5692     237076
-/+ buffers/cache:      71352     443340
Swap:       337328          0     337328
--------------------------------------------------------------------------
-processing Data in XWinNmr (writing data on disk) (7 2D's xfb)-
Memory - Control Center: 88% Physical, 0% Virtual
[root@lux2 /home]# free
             total       used       free     shared    buffers     cached
Mem:        514692     448800      65892          0       7856     355724
-/+ buffers/cache:      85220     429472
Swap:       337328          0     337328
--------------------------------------------------------------------------
-tarfile created (184MB data)-
Memory - Control Center: 99% Physical, 0% Virtual
[root@lux2 data]# free
             total       used       free     shared    buffers     cached
Mem:        514692     508328       6364          0      12276     399620
-/+ buffers/cache:      96432     418260
Swap:       337328          0     337328
--------------------------------------------------------------------------
-tarfile created (526MB data)-
Memory - Control Center: 99% Physical, 1% Virtual
[root@lux2 /Bruker]# free
             total       used       free     shared    buffers     cached
Mem:        514692     508208       6484          0      14668     387904
-/+ buffers/cache:     105636     409056
Swap:       337328       1516     335812
--------------------------------------------------------------------------
-just created tarfile deleted-
Memory - Control Center: 90% Physical, 1% Virtual
[root@lux2 /Bruker]# free
             total       used       free     shared    buffers     cached
Mem:        514692     462324      52368          0      14732     340304
-/+ buffers/cache:     107288     407404
Swap:       337328       1572     335756
--------------------------------------------------------------------------
-same tarfile like before created (526MB data)-
Memory - Control Center: 99% Physical, 1% Virtual
[root@lux2 /Bruker]# free
             total       used       free     shared    buffers     cached
Mem:        514692     508268       6424          0      12240     392452
-/+ buffers/cache:     103576     411116
Swap:       337328       2932     334396
--------------------------------------------------------------------------
-program "Gimp" opened-
Memory - Control Center: 99% Physical, 1% Virtual
[root@lux2 /Bruker]# free
             total       used       free     shared    buffers     cached
Mem:        514692     508164       6528          0      12668     385240
-/+ buffers/cache:     110256     404436
Swap:       337328       2980     334348
--------------------------------------------------------------------------
-all opened application closed-
Memory - Control Center: 95% Physical, 1% Virtual
[root@lux2 /]# free
             total       used       free     shared    buffers     cached
Mem:        514692     486796      27896          0      12852     385368
-/+ buffers/cache:      88576     426116
Swap:       337328       2988     334340
--------------------------------------------------------------------------
-tarfile created (ca. 1GB data)-
Memory - Control Center: 99% Physical, 2% Virtual
[root@lux2 /]# free
             total       used       free     shared    buffers     cached
Mem:        514692     508332       6360          0      10720     410488
-/+ buffers/cache:      87124     427568
Swap:       337328       3960     333368
--------------------------------------------------------------------------
-just created tarfile deleted-
Memory - Control Center: 82% Physical, 2% Virtual
[root@lux2 /]# free
             total       used       free     shared    buffers     cached
Mem:        514692     418992      95700          0      10720     319364
-/+ buffers/cache:      88908     425784
Swap:       337328       3960     333368
--------------------------------------------------------------------------
-"ef-ps"-
[root@lux2 /]# ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 09:46 ?        00:00:04 init [5]
root         2     1  0 09:46 ?        00:00:00 [keventd]
root         3     1  0 09:46 ?        00:00:00 [ksoftirqd_CPU0]
root         4     1  0 09:46 ?        00:00:01 [kswapd]
root         5     1  0 09:46 ?        00:00:02 [kscand]
root         6     1  0 09:46 ?        00:00:00 [bdflush]
root         7     1  0 09:46 ?        00:00:00 [kupdated]
root         8     1  0 09:46 ?        00:00:00 [mdrecoveryd]
root        75     1  0 09:46 ?        00:00:00 [khubd]
root       551     1  0 09:46 ?        00:00:00 syslogd -m 0
root       556     1  0 09:46 ?        00:00:00 klogd -2
rpc        570     1  0 09:46 ?        00:00:00 portmap
rpcuser    585     1  0 09:46 ?        00:00:00 rpc.statd
root       706     1  0 09:46 ?        00:00:00 /usr/sbin/automount --timeout 60 /misc file /etc/auto.misc
daemon     718     1  0 09:46 ?        00:00:00 /usr/sbin/atd
root       740     1  0 09:46 ?        00:00:00 xinetd -stayalive -reuse -pidfile /var/run/xinetd.pid
root       770     1  0 09:46 ?        00:00:00 rpc.bootparamd
lp         786     1  0 09:46 ?        00:00:00 lpd Waiting
root       810     1  0 09:46 ?        00:00:00 rpc.rquotad
root       815     1  0 09:46 ?        00:00:00 rpc.mountd
root       820     1  0 09:46 ?        00:00:00 [nfsd]
root       821     1  0 09:46 ?        00:00:00 [nfsd]
root       822     1  0 09:46 ?        00:00:00 [nfsd]
root       823     1  0 09:46 ?        00:00:00 [lockd]
root       824   823  0 09:46 ?        00:00:00 [rpciod]
root       825     1  0 09:46 ?        00:00:00 [nfsd]
root       826     1  0 09:46 ?        00:00:00 [nfsd]
root       827     1  0 09:46 ?        00:00:00 [nfsd]
root       828     1  0 09:46 ?        00:00:00 [nfsd]
root       829     1  0 09:46 ?        00:00:00 [nfsd]
root       865     1  0 09:46 ?        00:00:00 gpm -t ps/2 -m /dev/mouse
root       891     1  0 09:46 ?        00:00:00 crond
xfs        927     1  0 09:46 ?        00:00:00 xfs -droppriv -daemon
root       939     1  0 09:46 ?        00:00:00 smbd -D
root       944     1  0 09:46 ?        00:00:00 nmbd -D
root       964     1  0 09:46 ?        00:00:00 /usr/java/jre1.3.1/bin/i386/native_threads/java gserver
root       977     1  0 09:46 tty1     00:00:00 /sbin/mingetty tty1
root       978     1  0 09:46 tty2     00:00:00 /sbin/mingetty tty2
root       979     1  0 09:46 tty3     00:00:00 /sbin/mingetty tty3
root       980     1  0 09:46 tty4     00:00:00 /sbin/mingetty tty4
root       981     1  0 09:46 tty5     00:00:00 /sbin/mingetty tty5
root       982     1  0 09:46 tty6     00:00:00 /sbin/mingetty tty6
root       983     1  0 09:46 ?        00:00:00 /usr/bin/gdm -nodaemon
root      1016   983  0 09:46 ?        00:00:14 /etc/X11/X -cc 4 -depth 24 vt07 -auth /var/gdm/:0.Xauth :0
root      1017   983  0 09:46 ?        00:00:00 /usr/bin/gdm -nodaemon
root      1018   964  0 09:47 ?        00:00:00 /usr/java/jre1.3.1/bin/i386/native_threads/java gserver
root      1019  1018  0 09:47 ?        00:00:00 /usr/java/jre1.3.1/bin/i386/native_threads/java gserver
root      1020  1018  0 09:47 ?        00:00:00 /usr/java/jre1.3.1/bin/i386/native_threads/java gserver
root      1021  1018  0 09:47 ?        00:00:00 /usr/java/jre1.3.1/bin/i386/native_threads/java gserver
root      1022  1018  0 09:47 ?        00:00:00 /usr/java/jre1.3.1/bin/i386/native_threads/java gserver
root      1023  1018  0 09:47 ?        00:00:00 /usr/java/jre1.3.1/bin/i386/native_threads/java gserver
root      1024  1018  0 09:47 ?        00:00:00 /usr/java/jre1.3.1/bin/i386/native_threads/java gserver
root      1025  1018  0 09:47 ?        00:00:00 /usr/java/jre1.3.1/bin/i386/native_threads/java gserver
root      1028  1018  0 09:47 ?        00:00:00 /usr/java/jre1.3.1/bin/i386/native_threads/java gserver
root      1029  1018  0 09:47 ?        00:00:00 /usr/java/jre1.3.1/bin/i386/native_threads/java gserver
root      1030  1018  0 09:47 ?        00:00:00 /usr/java/jre1.3.1/bin/i386/native_threads/java gserver
root      1031  1018  0 09:47 ?        00:00:00 /usr/java/jre1.3.1/bin/i386/native_threads/java gserver
ep        1041  1017  0 09:47 ?        00:00:00 ksmserver --restore
ep        1155     1  0 09:47 ?        00:00:00 kdeinit: dcopserver --nosid
ep        1157     1  0 09:47 ?        00:00:00 kdeinit: klauncher
ep        1159     1  0 09:47 ?        00:00:00 kdeinit: kded
ep        1162     1  0 09:48 ?        00:00:03 artsd -F 10 -S 4096
ep        1167     1  0 09:48 ?        00:00:00 kdeinit: kxmlrpcd
ep        1170     1  0 09:48 ?        00:00:00 kdeinit: kaccess
ep        1181     1  0 09:48 ?        00:00:00 kdeinit: Running...
ep        1183     1  0 09:48 ?        00:00:00 knotify
ep        1184  1181  0 09:48 ?        00:00:00 kdeinit: kwin -session 1196cd30b7000101558196400000024220000
ep        1186     1  0 09:48 ?        00:00:00 kdeinit: kdesktop
ep        1188     1  0 09:48 ?        00:00:00 kdeinit: kicker
ep        1204     1  0 09:48 ?        00:00:00 /usr/bin/autorun -l --interval=1000 --cdplayer=/usr/bin/kscd
ep        1207     1  0 09:48 ?        00:00:00 kdeinit: klipper -icon klipper -miniicon klipper
ep        1209     1  0 09:48 ?        00:00:00 kdeinit: khotkeys
ep        1212     1  0 09:48 ?        00:00:00 kdeinit: kwrited
ep        1213  1212  0 09:48 pts/0    00:00:00 /bin/cat
ep        1217  1181  0 09:48 ?        00:00:06 kdeinit: konsole -icon konsole -miniicon konsole -caption Terminal
ep        1218  1217  0 09:48 pts/1    00:00:00 /bin/bash
ep        1243     1  0 09:48 ?        00:00:00 kdeinit: kcontrol -caption Control Center -icon kcontrol -miniicon kcontrol
ep        1245  1181  0 09:48 ?        00:00:01 kdeinit: konsole -icon konsole -miniicon konsole -caption Terminal
ep        1246  1245  0 09:48 pts/2    00:00:00 /bin/bash
ep        1266  1246  0 09:48 pts/2    00:00:00 vi free.txt
root      1268  1218  0 09:50 pts/1    00:00:00 su
root      1272  1268  0 09:50 pts/1    00:00:00 bash
ep        1687  1181  0 10:39 ?        00:00:00 kdeinit: konsole -icon konsole -miniicon konsole -caption Terminal
ep        1688  1687  0 10:39 pts/3    00:00:00 /bin/bash
root      1713  1272  0 10:42 pts/1    00:00:00 ps -ef
--------------------------------------------------------------------------
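The "-/+ buffers/cache" row in each "free" listing of the attachment is the "Mem:" row with buffers and cache moved from "used" to "free". The following short Python sketch reproduces that arithmetic for three rows copied from the attachment. The function name and the choice of rows are hypothetical, picked only for illustration.

```python
def app_memory(mem_row):
    """Take the six numbers of a 'Mem:' row (total, used, free, shared,
    buffers, cached, all in KB) and return (used, free) with buffers and
    cache counted as free instead of used."""
    total, used, free, shared, buffers, cached = mem_row
    return used - buffers - cached, free + buffers + cached


# 'Mem:' rows copied from the attachment (values in KB).
samples = {
    "after systemstart": (514692, 107464, 407228, 0, 4052, 56328),
    "tarfile created (184MB data)": (514692, 508328, 6364, 0, 12276, 399620),
    "all opened application closed": (514692, 486796, 27896, 0, 12852, 385368),
}

for label, row in samples.items():
    used, free = app_memory(row)
    share = 100.0 * used / row[0]
    print("%-30s used without buffers/cache: %6d KB (%.0f%% of RAM)"
          % (label, used, share))
```

For the 184MB tarfile case this gives 96432 KB used and 418260 KB free, which matches the "-/+ buffers/cache" row, even though the Control Center showed 99% Physical at that point.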