Gareth,

See if frequent flushing by pdflush, rather than letting writes aggregate,
changes the situation. Here is a link with some interesting tips:

http://www.westnet.com/~gsmith/content/linux-pdflush.htm

avati

2008/1/9, Gareth Bult <gareth@xxxxxxxxxxxxx>:
>
> Ok, this looks like being a Xen/kernel issue, as I can reproduce it
> without actually "using" the glusterfs, even though it's there and
> mounted.
>
> I've included my Xen mailing list post here, as this problem could well
> affect anyone else using gluster and Xen. It's a bit nasty in that it
> becomes more frequent the less memory you have, so the more Xen
> instances you add, the more unstable your server becomes.
>
> (And I'm fairly convinced gluster is "the" FS to use with Xen,
> especially once the current feature requests are processed.)
>
> :)
>
> Regards,
> Gareth.
>
> -----------
>
> Posting to the Xen list:
>
> Ok, I've been chasing this for many days. I have a server running 10
> instances that periodically freezes, then sometimes "comes back".
>
> I tried many things to try to spot the problem and finally found it by
> accident. It's a little frustrating, as typically the Dom0 and one (or
> two) instances "go" while the rest carry on, and there is diddly-squat
> in the way of logging information or error messages.
>
> I'm now using 'watch "cat /proc/meminfo"' in the Dom0.
> I watch the Dirty figure increase, and occasionally decrease.
>
> In an instance (this is just an easy way to reproduce it quickly), do:
>
>   dd if=/dev/zero of=/tmp/bigfile bs=1M count=1000
>
> Watch "Dirty" rise, and at some point you'll see "Writeback" cut in.
> All looks good.
>
> Give it a few seconds and your "watch" of /proc/meminfo will freeze.
> On my system "Dirty" will at this point be reading about 500 MB, and
> "Writeback" will have gone down to zero.
> "xm list" in another session will confirm that you have a major problem
> (it will hang).
>
> For some reason pdflush is not working properly!
> Run "sync" in another shell and the machine instantly jumps back to life!
>
> I'm running a stock Ubuntu Xen 3.1 kernel.
> File-backed Xen instances, typically 5 GB with 1 GB swap.
> Dual dual-core 2.8 GHz Xeons (4 cores in total) with 6 GB RAM.
> Twin 500 GB SATA HDDs (software RAID1).
>
> To my way of thinking(!), when it runs out of memory it should force a
> sync (or similar), and it's not; it's just sitting there. If I wait for
> the dirty_expire_centisecs timer to expire I may get some life back, but
> some instances will survive and some will have hung.
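The dirty_expire_centisecs timer mentioned just above is one of the
vm.dirty_* sysctls that govern pdflush, and avati's "flush frequently
rather than aggregate" suggestion at the top of the thread usually
translates into lowering them. A minimal sketch, run as root in the Dom0;
the values are illustrative assumptions, not settings taken from this
thread or the linked article:

    # Start background writeback at 1% of memory dirty instead of the
    # typical 2.6-era default of 10%, and block writers at 5% instead
    # of 40%. Values are illustrative, not tested recommendations.
    sysctl -w vm.dirty_background_ratio=1
    sysctl -w vm.dirty_ratio=5

    # Treat dirty pages as expired after 10 s (default 30 s) and wake
    # pdflush every 1 s (default 5 s).
    sysctl -w vm.dirty_expire_centisecs=1000
    sysctl -w vm.dirty_writeback_centisecs=100

The same settings can be made persistent in /etc/sysctl.conf. Smaller
thresholds will not cure a wedged writeback path, but they shrink the pile
of dirty pages that has to drain when it stalls.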
>
> Here's a working "meminfo":
>
> MemTotal:       860160 kB
> MemFree:         22340 kB
> Buffers:         49372 kB
> Cached:         498416 kB
> SwapCached:      15096 kB
> Active:          92452 kB
> Inactive:       491840 kB
> SwapTotal:     4194288 kB
> SwapFree:      4136916 kB
> Dirty:            3684 kB
> Writeback:           0 kB
> AnonPages:       29104 kB
> Mapped:          13840 kB
> Slab:            45088 kB
> SReclaimable:    25304 kB
> SUnreclaim:      19784 kB
> PageTables:       2440 kB
> NFS_Unstable:        0 kB
> Bounce:              0 kB
> CommitLimit:   4624368 kB
> Committed_AS:   362012 kB
> VmallocTotal: 34359738367 kB
> VmallocUsed:      3144 kB
> VmallocChunk: 34359735183 kB
>
> Here's one where "xm list" hangs, but my "watch" is still updating the
> /proc/meminfo display:
>
> MemTotal:       860160 kB
> MemFree:         13756 kB
> Buffers:         53656 kB
> Cached:         502420 kB
> SwapCached:      14812 kB
> Active:          84356 kB
> Inactive:       507624 kB
> SwapTotal:     4194288 kB
> SwapFree:      4136900 kB
> Dirty:          213096 kB
> Writeback:           0 kB
> AnonPages:       28832 kB
> Mapped:          13924 kB
> Slab:            45988 kB
> SReclaimable:    25728 kB
> SUnreclaim:      20260 kB
> PageTables:       2456 kB
> NFS_Unstable:        0 kB
> Bounce:              0 kB
> CommitLimit:   4624368 kB
> Committed_AS:   361796 kB
> VmallocTotal: 34359738367 kB
> VmallocUsed:      3144 kB
> VmallocChunk: 34359735183 kB
>
> Here's a frozen one:
>
> MemTotal:       860160 kB
> MemFree:         15840 kB
> Buffers:          2208 kB
> Cached:         533048 kB
> SwapCached:       7956 kB
> Active:          49992 kB
> Inactive:       519916 kB
> SwapTotal:     4194288 kB
> SwapFree:      4136916 kB
> Dirty:          505112 kB
> Writeback:        3456 kB
> AnonPages:       34676 kB
> Mapped:          14436 kB
> Slab:            64508 kB
> SReclaimable:    18624 kB
> SUnreclaim:      45884 kB
> PageTables:       2588 kB
> NFS_Unstable:        0 kB
> Bounce:              0 kB
> CommitLimit:   4624368 kB
> Committed_AS:   368064 kB
> VmallocTotal: 34359738367 kB
> VmallocUsed:      3144 kB
> VmallocChunk: 34359735183 kB
>
> Help!!!
>
> Gareth.
>
> --
> Managing Director, Encryptec Limited
> Tel: 0845 25 77033, Mob: 07853 305393, Int: 00 44 1443205756
> Email: gareth@xxxxxxxxxxxxx
> Statements made are at all times subject to Encryptec's Terms and
> Conditions of Business, which are available upon request.
>
> ----- Original Message -----
> From: "Gareth Bult" <gareth@xxxxxxxxxxxxx>
> To: "gluster-devel" <gluster-devel@xxxxxxxxxx>
> Sent: Wednesday, January 9, 2008 3:40:49 PM (GMT) Europe/London
> Subject: Major lock-up problem
>
> Hi,
>
> I've been developing a new system (which is now "live", hence the lack
> of debug information) and have been experiencing lots of inexplicable
> lock-up and pause problems with lots of different components; I've been
> working my way through the systems, removing and fixing problems as I go.
>
> I seem to have a problem with gluster that I can't nail down.
>
> When hitting the server with sustained (typically multi-file) writes,
> after a while the server goes into "D" state.
> If I have io-threads running on the server, only ONE process goes into
> "D" state.
>
> The trouble is, it stays in "D" state and starts to lock up other
> processes; a favourite is "vi".
>
> The funny thing is, the machine is a Xen server (glusterfsd runs in the
> Dom0), yet the Xen instances NOT using gluster are not affected.
> Some of the instances using the glusterfs are affected, depending on
> whether io-threads is used on the server.
>
> If I'm lucky, I kill the IO process and 5 minutes later the machine
> springs back to life.
> If I'm not, I reboot.
>
> Anyone have any ideas?
>
> glfs7 and tla.
>
> Gareth.
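The three dumps above differ mainly in two fields: roughly 3.6 MB Dirty
when healthy, about 208 MB once "xm list" hangs, and about 493 MB with
Writeback nearly idle when frozen. A minimal timestamped poll of just
those two fields (my own rephrasing of the 'watch "cat /proc/meminfo"'
technique from the first message, not a command from the thread) makes
the moment of the freeze obvious, because the timestamps stop advancing
when the box wedges:

    # Print the time plus the Dirty and Writeback counters once a second.
    while sleep 1; do
        printf '%s  ' "$(date +%T)"
        grep -E '^(Dirty|Writeback):' /proc/meminfo | tr '\n' ' '
        echo
    done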
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxx
> http://lists.nongnu.org/mailman/listinfo/gluster-devel

--
If I traveled to the end of the rainbow
As Dame Fortune did intend,
Murphy would be there to tell me
The pot's at the other end.
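Finally, since a manual "sync" reportedly brings the machine straight
back in both reports, a stop-gap (emphatically not a fix) would be a
watchdog in the Dom0 that forces writeback before Dirty grows dangerous.
A rough sketch; the 200 MB threshold and 5-second poll interval are
arbitrary illustrative choices:

    #!/bin/sh
    # Hypothetical watchdog: force writeback when dirty pages pile up.
    # The threshold (200 MB) and interval (5 s) are illustrative only.
    THRESHOLD_KB=204800
    while sleep 5; do
        dirty_kb=$(awk '/^Dirty:/ { print $2 }' /proc/meminfo)
        if [ "$dirty_kb" -gt "$THRESHOLD_KB" ]; then
            logger "dirty=${dirty_kb} kB exceeds ${THRESHOLD_KB} kB, syncing"
            sync
        fi
    done

One caveat: if the writeback path is already wedged, the sync itself can
block in "D" state, so this only helps in the window where a flush still
succeeds.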