Since some 3.x kernel release (I don't know exactly which one) the system more or less randomly starts to starve because of dirty memory. When it happens, /proc/meminfo looks as follows:

MemTotal:         508392 kB
MemFree:           24120 kB
MemAvailable:     322184 kB
Buffers:               0 kB
Cached:           229884 kB
SwapCached:         2380 kB
Active:           178616 kB
Inactive:         160760 kB
Active(anon):      51172 kB
Inactive(anon):    68860 kB
Active(file):     127444 kB
Inactive(file):    91900 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:        524284 kB
SwapFree:         502372 kB
Dirty:    18446744073709551408 kB
Writeback:             0 kB
AnonPages:        107816 kB
Mapped:            37096 kB
Shmem:             10540 kB
Slab:             105604 kB
SReclaimable:      89340 kB
SUnreclaim:        16264 kB
KernelStack:        2128 kB
PageTables:         4212 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:      778480 kB
Committed_AS:     875200 kB
VmallocTotal:   34359738367 kB
VmallocUsed:        4452 kB
VmallocChunk:   34359704615 kB
AnonHugePages:         0 kB
DirectMap4k:       10228 kB
DirectMap2M:      514048 kB

As seen above, Dirty is absurdly large - it looks like a small negative value interpreted as unsigned, far too large compared to the available memory. The system is running under kvm/qemu with a single CPU and 512 MB of RAM, and it makes use of the memory cgroup. The userspace process that triggers the problem, and the one mostly affected, is rrdcached (from rrdtool), as it is the one dirtying memory at regular intervals and in reasonable amounts. It may very well be hitting memory pressure in its cgroup (at least all the RRDs don't fit within its soft/hard cgroup memory limits). Probably the most critical parts of the config are CONFIG_SMP=n in combination with the pretty tight memory availability.

As soon as the problem hits, wchan for rrdcached looks like this:

grep . /proc/10045/task/*/wchan
/proc/10045/task/10045/wchan:poll_schedule_timeout
/proc/10045/task/10047/wchan:balance_dirty_pages_ratelimited
/proc/10045/task/10048/wchan:balance_dirty_pages_ratelimited
/proc/10045/task/10049/wchan:balance_dirty_pages_ratelimited
/proc/10045/task/10050/wchan:balance_dirty_pages_ratelimited
/proc/10045/task/10051/wchan:futex_wait_queue_me
/proc/10045/task/10052/wchan:poll_schedule_timeout

I can kill rrdcached (but must use -KILL or it takes an eternity) to stop the system from thinking it is in permanent IO-wait, but after restarting rrdcached I pretty quickly get back into the original state. The only way I know of to get out of this is to reboot.

How does dirty memory accounting happen? Would it be possible for the accounting code to complain if dirty memory goes "negative", and in that case reset the counter to 0 (or pause everything for the time needed to recalculate the correct value)?

Of all the systems I'm monitoring, these x86_64 systems are the only ones I've seen this issue on, though they are probably also the only ones running under such tight memory constraints.

Bruno
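
PS: Interpreting the Dirty value as a signed 64-bit counter supports the "small negative value" theory: it comes out to -208 kB, i.e. -52 pages of 4 kB. A minimal sketch of that arithmetic (the constant is simply copied from the meminfo dump above):

/* Interpret the huge Dirty value from /proc/meminfo as a signed
 * 64-bit quantity, assuming the kernel's dirty-page counter has
 * underflowed and is being printed as unsigned. */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint64_t dirty_kb = 18446744073709551408ULL;   /* Dirty: line above */
    int64_t as_signed = (int64_t)dirty_kb;         /* two's-complement view */

    printf("Dirty as signed: %lld kB\n", (long long)as_signed);          /* -208 kB */
    printf("In 4 kB pages:   %lld pages\n", (long long)(as_signed / 4)); /* -52 pages */
    return 0;
}

So the counter appears to have been decremented 52 pages below zero before being exported as an unsigned number.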
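
And, purely for illustration, a sketch of the kind of "complain and clamp" behaviour I'm asking about. This is not the kernel's actual accounting code; names like dirty_pages and account_dirty_dec() are made up here, and the real counters live in the per-node/per-zone vmstat machinery:

/* Hypothetical "warn and clamp" on a dirty-page counter underflow.
 * Illustration only - not kernel code. */
#include <stdio.h>

static long dirty_pages;   /* pretend global dirty-page counter */

static void account_dirty_dec(long nr)
{
    dirty_pages -= nr;
    if (dirty_pages < 0) {
        /* Complain and clamp instead of letting readers of the
         * counter see a huge unsigned value. */
        fprintf(stderr, "dirty accounting went negative (%ld), clamping to 0\n",
                dirty_pages);
        dirty_pages = 0;
    }
}

int main(void)
{
    account_dirty_dec(52);  /* underflow: 0 - 52 pages */
    printf("Dirty: %lu kB\n", (unsigned long)dirty_pages * 4);  /* stays 0 */
    return 0;
}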