> Help! I am having complaints from users about CPU spikes when writing to my
> RAID 1 array.

I can think of two answers: first, are you sure your drives are configured
sanely? that is, using DMA? with any reasonable kernel, they should be, but
it's possible to compile in the wrong driver or make some other mistake.
hdparm -iv /dev/hda (and hdc) should show using_dma=1. you can also look at
/proc/ide/hda/settings.

second, perhaps you should simply make the kernel less lazy about starting
writes. here are the basic settings from 2.4:

[hahn@hahn hahn]$ cat /proc/sys/vm/bdflush
30      500     0       0       500     3000    60      20      0

Value                Meaning
nfract               Percentage of buffer cache dirty to activate bdflush
ndirty               Maximum number of dirty blocks to write out per wake-cycle
dummy                Unused
dummy                Unused
interval             jiffies delay between kupdate flushes
age_buffer           Time for normal buffer to age before we flush it
nfract_sync          Percentage of buffer cache dirty to activate bdflush synchronously
nfract_stop_bdflush  Percentage of buffer cache dirty to stop bdflush
dummy                Unused

in theory, this means:
- wake up bdflush when 30% of buffers are dirty.
- write up to 500 blocks per wakeup.
- 5 seconds between wakeups.
- let a buffer age for 30 seconds before flushing it.
- if 60% of buffers are dirty, start throttling dirtiers.
- stop bdflush when < 20% of buffers are dirty.

of course, the code doesn't exactly do this, and 2.6 is very different.
still, I'm guessing that:
- 500 buffers (pages, right?) is too few
- 5 seconds is too infrequent
- 30 seconds is probably too long

I have the fileserver for one of my clusters running much more smoothly with
ndirty=1000, interval=200 and age_buffer=1000. my logic is that the disk
system can sustain around 200 MB/s, so flushing 4 MB per wakeup is pretty
minimal. I also hate to see the typical burstiness of bdflush - no IO between
bursts at 5-second intervals. I'd rather see a smoother stream of write-outs -
perhaps even a 1-second interval. finally, Unix's traditional 30-second
laziness is mainly done in the hope that a temporary file will be deleted
before ever hitting the disk (and/or that writes will be combined). I think
30 seconds is an eternity nowadays, and 10 seconds is more reasonable.

in short:

echo '30 1000 0 0 200 1000 60 20 0' > /proc/sys/vm/bdflush

perhaps:

echo '30 1000 0 0 100 1000 60 20 0' > /proc/sys/vm/bdflush

for extra credit, investigate:
- whether nfract=30 is too high (I think so, on today's big-memory systems).
- whether a higher ndirty improves balance (these writes would compete with
  application IO, so it might hurt, albeit less with 2.6's smarter IO
  scheduler).
- whether the sync/stop parameters make a difference, too - throttling
  dirtiers should probably kick in earlier, but if you lower nfract, also
  lower nfract_stop_bdflush...

> Is there a way I can tune software RAID so that writing
> updates doesn't interfere with other applications?

remember also that many servers don't need atime updates; this can make a big
difference in some cases.
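for the atime point, something like this turns off atime updates on a live
filesystem; the /dev/md0 device, /home mount point and ext3 type below are
just placeholders, use whatever is in your own fstab:

# remount an already-mounted filesystem without atime updates
mount -o remount,noatime /home

# to make it permanent, add noatime to the options column in /etc/fstab, e.g.:
#   /dev/md0   /home   ext3   defaults,noatime   1 2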
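going back to the first answer, this is roughly the DMA check I mean, assuming
2.4-style IDE disks on hda and hdc as above (adjust the device names for your
array):

# both members of the RAID 1 should report using_dma = 1
hdparm -v /dev/hda
hdparm -v /dev/hdc
grep using_dma /proc/ide/hda/settings /proc/ide/hdc/settings

# if DMA is off, you can usually turn it on by hand (doesn't survive a
# reboot - put it in a boot script once you know it's safe on your hardware)
hdparm -d1 /dev/hda
hdparm -d1 /dev/hdc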
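and if the bdflush numbers above work out for you, something like this makes
them survive a reboot; the vm.bdflush key is how I believe the
/proc/sys/vm/bdflush path maps into sysctl.conf on 2.4, but double-check it on
your distro:

# apply by hand first (interval=200 jiffies = 2 s, age_buffer=1000 jiffies = 10 s)
echo '30 1000 0 0 200 1000 60 20 0' > /proc/sys/vm/bdflush
cat /proc/sys/vm/bdflush

# then persist it: add this line to /etc/sysctl.conf ...
#   vm.bdflush = 30 1000 0 0 200 1000 60 20 0
# ... and reload
sysctl -p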