See comments below...

On Tuesday 17 November 2009 07:52:01 Todd Denniston wrote:
> Benjamin Smith wrote, On 11/16/2009 10:56 PM:
> > I have a 1TB USB drive plugged into a USB2 port that I use to back up the
> > production drives (which are SCSI). It's working fine, but while doing
> > backups (hourly) the load average on the server shoots up from the normal
> > 0.5 - 1.5 or so up to a high between 10 and 30. Strangely, even though
> > the "load is high" the server is completely responsive, even the USB
> > drives being accessed are!
> >
> > Backup script is really simple, run via cron, pretty much just:
> >
> > #! /bin/sh
> > hour=`date +%k`;
> > pg_dump <options> mydatabase > /media/backups/mydatabase.$hour.pgsql;
> >
> > where /media/backups is the mount point for the USB drive.
> >
> > Using top to diagnose, nothing seems to be particularly high! IoWait
> > seems reasonable (10-30%) and CPUs are 0.5%, Idle is 70-90%. Even
> > accessing the USB partition while the load is "high" is responsive!
> >
> > I'm guessing that something changed in how load average is counted?
> >
> > Server Stats:
> > Late model 8-way Xeon, SuperMicro brand.
> > CentOS 4.x / 64 (all updates applied, booted after last kernel update)
> > Kernel 2.6.9-89.0.16.ELsmp
> > 4 GB ECC RAM
> > 300 GB SCSI HDD.
> > Standard Apache/PHP, Postgres 8.4.
> >
> > Any idea how to revert to the old load average tracking behavior short of
> > using a stale and potentially insecure kernel?
>
> Are you saying that when you were running a previous kernel the same
> operations with the same devices did not have the high load?

Correct!

> Which
> specific kernels worked as desired (if someone is going to bisect the
> problem they need a start point)?

kernel-smp-devel-2.6.9-89.0.15.EL
(I always keep my machines updated on at least a weekly schedule)

> Are there other processes on the machine that are waiting to use the db
> while the dump is occurring?

No.
Database is actually on a different machine and backups are being done over
the network.

> How many postgres processes are waiting for
> the dump to finish (it has been a while since I ran postgres so I don't
> recall how it deals with query's during a dump)?

One - the one performing the backup. Postgres uses MVCC so pg_dump doesn't
block any other connections from continuing/finishing.

> As workarounds perhaps asking the kernel to schedule in a specific way
> might help, i.e.:
> #1 set the backup on a particular set of processors,
> # replace the pg_dump line above with
> taskset -c 3-4 pg_dump <options> mydatabase > \
>     /media/backups/mydatabase.$hour.pgsql;

There are 8 cores on the machine, none of which are reporting more than 5%
load. That's what has me perplexed. When I run top, I see a max of about 30%
user. Everything else is zero.

When I run the backup script to a non-USB drive, the load average is
completely normal (below 0.50, often below 0.10)

> #2 set the usb-storage on a particular set of processors,
> # Note USBSTORPID= line prototyped on CentOS 5 machine not 4.
> USBSTORPID=`ps aux |grep usb-storage|head -1 |awk '{print $2}'`
> taskset -p -c 3-4 $USBSTORPID
> #you might even go back and reduce the processor list
> #to just 3 or 4 instead of both.

Could you explain to me what this should accomplish? I'm curious as to why
you went this route...

> #3 don't update atime
> # (should at worst be a minor thing, and you say that
> # the usb mounted file system is responsive,
> # but perhaps it would help some.)
> mount -oremount,noatime /media/backups/

Already mounted noatime... here's the mount line in the backup script:

# mount -o rw,noatime -t ext3 /dev/sdc1 /home/backup/localdb/
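
One thing that may bear on the "high load, idle CPUs" picture: on Linux the
load average counts tasks in uninterruptible sleep (state D, typically blocked
on I/O) along with runnable ones. The sketch below is only a way to look, not
a diagnosis. It assumes a procps-style ps, and the function name
summarize_states is made up for illustration.

```shell
#!/bin/sh
# summarize_states (hypothetical helper): reads "STATE PID COMMAND" lines on
# stdin, prints each task whose state starts with D (uninterruptible sleep),
# then prints a one-line count of running (R) and uninterruptible (D) tasks.
summarize_states() {
    awk '
        { s = substr($1, 1, 1) }                   # first letter of the state field
        s == "R" { r++ }                           # runnable / running
        s == "D" { d++; print "D-state:", $2, $3 } # blocked, usually on I/O
        END { printf "running=%d uninterruptible=%d\n", r, d }
    '
}
```

Run it while a backup is in progress, for example:

    ps -e -o state= -o pid= -o comm= | summarize_states

If the uninterruptible count roughly tracks the load average, the load would
be coming from tasks waiting on the USB disk rather than from CPU demand.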