My backups are taking a very long time, and I am looking for ways to optimize the process, so suggestions are welcome.

The cluster consists of two IBM x335 servers (dual 3.6 GHz Xeon, 8 GB RAM, RHEL4 U4), each running off a local 73 GB U320 SCSI hardware mirror. We are running RHCS with GFS and DLM. The data being backed up is on an IBM SAN; the slow backups are on an IBM DS4800 over 2 Gbps Fibre Channel. The DS4800 has 16 GB of cache and is not being over-utilized. The data is divided into ten 2 TB chunks, mounted under /data/T1 through /data/T10.

The backup software is IBM Tivoli Storage Manager. When the backups start, it processes every file to determine whether it needs to be backed up or not. Last night it took 15 hours to process 6.7 million files and then back up 4200 files (9 GB) total. I do not currently know how long it takes to actually back up 9 GB, but standard copies would finish relatively quickly over the gigabit Ethernet. The Tivoli backup server caches the data on separate SAN disks before backing up to tape, so the slowdown is not there.

From what I can tell, the slowness is only on the Red Hat servers, during processing. Comparing this to some AIX servers with large backups, the AIX servers can scan 12 million files in about 5 hours, and a Netware server scanned 17.2 million files in 16 hours. The AIX is difficult to compare, since it is totally different hardware, but the Netware server is the same model server, with only 1 GB RAM, using 1.5 TB on FC SATA on an IBM FastT 100. If the Netware server takes comparable time to scan more files with less RAM and slower disks, why are my Linux servers so slow? I know Netware has excellent disk I/O, but this seems to be more of a processing issue.

I don't think the content or size of the files should matter, but according to our backup admin, Tivoli will check some attributes (file size, date, rights, etc.) to see if there are changes. I am looking for backup client optimizations, but would also like to see what others are doing or can suggest.

The CPU is ranging between 80-100%, so I assume it is hitting both processors. If I try manual copies from this server during backups, a copy that should take 10 seconds takes 10 minutes. I moved the share to the server not performing backups, but using the same GFS storage locations, and the copy takes 10 seconds, so the SAN does not appear to be the problem. The slowness appears to be in the file scan stage, i.e. determining what needs to be backed up. Is there any way I can optimize the disk access, RAM, or processor that might help?

I am considering adding a server to split up the load, so I could potentially have two servers with two Samba shares each, and the third server could provide failover and backup services. If adding RAM would help, I am open to that as well. Additional CPUs might help, since utilization is 80-100% during backups, but I would have to purchase new servers and move everything, which is not appealing.

If I run free:

                 total       used       free     shared    buffers     cached
    Mem:       4040864    4020716      20148          0      20512     183012
    -/+ buffers/cache:    3817192     223672
    Swap:      2097144        224    2096920

Since swap is not really being used, I assume the RAM is being used for file cache, which makes it hard to determine how much RAM is actually available for processing. Are there any guidelines I can use to help me properly size the server (specifically RAM) based on the number of files or size of data? I recently upgraded from 4 GB to 8 GB, because I would occasionally run out of memory on the servers.
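In case the raw numbers are more useful than free's summary, I can also sample /proc/meminfo while a backup is running to see how much of the "used" memory is really page cache rather than application memory. Roughly what I had in mind (the 30-second interval is arbitrary):

    # one-off: how much memory is page cache vs. everything else
    grep -E '^(MemTotal|MemFree|Buffers|Cached|SwapCached)' /proc/meminfo

    # or sample it during the backup window
    watch -n 30 'grep -E "^(MemFree|Buffers|Cached)" /proc/meminfo'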
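To try to isolate the scan stage from the actual data movement, I was also planning to time a plain metadata walk of one chunk (which is roughly what the TSM scan does, minus the attribute comparison) and to watch vmstat while a real backup runs, to see whether the time goes to CPU or to I/O wait. A rough sketch using only standard tools, nothing TSM-specific:

    # walk one 2TB chunk and stat every file; run twice to compare cold vs. warm cache
    time find /data/T1 -type f -printf "%s\n" > /dev/null
    time find /data/T1 -type f -printf "%s\n" > /dev/null

    # while the TSM scan runs, check user/system CPU vs. iowait (us/sy/wa columns)
    vmstat 5

If the second find run is dramatically faster, that would point at the same cold-cache behavior I see with du below.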
Since Tivoli does some comparison of various attributes during processing, is it possible I am seeing problems related to the clustered file system (i.e., du -sh on /data/T1 takes minutes the first time)? Any way to speed this up? Are others using snapshot pools or some other backup method?

Thanks

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster