Hi,

On our backup system (2 LTO4 drives/Tandberg library via LSISAS1068E, kernel 2.6.36 with the stock Fusion MPT SAS Host driver 3.04.17 on Debian squeeze), we see reproducible tape read and write failures after the system has been under memory pressure:

[342567.297152] st0: Can't allocate 2097152 byte tape buffer.
[342569.316099] st0: Can't allocate 2097152 byte tape buffer.
[342570.805164] st0: Can't allocate 2097152 byte tape buffer.
[342571.958331] st0: Can't allocate 2097152 byte tape buffer.
[342572.704264] st0: Can't allocate 2097152 byte tape buffer.
[342873.737130] st: from_buffer offset overflow.

Bacula spews this message every time it tries to access the tape drive:

28-Nov 19:58 sd1.techfak JobId 2857: Error: block.c:1002 Read error on fd=10 at file:blk 0:0 on device "drv2" (/dev/nst0). ERR=Input/output error

By memory pressure, I mean that the KVM processes containing the postgres db (~20 million files) and the bacula director used all available RAM, and one of them used ~4GiB of its 12GiB swap for an hour or so (when selecting a full restore, it seems that the whole directory tree of the 15 million file backup gets read into memory). After this, I wasn't able to read from the second tape drive (/dev/st0) anymore, whereas the first tape drive was restoring the data happily (it is currently about halfway through a 3TiB restore from 5 tapes).

The same behaviour appears when we're doing a few incremental backups: after a while, it just isn't possible to use the tape drives anymore. Every I/O operation gives an I/O error, even a simple dd bs=64k count=10. After a restart, the system behaves correctly until, seemingly, another memory pressure situation occurs.

I'd be delighted if somebody could help me debug this; my systemtap skills are unfortunately non-existent.
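To put the numbers from the log in perspective, here is a minimal Python sketch that tallies the allocation failures quoted above and converts the requested buffer size into MiB and pages. The 4 KiB page size is an assumption (typical for x86, not confirmed for this machine), and `summarize_st_failures` is just a hypothetical helper name, not part of any existing tool.

```python
import re

# Kernel log lines copied verbatim from the report above.
LOG = """\
[342567.297152] st0: Can't allocate 2097152 byte tape buffer.
[342569.316099] st0: Can't allocate 2097152 byte tape buffer.
[342570.805164] st0: Can't allocate 2097152 byte tape buffer.
[342571.958331] st0: Can't allocate 2097152 byte tape buffer.
[342572.704264] st0: Can't allocate 2097152 byte tape buffer.
[342873.737130] st: from_buffer offset overflow.
"""

PAGE_SIZE = 4096  # assumption: 4 KiB pages

# Matches only the "Can't allocate ... byte tape buffer" lines.
ALLOC_FAIL = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+(st\d+): Can't allocate (\d+) byte tape buffer"
)


def summarize_st_failures(log, page_size=PAGE_SIZE):
    """Group allocation failures by (device, requested size in bytes)."""
    summary = {}
    for line in log.splitlines():
        m = ALLOC_FAIL.search(line)
        if m is None:
            continue  # skip unrelated lines such as "from_buffer offset overflow"
        ts, dev, size = float(m.group(1)), m.group(2), int(m.group(3))
        entry = summary.setdefault((dev, size), {
            "count": 0,
            "first": ts,
            "last": ts,
            "mib": size / 2**20,
            "pages": -(-size // page_size),  # ceiling division
        })
        entry["count"] += 1
        entry["first"] = min(entry["first"], ts)
        entry["last"] = max(entry["last"], ts)
    return summary


if __name__ == "__main__":
    for (dev, size), e in summarize_st_failures(LOG).items():
        print(f"{dev}: {e['count']} failures for {size} bytes "
              f"({e['mib']:g} MiB, {e['pages']} pages of {PAGE_SIZE} bytes), "
              f"uptime {e['first']:.0f}s to {e['last']:.0f}s")
```

The same function could be fed the output of dmesg instead of the pasted excerpt, to see whether the failing request size is always the same.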
Kind regards,
Lukas Kolbe