This sounds like the recent lpfc bug found on the LTO-3 or LTO-4. The lpfc
driver had messed up tag handling (a long time bug). The patch was posted
at http://marc.info/?l=linux-scsi&m=119367024313743&w=2
and has been pulled into scsi-rc-fixes, and I believe will be in 2.6.24.
Try this, and if it doesn't fix it, let me know.
-- james s
Sven Rudolph wrote:
Hello,
I have a strange problem with writing to a fiberchannel LTO-4 tape
drive. First my environment:
From /proc/scsi/scsi:
Host: scsi8 Channel: 00 Id: 00 Lun: 00
Vendor: IBM Model: ULTRIUM-TD4 Rev: 74H4
Type: Sequential-Access ANSI SCSI revision: 03
Host: scsi8 Channel: 00 Id: 00 Lun: 01
Vendor: ADIC Model: Scalar i500 Rev: 400G
Type: Medium Changer ANSI SCSI revision: 03
From /var/log/dmesg:
ACPI: PCI Interrupt 0000:09:00.0[A] -> GSI 16 (level, low) -> IRQ 16
PCI: Setting latency timer of device 0000:09:00.0 to 64
scsi8 : on PCI bus 09 device 00 irq 16
st 7:0:0:0: st2: try direct i/o: yes (alignment 512 B)
st 7:0:0:0: Attached scsi generic sg36 type 1
lpfc 0000:09:00.0: 2:1303 Link Up Event x1 received Data: x1 x1 x10 x2
lpfc 0000:09:00.0: 2:(0):0108 No retry ELS command x4 to remote NPORT xfffffe Retried:3 Error:x3/18
scsi 8:0:0:0: Sequential-Access IBM ULTRIUM-TD4 74H4 PQ: 0 ANSI: 3
ACPI: PCI Interrupt 0000:09:00.1[B] -> <5>st 8:0:0:0: Attached scsi tape st3
st 8:0:0:0: st3: try direct i/o: yes (alignment 512 B)
st 8:0:0:0: Attached scsi generic sg37 type 1
GSI 17 (level, low) -> IRQ 17
scsi 8:0:0:1: Medium Changer ADIC Scalar i500 400G PQ: 0 ANSI: 3
scsi 8:0:0:1: Attached scsi generic sg38 type 8
PCI: Setting latency timer of device 0000:09:00.1 to 64
The tape drive is directly attached to the Emulex Lightpulse FC
controller. It is part of a Quantum (ADIC) i500 library, the library
control interface is provided via LUN 1. Changing media works fine.
I use the backup system Amanda (<http://www.amanda.org>), it
worked/works fine with other tape drives (DLT-IV, SDLT-I and SDLT-II
(sometimes called SDLT220 and SDLT600) and DLT-S4). Amanda acesses the
tape drives via the non-rewinding device (/dev/nst0 etc.) As far as I
know it does nothing special. The only reason I mention Amanda is that
I was not able to reproduce the following problem with basic tools (dd
and mt).
The problem: While writing to the tape an error is returned. The
kernel reports the following (the message appears in
/var/log/messages):
Dec 11 16:26:37 uxrs74 kernel: st3: Sense Key : Illegal Request [current]
Dec 11 16:26:37 uxrs74 kernel: st3: Add. Sense: Invalid message error
I have no idea what this means and which steps I should take.
The amanda log file (/var/log/amanda/WeeklySet1/amflush) shows this:
taper: r: switching to next holding chunk '/var/spool/amanda/server._.0.144'
taper: r: switching to next holding chunk '/var/spool/amanda/server._.0.145'
taper: reader-side: got label Set1-5-04 filenum 1
driver: state time 1556.183 free kps: 60000 space: 5075004836 taper: writing idle-dumpers: 12 qlen tapeq: 3 runq: 0 roomq: 0 wakeup: 0 driver-idle: not-idle
driver: interface-state time 1556.183 if default: free 60000
driver: hdisk-state time 1556.183 hdisk 0: free 700190684 dumpers 0 hdisk 1: fre 950492456 dumpers 0 hdisk 2: free 950492460 dumpers 0 hdisk 3: free 798084516 dumpers 0 hdisk 4: free 530198076 dumpers 0 hdisk 5: free 530870476 dumpers 0 hdisk 6: free 614676168 dumpers 0
driver: result time 1556.183 from taper: DONE 00-00001 Set1-5-04 1 "[sec 1468.628 kb 152807296 kps 104047.6 {wr: writers 4775229 rdwait 444.006 wrwait 1007.537 filemark 4.438}]"
driver: finished-cmd time 1831.009 taper wrote server:/
driver: send-cmd time 1831.009 to taper: FILE-WRITE 00-00002 /var/spool/amanda/anotherserver._.0 anotherserver UNKNOWNFEATURE / 0 20071209 0
driver: startaflush: LARGESTFIT anotherserver / 45242148 615188062
taper: writing end marker. [Set1-5-04 ERR kb 152822528 fm 2]
Some explanation: It writes out 1 GB chunks to tape, and then the data
for "server" is written out. And at the end it starts writing out the
data for "anotherserver", and then the error arrives.
From an Amanda perspective this looks normal; just like if there were
a media error while writing to tape. Above example suggests that this
problem might be due to some control command when finishing or
starting a new data blob (they are separated by so-called file marks
(terminology as used in the mt manual page)). But I believe I have
seen the same problem in the middle of data blobs too, but right now I
do not find the right log file.
And an only partially related problem, discussing this could be
off-topic on linux-scsi: I tried to strace the taper process that
writes to the tape. (I hoped to see some magic control command sent to
the tape.) But strace instantly returned:
~# strace -p 9488 -fF
Process 9488 attached - interrupt to quit
--- SIGSTOP (Stopped (signal)) @ 0 (0) ---
--- SIGSTOP (Stopped (signal)) @ 0 (0) ---
restart_syscall(<... resuming interrupted call ...>) = 32768
read(4, ptrace: umoven: Input/output error
0xffffffff, 1690719488) = 0
_exit(64) = ?
Process 9488 detached
An Internet search for "ptrace: umoven" gave some hits, but none
explained what it means and what the potential reasons are. If you
have an idea please contact me.
So thats everything I tried. Next step might be to contact Quantum,
but I'd like to give them a more specific problem description. Ideas
to reproduce this with mt/dd would be helpful too.
Sven
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html