HP SureStore T20 streamer crashes RH9 Linux

Hi.

I want to set up an HP SureStore T20 streamer on a box with a newly installed
RH9 Linux. The computer is an Intel P3 800 MHz with 512 MB RAM. It's a
small-office server that distributes files to the intranet and provides
internet services (squid, apache, postfix, fetchmail, ...). I'm using
Travan tapes with 10/20 GB (compressed) size. The streamer is connected to
a SCSI controller, a DawiControl DC-2974 PCI (scsi0: Tekram DC390/AM53C974
V2.0f 2000-12-20). There is a second controller (scsi1: 3ware Storage
Controller) for a 3ware RAID. The system also contains an IDE CD/DVD drive,
two 3Com PCI 3c905C Tornado cards (version LK1.1.18-ac), and several on-board
devices (audio, usb). The old OS was SuSE Linux 7.0, and the employees say
that the streamer worked back then. I can't imagine why it causes problems
now :-(

The computer crashes when I back up big (several GB) directories. I start
the backup, and after some hours the system hangs and nothing can revive
the OS. I tested with these software packages: arkeia light, amanda and mt.
In simple tests using tar, small files can be written and read back without
any problems. Running "mt rewind ; tar cf /dev/tape /data" allows me to
reproduce the bug. After some time nothing works anymore, whichever tool I
use, so I think it's not a client problem.

I tested the standard RH9 kernel and an update I got from RHN (version
2.4.20-20.9). The kernel log contains a lot of lines like the following:

Nov  6 03:31:02 linux kernel: DC390: Pointer restored. Total -12736512, Bus
bf7d2800
Nov  6 03:31:04 linux kernel: DC390: Pointer restored. Total -12703744, Bus
bf7da800
Nov  6 03:31:04 linux kernel: DC390: Pointer restored. Total -12703744, Bus
bf7da800
Nov  6 03:31:06 linux kernel: DC390: Pointer restored. Total -12670976, Bus
bf7e2800
Nov  6 03:31:06 linux kernel: DC390: Pointer restored. Total -12670976, Bus
bf7e2800
Nov  6 03:31:08 linux kernel: DC390: Pointer restored. Total -12638208, Bus
bf7ea800
Nov  6 03:31:08 linux kernel: DC390: Pointer restored. Total -12638208, Bus
bf7ea800
Nov  6 03:31:10 linux kernel: DC390: Pointer restored. Total -12605440, Bus
bf7f2800
Nov  6 03:31:10 linux kernel: DC390: Pointer restored. Total -12605440, Bus
bf7f2800
Nov  6 03:31:12 linux kernel: DC390: Pointer restored. Total -12572672, Bus
bf7fa800
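
As a rough aid for reading these messages, here is a small Python sketch (my
own helper, not part of the driver) that computes how far the "Total" and
"Bus" values move between successive distinct messages. The sample lines are
taken from the log above with the mail line wraps joined; the name
pointer_steps is only an illustration.

```python
import re

# Sample lines from the kernel log, with the mail line wraps joined.
LOG = """\
Nov  6 03:31:02 linux kernel: DC390: Pointer restored. Total -12736512, Bus bf7d2800
Nov  6 03:31:04 linux kernel: DC390: Pointer restored. Total -12703744, Bus bf7da800
Nov  6 03:31:06 linux kernel: DC390: Pointer restored. Total -12670976, Bus bf7e2800
Nov  6 03:31:08 linux kernel: DC390: Pointer restored. Total -12638208, Bus bf7ea800
"""

PATTERN = re.compile(r"DC390: Pointer restored\. Total (-?\d+), Bus ([0-9a-f]+)")


def pointer_steps(text):
    """Return (total_step, bus_step) pairs between successive distinct messages."""
    entries = []
    for match in PATTERN.finditer(text):
        entry = (int(match.group(1)), int(match.group(2), 16))
        # The log often prints the same message twice; skip immediate repeats.
        if not entries or entries[-1] != entry:
            entries.append(entry)
    return [(b[0] - a[0], b[1] - a[1]) for a, b in zip(entries, entries[1:])]


steps = pointer_steps(LOG)
print(steps)
```

For the excerpt above, both values advance by 32768 bytes from one message
to the next.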

After some time I see:

Nov  6 20:04:06 linux kernel: scsi : aborting command due to timeout: pid
450046, scsi0, channel 0, id 4, lun 0 Prevent/Allow Medium Removal 00 00 00
01 00
Nov  6 20:04:06 linux kernel: DC390: Abort command (pid 450046, Device
04-00)
Nov  6 20:04:06 linux kernel: DC390: SRB: Xferred 00000000, Remain 00000000,
State 00000040, Phase 05
Nov  6 20:04:06 linux kernel: DC390: AdpaterStatus: 00, SRB Status 00
Nov  6 20:04:06 linux kernel: DC390: Status of last IRQ (DMA/SC/Int/IRQ):
2080c420
Nov  6 20:04:06 linux kernel: DC390: Register dump: SCSI block:
Nov  6 20:04:06 linux kernel: DC390: XferCnt  Cmd Stat IntS IRQS FFIS Ctl1
Ctl2 Ctl3 Ctl4
Nov  6 20:04:06 linux kernel: DC390:  000028   42   02  c3   00   60   17
48   08  84
Nov  6 20:04:06 linux kernel: DC390: Register dump: DMA engine:
Nov  6 20:04:06 linux kernel: DC390: Cmd   STrCnt    SBusA    WrkBC    WrkAC
Stat SBusCtrl
Nov  6 20:04:06 linux kernel: DC390:  00 00000040 1daa1710 00000024 1daa172c
00 03184500
Nov  6 20:04:06 linux kernel: DC390: Register dump: PCI Status: 0200
Nov  6 20:04:06 linux kernel: DC390: In case of driver trouble read
linux/drivers/scsi/README.tmscsim
Nov  6 20:04:06 linux kernel: DC390: Abort current command (pid 450046, SRB
dfe90144)
Nov  6 20:04:06 linux kernel: DC390: Aborted pid 450046 with status 3
Nov  6 20:04:06 linux kernel: SCSI host 0 abort (pid 450046) timed out -
resetting
Nov  6 20:04:06 linux kernel: SCSI bus is being reset for host 0 channel 0.
Nov  6 20:04:06 linux kernel: DC390: RESET ... done
Nov  6 20:04:06 linux kernel: DC390: Illegal Operation detected (20c38418)!
Nov  6 20:04:06 linux kernel: DC390: SRB: Xferred 00000000, Remain 00000000,
State 00000040, Phase 05
Nov  6 20:04:06 linux kernel: DC390: AdpaterStatus: 00, SRB Status 00
Nov  6 20:04:06 linux kernel: DC390: Status of last IRQ (DMA/SC/Int/IRQ):
20c38418

... shortly after that event everything stops.

Sometimes there is also a small stack trace:

Nov  6 20:29:34 linux kernel: do_IRQ: stack overflow: 956
Nov  6 20:29:34 linux kernel: d1bf4a6c 000003bc c030e530 000c3244 00000000
ddaa1600 dfe90000 c010d778
Nov  6 20:29:34 linux kernel:        000c3244 b131c277 00001fa6 00000000
ddaa1600 dfe90000 b1391cc0 ccca0068
Nov  6 20:29:34 linux kernel:        cd650068 ffffff00 c024b964 00000060
00000287 00000002 e080d8ed 000c3244
Nov  6 20:29:34 linux kernel: Call Trace:   [<c010d778>] call_do_IRQ
[kernel] 0x5 (0xd1bf4a88))
Nov  6 20:29:34 linux kernel: [<c024b964>] __rdtsc_delay [kernel] 0x14
(0xd1bf4ab4))
Nov  6 20:29:34 linux kernel: [<e080d8ed>] scsi_dispatch_cmd [scsi_mod]
0x33d (0xd1bf4ac4))
Nov  6 20:29:34 linux kernel: [<e0816446>] scsi_request_fn [scsi_mod] 0x1d6
(0xd1bf4afc))
Nov  6 20:29:34 linux kernel: [<e087be0c>] rh_init_int_timer [usb-uhci] 0x5c
(0xd1bf4b0c))
Nov  6 20:29:34 linux kernel: [<e0815818>] __scsi_insert_special [scsi_mod]
0x58 (0xd1bf4b34))
Nov  6 20:29:34 linux kernel: [<e0815898>] scsi_insert_special_req
[scsi_mod] 0x28 (0xd1bf4b44))
Nov  6 20:29:34 linux kernel: [<e080daab>] scsi_do_req_R1f341175 [scsi_mod]
0xeb (0xd1bf4b58))
Nov  6 20:29:34 linux kernel: [<e082e965>] dc390_SendSRB [tmscsim] 0x85
(0xd1bf4b6c))
Nov  6 20:29:34 linux kernel: [<e080d0b0>] scsi_wait_done [scsi_mod] 0x0
(0xd1bf4b8c))
Nov  6 20:29:34 linux kernel: [<e080d97d>] scsi_wait_req_R6f82968f
[scsi_mod] 0x6d (0xd1bf4ba8))
Nov  6 20:29:34 linux kernel: [<e080d0b0>] scsi_wait_done [scsi_mod] 0x0
(0xd1bf4bbc))

Nov  6 20:29:34 linux kernel: [<e0810a08>] ioctl_internal_command [scsi_mod]
0x68 (0xd1bf4be4))
Nov  6 20:29:34 linux kernel: [<e08110c1>] scsi_ioctl_R4cb00872 [scsi_mod]
0x101 (0xd1bf4c10))
Nov  6 20:29:34 linux kernel: [<e08165ce>] scsi_request_fn [scsi_mod] 0x35e
(0xd1bf4c40))
Nov  6 20:29:34 linux kernel: [<e0815a3d>] scsi_queue_next_request
[scsi_mod] 0x3d (0xd1bf4c78))
Nov  6 20:29:34 linux kernel: [<e080d599>] scsi_release_command_Rf1aeb218
[scsi_mod] 0x29 (0xd1bf4c90))
Nov  6 20:29:34 linux kernel: [<e080d9aa>] scsi_wait_req_R6f82968f
[scsi_mod] 0x9a (0xd1bf4ca0))
Nov  6 20:29:34 linux kernel: [<e080d0b0>] scsi_wait_done [scsi_mod] 0x0
(0xd1bf4cb4))
  (last block repeated multiple times)
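
To see which functions recur in such a trace, the frames can be tallied with
a short Python sketch like the one below. It is only a reading aid under my
own assumptions: the sample is the last block of the trace above with the
syslog prefix removed and the line wraps joined, and count_frames is a
made-up helper name.

```python
import re
from collections import Counter

# Last block of the call trace, syslog prefix removed and line wraps joined.
TRACE = """\
[<e080d97d>] scsi_wait_req_R6f82968f [scsi_mod] 0x6d (0xd1bf4ba8))
[<e080d0b0>] scsi_wait_done [scsi_mod] 0x0 (0xd1bf4bbc))
[<e0810a08>] ioctl_internal_command [scsi_mod] 0x68 (0xd1bf4be4))
[<e08110c1>] scsi_ioctl_R4cb00872 [scsi_mod] 0x101 (0xd1bf4c10))
[<e08165ce>] scsi_request_fn [scsi_mod] 0x35e (0xd1bf4c40))
[<e0815a3d>] scsi_queue_next_request [scsi_mod] 0x3d (0xd1bf4c78))
[<e080d599>] scsi_release_command_Rf1aeb218 [scsi_mod] 0x29 (0xd1bf4c90))
[<e080d9aa>] scsi_wait_req_R6f82968f [scsi_mod] 0x9a (0xd1bf4ca0))
[<e080d0b0>] scsi_wait_done [scsi_mod] 0x0 (0xd1bf4cb4))
"""

# One frame looks like: [<address>] symbol [module] offset (stack address))
FRAME = re.compile(r"\[<[0-9a-f]{8}>\] (\S+) \[\S+\]")


def count_frames(text):
    """Count how often each symbol appears in a 2.4-style call trace."""
    return Counter(match.group(1) for match in FRAME.finditer(text))


counts = count_frames(TRACE)
for symbol, n in counts.most_common():
    print(n, symbol)
```

Run over the complete trace, symbols such as scsi_wait_req and
scsi_request_fn show up again and again, which would be consistent with
nested calls using up the kernel stack, but that is only my guess.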

I looked into the DC390 driver and I can't find a reason for the crash. The
last driver code change occurred in 2000, I believe. I couldn't find any
relevant information with Google. Is this error known, and do others have
problems with such a configuration? Is there a better driver I could use for
this streamer? If so, where can I get it? What can I do to get rid of the
problem? Any hint or idea might help.

Thank you for reading this long mail.

Regards,

Gerrit Albrecht


