2.6.17-rc6: libata WARN_ON() in ata_scsi_error

Mark Lord <liml@xxxxxx> · Wed, 07 Jun 2006 12:58:35 -0400

Jeff, I'm trying to track down the race that causes this:

ata6: status=0x51 { DriveReady SeekComplete Error }
ata6: error=0x40 { UncorrectableError }
BUG: warning at drivers/scsi/libata-scsi.c:792/ata_scsi_error()

Call Trace: <ffffffff80283430>{ata_scsi_error+144} <ffffffff802746cc>{scsi_error_handler+220}
      <ffffffff80181bb7>{__activate_task+39} <ffffffff80165a9f>{thread_return+0}
      <ffffffff802745f0>{scsi_error_handler+0} <ffffffff802745f0>{scsi_error_handler+0}
      <ffffffff801951c0>{keventd_create_kthread+0} <ffffffff8013569b>{kthread+219}
      <ffffffff801625ba>{child_rip+8} <ffffffff801951c0>{keventd_create_kthread+0}
      <ffffffff801355c0>{kthread+0} <ffffffff801625b2>{child_rip+0}
PGD 75264067 PUD 75283067 PMD 0
CPU 0
Modules linked in: cpufreq_userspace cpufreq_stats freq_table cpufreq_powersave cpufreq_ondemand cpufreq_conservative video thermal processor fan container button battery ac dm_mod md_mod snd_seq_dummy snd_seq_oss ide_cd cdrom snd_seq_midi snd_seq_midi_event snd_seq af_packet mousedev snd_via82xx snd_via82xx_modem snd_ac97_codec snd_ac97_bus snd_pcm_oss snd_mixer_oss snd_mpu401_uart psmouse ehci_hcd snd_pcm snd_timer serio_raw snd_rawmidi snd_seq_device i2c_viapro sk98lin floppy pcspkr via82cxxx i2c_core snd snd_page_alloc uhci_hcd usbcore ide_core soundcore sata_mv sg unix
Pid: 1693, comm: scsi_eh_5 Not tainted 2.6.17-rc5-git11 #7
RIP: 0010:[__nosave_end+129921632/2132602880] <ffffffff88018260>{:sata_mv:mv_eng_timeout+64}
RSP: 0018:ffff81007d54fe18  EFLAGS: 00010282
RAX: ffff81007ddbb1c0 RBX: ffff81007f601c68 RCX: 0000000000008000
RDX: ffff81007f601c68 RSI: 0000000000004e4f RDI: ffffffff88018cd8
RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000033
R10: 0000000000000001 R11: 000000000000000a R12: 0000000000000286
R13: ffffffff802745f0 R14: ffff81007df59bc8 R15: ffffffff801951c0
FS:  00002b0e1bad6d60(0000) GS:ffffffff803fc000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000010 CR3: 0000000075270000 CR4: 00000000000006e0
Process scsi_eh_5 (pid: 1693, threadinfo ffff81007d54e000, task ffff81007f9032a0)
Stack: ffffffff802745f0 ffff81007f601c68 ffff81007f601800 ffffffff80283475
      00000000fffffffc ffff81007f601800 ffff81007f601800 ffffffff802746cc
      ffffffff80181bb7 ffff81007ea240c0
Call Trace: <ffffffff802745f0>{scsi_error_handler+0}
      <ffffffff80283475>{ata_scsi_error+213} <ffffffff802746cc>{scsi_error_handler+220}
      <ffffffff80181bb7>{__activate_task+39} <ffffffff80165a9f>{thread_return+0}
      <ffffffff802745f0>{scsi_error_handler+0} <ffffffff802745f0>{scsi_error_handler+0}
      <ffffffff801951c0>{keventd_create_kthread+0} <ffffffff8013569b>{kthread+219}
      <ffffffff801625ba>{child_rip+8} <ffffffff801951c0>{keventd_create_kthread+0}
      <ffffffff801355c0>{kthread+0} <ffffffff801625b2>{child_rip+0}

Code: 4c 8b 45 10 48 89 e9 48 8b 70 10 31 c0 4d 8d 48 70 e8 ca cd

This happens *after* several successful passes through error handling
for the same (known) bad sector on a SATA drive attached to sata_mv.
My guess is that something left over from the earlier (successful)
error handling is tripping up the later entry.  This is on 2.6.17-rc6.

It happens both with and without the sata_mv eng_timeout patch that I also just posted.

Afterwards, the drive is effectively locked up.
I am able to recreate this, with some "success", on an AMD64 kernel.

???? 