Re: Bugs on Linux 2.6.18-rc2 sg code?

Fajun Chen wrote:
> Hi Doug,
> 
> I ran into the sg hang problem again while running direct IO reads/writes.
> Here are some logs I collected. I set scsi_logging_level to 63, and dmesg
> was collected right after I noticed the invalid elap number. I hope this
> will provide some information for your debugging.

Fajun,
As I said earlier, I don't think that this is a sg driver
problem (well, not directly). The sg driver sets up a
SCSI command, complete with a timeout (60 seconds
in your case) and a callback. The sg driver then waits
the whole weekend (judging from the elapsed time)
and hears nothing. The timeout is administered by
the midlevel: when it expires, the midlevel asks
the low-level driver (LLD) to abort the command (and if
that fails, it tries a few other things). Alternatively,
the LLD can define its own exception handler. Meanwhile
the sg driver waits passively for that command to finish.
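As a rough illustration of that escalation, here is a hypothetical and much simplified model in C. The names eh_step and eh_escalate are made up and are not kernel APIs. It assumes the abort, device reset, bus reset, host reset, offline order used by the 2.6-era midlevel (drivers/scsi/scsi_error.c) and ignores the case where an LLD or transport supplies its own handler.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of the midlevel's escalation after a
 * command timeout. Not kernel code: the real error handler has more
 * steps and an LLD/transport may replace the strategy entirely. */
enum eh_step {
    EH_ABORT = 0,     /* ask the LLD to abort the timed-out command */
    EH_DEVICE_RESET,  /* reset the logical unit */
    EH_BUS_RESET,     /* reset the bus (channel) */
    EH_HOST_RESET,    /* reset the host adapter */
    EH_OFFLINE        /* every handler failed: device taken offline */
};

/* handler_ok[i] is true if the LLD handler for step i reports success.
 * Returns the first step that recovered, or EH_OFFLINE if none did. */
static enum eh_step eh_escalate(const bool handler_ok[EH_OFFLINE])
{
    for (int step = EH_ABORT; step < EH_OFFLINE; step++) {
        if (handler_ok[step])
            return (enum eh_step)step;
    }
    return EH_OFFLINE;
}
```

In this model every path ends in a definite outcome that would eventually complete the command back to sg; it is an illustration of the mechanism, not a diagnosis of the hang.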

So it looks like you are sending ATA read and write
commands through the ATA PASS THROUGH (16) SCSI command
(opcode 0x85). All the commands shown in the sg debug
output terminate with CHECK CONDITION (res=0x8000002).
That doesn't look correct for read and write commands.
Have you set the CK_COND bit? I can see that if an ATA read
fails you may need to fetch the LBA registers. Surely
the SATL should give you an ATA Return (sense) descriptor
if an ATA command sent via 0x85 fails, but I couldn't
see that written in the SAT draft.

I'm just curious about that point. It probably has no
bearing on the hang, but returning sense data when there
is no need may impact performance.
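To make the CK_COND question concrete, here is a sketch that builds an ATA PASS-THROUGH (16) CDB for READ DMA EXT with CK_COND (byte 2, bit 5) as a parameter. The field layout follows the SAT ATA PASS-THROUGH (16) definition; the helper name build_ata16_read_dma_ext and the choice of DMA protocol, 48-bit LBA and block-counted transfer are assumptions for illustration, not what the test program in this thread actually sends.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ATA_16               0x85 /* ATA PASS-THROUGH (16) opcode */
#define ATA_PROTO_DMA        6    /* SAT PROTOCOL field value: DMA */
#define ATA_CMD_READ_DMA_EXT 0x25

/* Hypothetical helper. Assumes 48-bit LBA (EXTEND=1), length in
 * SECTOR_COUNT (T_LENGTH=2), counted in blocks (BYTE_BLOCK=1) and
 * data flowing from the device (T_DIR=1). */
static void build_ata16_read_dma_ext(uint8_t cdb[16], uint64_t lba,
                                     uint16_t nsect, bool ck_cond)
{
    memset(cdb, 0, 16);
    cdb[0] = ATA_16;
    cdb[1] = (ATA_PROTO_DMA << 1) | 0x01; /* PROTOCOL | EXTEND */
    cdb[2] = (ck_cond ? 0x20 : 0x00)      /* CK_COND */
           | 0x08                         /* T_DIR: from device */
           | 0x04                         /* BYTE_BLOCK: blocks */
           | 0x02;                        /* T_LENGTH: SECTOR_COUNT */
    cdb[5]  = nsect >> 8;                 /* SECTOR_COUNT 15:8 */
    cdb[6]  = nsect & 0xff;               /* SECTOR_COUNT 7:0 */
    cdb[7]  = (lba >> 24) & 0xff;         /* LBA 31:24 */
    cdb[8]  = lba & 0xff;                 /* LBA 7:0 */
    cdb[9]  = (lba >> 32) & 0xff;         /* LBA 39:32 */
    cdb[10] = (lba >> 8) & 0xff;          /* LBA 15:8 */
    cdb[11] = (lba >> 40) & 0xff;         /* LBA 47:40 */
    cdb[12] = (lba >> 16) & 0xff;         /* LBA 23:16 */
    cdb[13] = 0x40;                       /* DEVICE: LBA mode */
    cdb[14] = ATA_CMD_READ_DMA_EXT;       /* ATA COMMAND */
}
```

Assuming 512-byte sectors, nsect = 128 corresponds to the dxfer_len=65536 seen in the dmesg output. With ck_cond false, SAT has the SATL report GOOD status when the ATA command succeeds, so no sense data needs to be built and fetched.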

Doug Gilbert

> Logs:
> ~ $ cat /proc/scsi/sg/debug
> dev_max(currently)=32 max_active_device=2 (origin 1)
> def_reserved_size=32768
>>>> device=sg0 scsi0 chan=0 id=0 lun=0   em=1 sg_tablesize=128 excl=0
>   FD(1): timeout=60000ms bufflen=131072 (res)sgat=4 low_dma=0
>   cmd_q=1 f_packid=0 k_orphan=0 closed=0
>     act: id=0 blen=0 t_o/elap=60000/458503350ms sgat=0 op=0x85

BTW, the above is normal (i.e. indirect) IO trying to move up to
128 KB of data (with a four-element scatter-gather list, each
element 32 KB). An ATA command is being sent via the SAT-defined
pass-through. The elapsed time is about 127 hours (over 5 days).
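The figures in the sg debug line can be checked directly from t_o/elap, sgat and bufflen (taking 1 KB = 1024 bytes):

```latex
\frac{458\,503\,350\ \text{ms}}{3\,600\,000\ \text{ms/h}} \approx 127.4\ \text{h} \approx 5.3\ \text{days},
\qquad
4 \times 32\ \text{KB} = 128\ \text{KB} = 131\,072\ \text{bytes} = \texttt{bufflen}
```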

> ~ $ dmesg
> ne: sg0, pack_id=0, res=0x8000002
> sg_finish_rem_req: res_used=0
> sg_remove_scat: k_use_sg=16
> sg_ioctl: sg0, cmd=0x2285
> scsi_block_when_processing_errors: rtn: 1
> sg_common_write:  scsi opcode=0x85, cmd_size=16
> sg_start_req: dxfer_len=65536
> scsi_add_timer: scmd: c080f560, time: 6000, (c00f8a98)
> scsi_delete_timer: scmd: c080f560, rtn: 1
> sg_cmd_done: sg0, pack_id=0, res=0x8000002
                               ^^^^^^^^^^^^^
<snip>
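The res value printed by sg_cmd_done is the midlevel's 32-bit result word. The sketch below splits it into its four bytes; the struct and function names are hypothetical, while the two constants mirror Linux 2.6 include/scsi/scsi.h. It shows that 0x8000002 means driver byte DRIVER_SENSE, a zero host byte and SCSI status CHECK CONDITION, i.e. the command completed with sense data rather than failing in the LLD.

```c
#include <stdint.h>

/* Values as defined in Linux 2.6 <scsi/scsi.h>. */
#define SAM_STAT_CHECK_CONDITION 0x02
#define DRIVER_SENSE             0x08

/* Hypothetical helper struct: the four bytes of a midlevel result. */
struct result_fields {
    uint8_t status; /* bits 0-7:   SCSI (SAM) status byte */
    uint8_t msg;    /* bits 8-15:  message byte */
    uint8_t host;   /* bits 16-23: host byte (DID_*), set by the LLD */
    uint8_t driver; /* bits 24-31: driver byte (DRIVER_*) */
};

static struct result_fields split_result(uint32_t res)
{
    struct result_fields f = {
        .status = res & 0xff,
        .msg    = (res >> 8) & 0xff,
        .host   = (res >> 16) & 0xff,
        .driver = (res >> 24) & 0xff,
    };
    return f;
}
```

Note that the kernel's legacy status_byte() macro shifts the status right by one, so in kernel code the same CHECK CONDITION shows up as 0x01 (CHECK_CONDITION).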
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html