Kai Makisara wrote:
On Mon, 19 Sep 2005, Mike Christie wrote:
Kai Makisara wrote:
st sends the command using a buffer of one segment. The command is passed
to the HBA driver and it sees 8 segments. Clustering seems to work
properly (the maximum segment size is set to 65536 bytes by default).
Here is what is seen when the block size is increased to 513 kB:
dd if=tdata of=/dev/nst0 bs=513k count=1
The dd process hangs in device wait. It turns out that
scsi_execute_async() fails. This is an async write and the process later
waits for the failed write to finish. The patch at the end of this
message fixes this st bug (don't worry about the line shifts, I have some
debugging printks in this driver).
The real problem is that scsi_execute_async() fails. The 513 kB request
is 129 pages. Could the reason be related to these defaults in
include/linux?
# define MAX_PHYS_SEGMENTS 128
# define MAX_HW_SEGMENTS 128
Yeah I think this is due to the MAX_PHYS_SEGMENTS limit. The hw segments is
set by scsi_lib in scsi_alloc_queue and is the sg_tablesize value on the host.
Right now all I do is return a error when someone violates one of the limits,
but I think the right thing to do is have the ULDs take some of these values
into account when they build their lists. However if I do that we will not be
able to make large requests since the MAX_PHYS_SEGMENTS/SCSI_MAX_PHYS_SEGMENTS
will limit them. Umm let me rethink.
I have done some additional debugging. Submitting the large write fails in
bio_map_pages() called from scsi_req_map_sg(). The first reason is not
phys_segments or hw_segments limit but max_sectors. The sym53c8xx_2 uses
the default SCSI_DEFAULT_MAX_SECTORS which is 1024 (512 kB).
I increased max_sectors in sym_glue.c to 10240 in sym53c8xx and now I can
write blocks up to 1024 kB. Then bio_map_page() fails again but this time
in bio_alloc(). This is because st can allocate chunks of more than 1 MB
and this is too much for one bio. I added the code in the patch at the end
of this message to limit the chunk size and this allowed writing of blocks
up to 5120 kB.
If I try 5121 kB, write fails as is expected but not completely nicely.
Here are some test printks:
scsi_req_map_sg: q->back_merge failed, i=10
st 1:0:5:0: extraneous data discarded.
st 1:0:5:0: COMMAND FAILED (87 0 1).
sym: cmd: 0x0a 0x00 0x50 0x04 0x00 0x00
st: cmd=0x0a result=0x70000 resid=0 sense[0]=0x00 sense[2]=0x00
scsi_req_map_sg fails as it should but still a bogus SCSI command is sent.
I think the reason for this is simple but I don't want to delay the good
news by trying to debug this.
So, now st can write as large blocks as it should. Good work, Mike!
What I don't quite like is that being able to do this requires setting
SCSI adapter parameters (use_clustering, max_sectors) to values that are
not used by most drivers today. Changing is in most cases trivial but this
has to be done. Otherwise the users needing large block sizes feel that
these enhancements are a regression.
Yes, I agree. I will try to see what I can do to make those functions
handle some things for the ULDs. I made some bad assumtions, so i did
not handle some failures gracefully. I think it should be possible to
start new bios when we hit a queue sector limit then ignore the
max_sectors when creating requests so that you do not have to manually
adjust the queues limits for example (this will be useful when sectors
do not make sense for commands like with SG too). This was one of the
reasons Jens wanted those functions in the scsi layer after all.
I think we are screwed if someone does not support clustering and they
hit the phys segments limits though. This is due to the scsi sg pools
that are preallocated. Without my patches you would have not hit the
phys segment limits becuase st and sg were allocating the scatterlist
that get sent.
An alternative would be to just set the scatterlist getting passed to
scsi_execute_async to the requests data field then add some bits so that
that scatterlist gets used later instead of blk_rq_map_sg making a new
one. This would completely remove the need for scsi_req_map_sg and allow
SG and ST to avoid all those limits like before. I think some may
consider this to be hack on the block layer though?
-
: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html