On Wed, 2008-01-16 at 18:37 +0200, Boaz Harrosh wrote:
> On Wed, Jan 16 2008 at 18:11 +0200, Boaz Harrosh <bharrosh@xxxxxxxxxxx> wrote:
> > On Wed, Jan 16 2008 at 17:09 +0200, James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:
> >> On Wed, 2008-01-16 at 16:01 +0200, Boaz Harrosh wrote:
> >>> On Tue, Jan 15 2008 at 19:35 +0200, Boaz Harrosh <bharrosh@xxxxxxxxxxx> wrote:
> >>>> On Tue, Jan 15 2008 at 18:49 +0200, James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:
> >>>>> On Tue, 2008-01-15 at 18:09 +0200, Boaz Harrosh wrote:
> >>>>>> On Tue, Jan 15 2008 at 17:52 +0200, James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:
> >>>>>>> I thought, now we had this new shiny code to increase the scatterlist
> >>>>>>> table size I'd try it out. It turns out there's a pretty vast block
> >>>>>>> conspiracy that prevents us going over 128 entries in a scatterlist.
> >>>>>>>
> >>>>>>> The first problems are in SCSI: The host parameters sg_tablesize and
> >>>>>>> max_sectors are used to set the queue limits max_hw_segments and
> >>>>>>> max_sectors respectively (the former is the maximum number of entries
> >>>>>>> the HBA can tolerate in a scatterlist for each transaction, the latter
> >>>>>>> is a total transfer cap on the maximum number of 512 byte sectors).
> >>>>>>> The default settings, assuming the HBA doesn't vary them are
> >>>>>>> sg_tablesize at SG_ALL (255) and max_sectors at SCSI_DEFAULT_MAX_SECTORS
> >>>>>>> (1024). A quick calculation shows the latter is actually 512k or 128
> >>>>>>> pages (at 4k pages), hence the persistent 128 entry limit.
> >>>>>>>
> >>>>>>> However, raising max_sectors and sg_tablesize together still doesn't
> >>>>>>> help: There's actually an insidious limit sitting in the block layer as
> >>>>>>> well.
> >>>>>>> This is what blk_queue_max_sectors says:
> >>>>>>>
> >>>>>>> void blk_queue_max_sectors(struct request_queue *q, unsigned int
> >>>>>>> max_sectors)
> >>>>>>> {
> >>>>>>>         if ((max_sectors << 9) < PAGE_CACHE_SIZE) {
> >>>>>>>                 max_sectors = 1 << (PAGE_CACHE_SHIFT - 9);
> >>>>>>>                 printk("%s: set to minimum %d\n", __FUNCTION__, max_sectors);
> >>>>>>>         }
> >>>>>>>
> >>>>>>>         if (BLK_DEF_MAX_SECTORS > max_sectors)
> >>>>>>>                 q->max_hw_sectors = q->max_sectors = max_sectors;
> >>>>>>>         else {
> >>>>>>>                 q->max_sectors = BLK_DEF_MAX_SECTORS;
> >>>>>>>                 q->max_hw_sectors = max_sectors;
> >>>>>>>         }
> >>>>>>> }
> >>>>>>>
> >>>>>>> So it imposes a maximum possible setting of BLK_DEF_MAX_SECTORS which is
> >>>>>>> defined in blkdev.h to .... 1024, thus also forcing the queue down to
> >>>>>>> 128 scatterlist entries.
> >>>>>>>
> >>>>>>> Once I raised this limit as well, I was able to transfer over 128
> >>>>>>> scatterlist elements during benchmark test runs of normal I/O (actually
> >>>>>>> kernel compiles seem best, they hit 608 scatterlist entries).
> >>>>>>>
> >>>>>>> So my question, is there any reason not to raise this limit to something
> >>>>>>> large (like 65536) or even eliminate it altogether?
> >>>>>>>
> >>>>>>> James
> >>>>>>>
> >>>>>> I have an old branch here where I've swiped through the scsi drivers just
> >>>>>> to remove the SG_ALL limit. Unfortunately some drivers mean literally
> >>>>>> 255 when using SG_ALL. So I passed driver by driver and carefully inspected
> >>>>>> the code to change it to something driver specific if they really meant
> >>>>>> 255.
> >>>>>>
> >>>>>> I have used sg_tablesize = ~0; to indicate, I don't care any will do,
> >>>>>> and some driver constant if there is a real limit. Though removing
> >>>>>> SG_ALL at the end.
> >>>>>>
> >>>>>> Should I freshen up this branch and send it?
> >>>>> By all means; however, I think having the defined constant SG_ALL is
> >>>>> useful (even if it is eventually just set to ~0) it means I can support
> >>>>> any scatterlist size. Having the drivers set sg_tablesize correctly
> >>>>> that can't support SG_ALL is pretty vital.
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> James
> >>>> OK will do.
> >>>>
> >>>> I have found the old branch and am looking. I agree with you about the
> >>>> SG_ALL. I will fix it to have a patch per changed driver, without changing
> >>>> SG_ALL, and then final patch to just change SG_ALL.
> >>>>
> >>>> Boaz
> >>> James hi
> >>> reinspecting the code, what should I do with drivers that do not support chaining
> >>> due to SW that still do sglist++?
> >>>
> >>> should I set their sg_tablesize to SG_MAX_SINGLE_ALLOC, or hard code to 128, and put
> >>> a FIXME: in the submit message?
> >>>
> >>> or should we fix them first and serialize this effort on top of those fixes.
> >>> (also in light of the other email where you removed the chaining flag)
> >> How many of them are left?
> >>
> >> The correct value is clearly SCSI_MAX_SG_SEGMENTS which fortunately
> >> "[PATCH] remove use_sg_chaining" moved into a shared header. Worst
> >> case, just use that and add a fixme comment giving the real value (if
> >> there is one).
> >>
> >> James
> >>
> >>
> >
> > I have 9 up to now and 10 more drivers to check. All but one are
> > SW, one by one SCp.buffer++, so once it's fixed they should be able
> > to go back to SG_ALL. But for now I will set them to SCSI_MAX_SG_SEGMENTS
> > as you requested. I have not checked drivers that did not use SG_ALL
> > but I trust these are usually smaller.
> >
> > Boaz
> >
> >
> James Hi.
>
> Looking at the patches I just realized that I made a mistake and did
> not work on top of your: "[PATCH] remove use_sg_chaining" .
> Now rebasing should be easy but I think my patch should go first because
> there are some 10-15 drivers that are not chained ready but will work
> perfectly after my patch that sets sg_tablesize to SCSI_MAX_SG_SEGMENTS
>
> should I rebase or should "[PATCH] remove use_sg_chaining" be rebased?

The order doesn't matter; the two patches are completely orthogonal.
Just send the list what you have ... I'm rebasing a lot of stuff fairly
often at this stage in the merge cycle.

James
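
For anyone following the arithmetic in the original message, here is a
minimal user-space sketch of the clamping quoted from
blk_queue_max_sectors() and of the sectors-to-pages calculation. It is
not kernel code: struct model_queue, model_set_max_sectors(),
model_max_sg_entries() and the MODEL_* constants are hypothetical names
made up for illustration, the printk is left out, and 4k pages are
assumed, as in the thread.

```c
#include <assert.h>

/* Illustrative constants; the values are the ones quoted in the thread. */
#define SECTOR_SHIFT              9      /* 512-byte sectors */
#define MODEL_PAGE_SHIFT          12     /* assumption: 4k pages */
#define MODEL_BLK_DEF_MAX_SECTORS 1024u  /* value given for BLK_DEF_MAX_SECTORS */

/* Hypothetical stand-in for the two queue fields the quoted function sets. */
struct model_queue {
	unsigned int max_sectors;
	unsigned int max_hw_sectors;
};

/* Mirrors the clamping logic of the quoted function, minus the printk. */
static void model_set_max_sectors(struct model_queue *q, unsigned int max_sectors)
{
	assert(q != 0);

	/* Never allow less than one page worth of sectors. */
	if ((max_sectors << SECTOR_SHIFT) < (1u << MODEL_PAGE_SHIFT))
		max_sectors = 1u << (MODEL_PAGE_SHIFT - SECTOR_SHIFT);

	if (MODEL_BLK_DEF_MAX_SECTORS > max_sectors) {
		q->max_hw_sectors = q->max_sectors = max_sectors;
	} else {
		/* The soft limit is capped; only the hw limit keeps the request. */
		q->max_sectors = MODEL_BLK_DEF_MAX_SECTORS;
		q->max_hw_sectors = max_sectors;
	}
}

/* Number of whole pages covered by max_sectors, i.e. the entry count
 * when every page needs its own scatterlist entry. */
static unsigned int model_max_sg_entries(unsigned int max_sectors)
{
	return max_sectors >> (MODEL_PAGE_SHIFT - SECTOR_SHIFT);
}
```

Under these assumptions, asking for 65536 sectors still leaves
max_sectors at 1024, and 1024 sectors of 512 bytes is 512k, or 128 pages
of 4k, which is the 128 entry ceiling described above.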