On Fri, Jul 01 2005, Miller, Mike (OS Dev) wrote:
> > > This has been tested in our labs. There is no difference in
> > > performance, it just fixes the Oops. Please consider this patch for
> > > inclusion.
> >
> > Going to per-disk queues is undoubtedly a good idea,
> > performance will be much better this way. So far, so good.
> >
> > But you need to do something about the queueing for this to
> > be acceptable, imho. You have a per-controller queueing limit
> > in place, you need to enforce some fairness to ensure an
> > equal distribution of commands between the disks.
>
> Sometime back I submitted what we called the "fair enough" algorithm. I
> don't know how I managed to separate the two. :( But it was accepted and
> is in the mainstream kernel. Here's a snippet:
> ------------------------------------------------------------------------
> -
> -	/*
> -	 * See if we can queue up some more IO
> -	 * check every disk that exists on this controller
> -	 * and start it's IO
> -	 */
> -	for(j=0; j < NWD; j++){
> -		/* make sure the disk has been added and the drive is real */
> -		/* because this can be called from the middle of init_one */
> -		if(!(h->gendisk[j]->queue) || !(h->drv[j].heads))
> +	/* check to see if we have maxed out the number of commands that can
> +	 * be placed on the queue. If so then exit. We do this check here
> +	 * in case the interrupt we serviced was from an ioctl and did not
> +	 * free any new commands.
> +	 */
> +	if ((find_first_zero_bit(h->cmd_pool_bits, NR_CMDS)) == NR_CMDS)
> +		goto cleanup;
> +
> +	/* We have room on the queue for more commands. Now we need to queue
> +	 * them up. We will also keep track of the next queue to run so
> +	 * that every queue gets a chance to be started first.
> +	 */
> +	for (j=0; j < NWD; j++){
> +		int curr_queue = (start_queue + j) % NWD;
> +		/* make sure the disk has been added and the drive is real
> +		 * because this can be called from the middle of init_one.
> +		 */
> +		if(!(h->gendisk[curr_queue]->queue) ||
> +			!(h->drv[curr_queue].heads))

I suppose that will be fair-enough, at least as long as NR_CMDS >>
q->nr_requests. With eg 4 volumes, I don't think this will be the case.
Just making sure you round-robin start the queues is not enough to ensure
fair queueing between them.

> > Perhaps just limit the per-queue depth to something sane,
> > instead of the huuuge 384 commands you have now. I've had
> > several people complain to me, that ciss is doing some nasty
> > starvation with that many commands in flight and we've
> > effectively had to limit the queueing depth to something
> > really low to get adequate read performance in presence of writes.
>
> The combination of per disk queues and this "fairness" prevents
> starvation on different LV's. Ex: if volumes 0 & 1 are being written and
> volume 3 is being read all will get _almost_ equal time. We believe the

Different thing, I'm talking about single volume starvation, not
volume-to-volume.

> elevator algoritm(s) may be causing writes to starve reads on the same
> logical volume. We continue to investigate our other performance issues.

I completely disagree. Even with an intelligent io scheduler, starvation
is seen on ciss that does not happen on other queueing hardware such as
'normal' scsi controllers/drives. So something else is going on, and the
only 'fix' so far is to limit the ciss queue depth heavily.
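
[Illustrative aside: a minimal sketch of the kind of per-volume depth cap
being suggested here, layered on the round-robin scan from the quoted
patch. This is not the actual cciss code; NWD matches the driver, but
QUEUE_DEPTH_PER_VOL, the per-volume 'outstanding' counter and
issue_one_request() are hypothetical names made up for this example.]

#define NWD                  16   /* number of logical volumes, as in cciss */
#define QUEUE_DEPTH_PER_VOL  32   /* hypothetical per-volume in-flight cap */

struct volume {
	int present;      /* volume exists and is initialised */
	int outstanding;  /* commands currently in flight for this volume */
};

struct controller {
	struct volume vol[NWD];
	int free_commands;  /* commands left in the shared controller pool */
	int start_queue;    /* next volume to be serviced first (round-robin) */
};

/* Stub: in a real driver this would pull one request off the volume's
 * block queue and hand it to the hardware; returns 0 when the queue is
 * empty. */
static int issue_one_request(struct controller *h, int i)
{
	(void)h;
	(void)i;
	return 0;
}

void start_io(struct controller *h)
{
	int j;

	for (j = 0; j < NWD; j++) {
		int i = (h->start_queue + j) % NWD;

		if (!h->vol[i].present)
			continue;

		/* Per-volume cap: stop feeding a volume that already has
		 * its share of commands in flight, even when the shared
		 * pool still has room. */
		while (h->free_commands > 0 &&
		       h->vol[i].outstanding < QUEUE_DEPTH_PER_VOL &&
		       issue_one_request(h, i)) {
			h->free_commands--;
			h->vol[i].outstanding++;
		}
	}
	/* Rotate the starting point so every volume eventually gets
	 * first service.  The completion path (not shown) would
	 * decrement 'outstanding' and return commands to the pool. */
	h->start_queue = (h->start_queue + 1) % NWD;
}

Keeping the per-volume in-flight count small like this leaves more of the
read/write ordering decisions with the io scheduler rather than the
controller, which is the point of limiting the queue depth discussed above.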
--
Jens Axboe