Il 04/09/2012 10:46, Michael S. Tsirkin ha scritto: >>>> +static int virtscsi_queuecommand_multi(struct Scsi_Host *sh, >>>> + struct scsi_cmnd *sc) >>>> +{ >>>> + struct virtio_scsi *vscsi = shost_priv(sh); >>>> + struct virtio_scsi_target_state *tgt = vscsi->tgt[sc->device->id]; >>>> + unsigned long flags; >>>> + u32 queue_num; >>>> + >>>> + /* Using an atomic_t for tgt->reqs lets the virtqueue handler >>>> + * decrement it without taking the spinlock. >>>> + */ > > Above comment is not really helpful - reader can be safely assumed to > know what atomic_t is. Sure, the comment explains that we use an atomic because _elsewhere_ the tgt_lock is not held while modifying reqs. > Please delete, and replace with the text from commit log > that explains the heuristic used to select req_vq. Ok. > Also please add a comment near 'reqs' definition. > Something like "number of outstanding requests - used to detect idle > target". Ok. > >>>> + spin_lock_irqsave(&tgt->tgt_lock, flags); > > Looks like this lock can be removed - req_vq is only > modified when target is idle and only used when it is > not idle. If you have two incoming requests at the same time, req_vq is also modified when the target is not idle; that's the point of the lock. Suppose tgt->reqs = 0 initially, and you have two processors/queues. Initially tgt->req_vq is queue #1. If you have this: queuecommand on CPU #0 queuecommand #2 on CPU #1 -------------------------------------------------------------- atomic_inc_return(...) == 1 atomic_inc_return(...) == 2 virtscsi_queuecommand to queue #1 tgt->req_vq = queue #0 virtscsi_queuecommand to queue #0 then two requests are issued to different queues without a quiescent point in the middle. >>>> + if (atomic_inc_return(&tgt->reqs) == 1) { >>>> + queue_num = smp_processor_id(); >>>> + while (unlikely(queue_num >= vscsi->num_queues)) >>>> + queue_num -= vscsi->num_queues; >>>> + tgt->req_vq = &vscsi->req_vqs[queue_num]; >>>> + } >>>> + spin_unlock_irqrestore(&tgt->tgt_lock, flags); >>>> + return virtscsi_queuecommand(vscsi, tgt, sc); >>>> +} >>>> + >>>> + > > ..... > >>>> +static int virtscsi_queuecommand_single(struct Scsi_Host *sh, >>>> + struct scsi_cmnd *sc) >>>> +{ >>>> + struct virtio_scsi *vscsi = shost_priv(sh); >>>> + struct virtio_scsi_target_state *tgt = vscsi->tgt[sc->device->id]; >>>> + >>>> + atomic_inc(&tgt->reqs); >>>> + return virtscsi_queuecommand(vscsi, tgt, sc); >>>> +} >>>> + > > Here, reqs is unused - why bother incrementing it? > A branch on completion would be cheaper IMHO. Well, I could also let tgt->reqs go negative, but it would be a bit untidy. Another alternative is to access the target's target_busy field with ACCESS_ONCE, and drop reqs altogether. Too tricky to do this kind of micro-optimization so early, though. >> virtio-scsi multiqueue has a performance benefit up to 20% > > To be fair, you could be running in single queue mode. > In that case extra atomics and indirection that this code > brings will just add overhead without benefits. > I don't know how significant would that be. Not measurable in my experiments. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html