On Tue, Sep 04, 2012 at 12:25:03PM +0200, Paolo Bonzini wrote: > Il 04/09/2012 10:46, Michael S. Tsirkin ha scritto: > >>>> +static int virtscsi_queuecommand_multi(struct Scsi_Host *sh, > >>>> + struct scsi_cmnd *sc) > >>>> +{ > >>>> + struct virtio_scsi *vscsi = shost_priv(sh); > >>>> + struct virtio_scsi_target_state *tgt = vscsi->tgt[sc->device->id]; > >>>> + unsigned long flags; > >>>> + u32 queue_num; > >>>> + > >>>> + /* Using an atomic_t for tgt->reqs lets the virtqueue handler > >>>> + * decrement it without taking the spinlock. > >>>> + */ > > > > Above comment is not really helpful - reader can be safely assumed to > > know what atomic_t is. > > Sure, the comment explains that we use an atomic because _elsewhere_ the > tgt_lock is not held while modifying reqs. > > > Please delete, and replace with the text from commit log > > that explains the heuristic used to select req_vq. > > Ok. > > > Also please add a comment near 'reqs' definition. > > Something like "number of outstanding requests - used to detect idle > > target". > > Ok. > > > > >>>> + spin_lock_irqsave(&tgt->tgt_lock, flags); > > > > Looks like this lock can be removed - req_vq is only > > modified when target is idle and only used when it is > > not idle. > > If you have two incoming requests at the same time, req_vq is also > modified when the target is not idle; that's the point of the lock. > > Suppose tgt->reqs = 0 initially, and you have two processors/queues. > Initially tgt->req_vq is queue #1. If you have this: > > queuecommand on CPU #0 queuecommand #2 on CPU #1 > -------------------------------------------------------------- > atomic_inc_return(...) == 1 > atomic_inc_return(...) == 2 > virtscsi_queuecommand to queue #1 > tgt->req_vq = queue #0 > virtscsi_queuecommand to queue #0 > > then two requests are issued to different queues without a quiescent > point in the middle. What happens then? Does this break correctness? > >>>> + if (atomic_inc_return(&tgt->reqs) == 1) { > >>>> + queue_num = smp_processor_id(); > >>>> + while (unlikely(queue_num >= vscsi->num_queues)) > >>>> + queue_num -= vscsi->num_queues; > >>>> + tgt->req_vq = &vscsi->req_vqs[queue_num]; > >>>> + } > >>>> + spin_unlock_irqrestore(&tgt->tgt_lock, flags); > >>>> + return virtscsi_queuecommand(vscsi, tgt, sc); > >>>> +} > >>>> + > >>>> + > > > > ..... > > > >>>> +static int virtscsi_queuecommand_single(struct Scsi_Host *sh, > >>>> + struct scsi_cmnd *sc) > >>>> +{ > >>>> + struct virtio_scsi *vscsi = shost_priv(sh); > >>>> + struct virtio_scsi_target_state *tgt = vscsi->tgt[sc->device->id]; > >>>> + > >>>> + atomic_inc(&tgt->reqs); > >>>> + return virtscsi_queuecommand(vscsi, tgt, sc); > >>>> +} > >>>> + > > > > Here, reqs is unused - why bother incrementing it? > > A branch on completion would be cheaper IMHO. > > Well, I could also let tgt->reqs go negative, but it would be a bit untidy. > > Another alternative is to access the target's target_busy field with > ACCESS_ONCE, and drop reqs altogether. Too tricky to do this kind of > micro-optimization so early, though. So keep it simple and just check a flag. > >> virtio-scsi multiqueue has a performance benefit up to 20% > > > > To be fair, you could be running in single queue mode. > > In that case extra atomics and indirection that this code > > brings will just add overhead without benefits. > > I don't know how significant would that be. > > Not measurable in my experiments. > > Paolo _______________________________________________ Virtualization mailing list Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linuxfoundation.org/mailman/listinfo/virtualization