On Tue, Feb 09, 2021 at 02:13:38PM +0800, JeffleXu wrote:
>
>
> On 2/9/21 11:11 AM, Ming Lei wrote:
> > On Mon, Feb 08, 2021 at 04:52:41PM +0800, Jeffle Xu wrote:
> >> DM will iterate and poll all polling hardware queues of all target mq
> >> devices when polling IO for dm device. To mitigate the race introduced
> >> by iterating all target hw queues, a per-hw-queue flag is maintained
> >
> > What is the per-hw-queue flag?
>
> Sorry I forgot to update the commit message as the implementation
> changed. Actually this mechanism is implemented by patch 10 of this
> patch set.

It is hard to associate patch 10's spin_trylock() with per-hw-queue
flag. Also scsi's poll implementation is in-progress, and scsi's poll may
not be implemented in this way.

>
> >
> >> to indicate whether this polling hw queue currently being polled on or
> >> not. Every polling hw queue is exclusive to one polling instance, i.e.,
> >> the polling instance will skip this polling hw queue if this hw queue
> >> currently is being polled by another polling instance, and start
> >> polling on the next hw queue.
> >
> > Not see such skip in dm_poll_one_dev() in which
> > queue_for_each_poll_hw_ctx() is called directly for polling all POLL
> > hctxs of the request queue, so can you explain it a bit more about this
> > skip mechanism?
> >
>
> It is implemented as patch 10 of this patch set. When spin_trylock()
> fails, the polling instance will return immediately, instead of busy
> waiting.
>
>
> > Even though such skipping is implemented, not sure if good performance
> > can be reached because hctx poll may be done in ping-pong style
> > among several CPUs. But blk-mq hctx is supposed to have its cpu affinities.
> >
>
> Yes, the mechanism of iterating all hw queues can make the competition
> worse.
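
For illustration only, a minimal userspace sketch of the "skip when
trylock fails" behaviour described above. This is not the code from
patch 10: struct poll_queue, poll_queue_init() and poll_all_queues() are
hypothetical names, and pthread_mutex_trylock() merely stands in for the
kernel's spin_trylock().

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for one polling hw queue (not blk_mq_hw_ctx). */
struct poll_queue {
	pthread_mutex_t lock;	/* stands in for the kernel spinlock */
	int pending;		/* completions waiting to be reaped */
};

static void poll_queue_init(struct poll_queue *q, int pending)
{
	pthread_mutex_init(&q->lock, NULL);
	q->pending = pending;
}

/*
 * Walk all polling queues once. If a queue's lock is already held,
 * another polling instance is working on it, so move on to the next
 * queue instead of busy waiting.
 *
 * Returns the number of completions reaped; *skipped is set to the
 * number of queues that were busy.
 */
static int poll_all_queues(struct poll_queue *qs, size_t nr, int *skipped)
{
	int found = 0;

	*skipped = 0;
	for (size_t i = 0; i < nr; i++) {
		if (pthread_mutex_trylock(&qs[i].lock) != 0) {
			(*skipped)++;	/* busy: polled by someone else */
			continue;
		}
		found += qs[i].pending;
		qs[i].pending = 0;
		pthread_mutex_unlock(&qs[i].lock);
	}
	return found;
}
```

The sketch only shows that a poller never waits on a busy queue. It does
not address the concern raised above: every poller still visits every
queue, so the same queue may be polled from several CPUs in turn.
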
>
> If every underlying data device has **only** one polling hw queue, then
> this ping-pong style polling still exist, even when we implement split
> bio tracking mechanism, i.e., acquiring the specific hw queue the bio
> enqueued into. Because multiple polling instance has to compete for the
> only polling hw queue.
>
> But if multiple polling hw queues per device are reserved for multiple
> polling instances, (e.g., every underlying data device has 3 polling hw
> queues when there are 3 polling instances), just as what we practice on
> mq polling, then the current implementation of iterating all hw queues
> will indeed works in a ping-pong style, while this issue shall not exist
> when accurate split bio tracking mechanism could be implemented.

In reality it could be possible to have one hw queue for each numa node.

And you may re-use blk_mq_map_queue() for getting the proper hw queue for poll.

-- 
Ming