On 12/27/18 12:33 AM, James Smart wrote:
The XRI get/put lists were partitioned per hardware queue. However,
the adapter rarely had enough XRIs to give each queue a large number
of them. As such, it became common for a cpu to run out of XRIs and
return a BUSY condition, asking the upper io stack to retry. This
occurred even though other cpus were idle and not using their
resources.

Create as efficient a scheme as possible to move resources
to the cpus that need them. Each cpu maintains a small private pool
which it allocates from for io. There is a watermark that the cpu
attempts to keep in the private pool. The private pool, when empty,
pulls from the cpu's global pool. When the cpu's global pool is
empty it will pull from another cpu's global pool. As there are many
cpu global pools (1 per cpu or hardware queue count) and as each cpu
selects which cpu to pull from at different rates and at different
times, it creates a randomizing effect that minimizes the number of
cpus that will contend with each other when they steal XRIs from
another cpu's global pool.

On io completion, a cpu will push the XRI back on to its private pool.
A watermark level is maintained for the private pool such that when
it is exceeded it will move XRIs to the cpu's global pool so that other
cpus may allocate them.
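
To make the allocate/steal/release flow described above concrete, here is a minimal single-threaded sketch. It is not the driver code: the pools are modelled as plain counters rather than lists of XRIs, there is no locking, and every name and constant (`NR_CPUS`, `PVT_REFILL`, `PVT_HIGH_WM`, `xri_get()`, `xri_put()`, the rotating `next_victim` choice) is hypothetical. The amount moved on a refill and on a watermark overflow is an assumption as well, since the description does not state it.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS      4   /* hypothetical cpu / hardware queue count */
#define PVT_REFILL   2   /* hypothetical: XRIs pulled per refill */
#define PVT_HIGH_WM  4   /* hypothetical private-pool high watermark */

struct cpu_pools {
    int pvt;          /* XRIs in this cpu's private pool */
    int gbl;          /* XRIs in this cpu's global pool */
    int next_victim;  /* next cpu to try when stealing */
};

static struct cpu_pools pools[NR_CPUS];

/* Start with every XRI sitting in the per-cpu global pools. */
static void pools_init(int xri_per_cpu)
{
    assert(xri_per_cpu >= 0);
    for (int i = 0; i < NR_CPUS; i++) {
        pools[i].pvt = 0;
        pools[i].gbl = xri_per_cpu;
        pools[i].next_victim = (i + 1) % NR_CPUS;
    }
}

/* Move up to 'want' XRIs from src's global pool to cpu's private pool. */
static int refill_from(int cpu, int src, int want)
{
    int n = pools[src].gbl < want ? pools[src].gbl : want;

    pools[src].gbl -= n;
    pools[cpu].pvt += n;
    return n;
}

/* Allocate one XRI for io on 'cpu'; false means the caller reports BUSY. */
static bool xri_get(int cpu)
{
    struct cpu_pools *p = &pools[cpu];

    if (p->pvt == 0 && refill_from(cpu, cpu, PVT_REFILL) == 0) {
        /* Own global pool is empty: try every other cpu once. */
        for (int tries = 0; tries < NR_CPUS - 1; tries++) {
            int victim = p->next_victim;

            /* Advance the per-cpu cursor, skipping ourselves. */
            p->next_victim = (victim + 1) % NR_CPUS;
            if (p->next_victim == cpu)
                p->next_victim = (cpu + 1) % NR_CPUS;

            if (refill_from(cpu, victim, PVT_REFILL) > 0)
                break;
        }
    }
    if (p->pvt == 0)
        return false;
    p->pvt--;
    return true;
}

/* io completion: return the XRI to the private pool, spill the excess. */
static void xri_put(int cpu)
{
    struct cpu_pools *p = &pools[cpu];

    p->pvt++;
    if (p->pvt > PVT_HIGH_WM) {
        int excess = p->pvt - PVT_HIGH_WM;

        p->pvt -= excess;
        p->gbl += excess;   /* now visible to other cpus */
    }
}
```

Because each cpu keeps its own `next_victim` cursor and advances it only when it actually has to steal, the cursors drift apart over time, which is one simple way to get the spreading effect the description mentions.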

On NVME, as heartbeat commands are critical to get placed on the wire,
a single expedite pool is maintained. When a heartbeat is to be sent,
it will allocate an XRI from the expedite pool rather than the normal
cpu private/global pools. On any io completion, if the expedite pool
is seen to be below its full level, it will be replenished before the
XRI is placed on the cpu private pool.
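
The expedite-pool rule can be sketched in the same spirit. Again this is only an illustration with counters and no locking; `EXPEDITE_MAX`, `struct xri_state`, `xri_get_io()` and `xri_complete()` are hypothetical names, and the pool size is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

#define EXPEDITE_MAX 2   /* hypothetical full level of the expedite pool */

struct xri_state {
    int expedite;   /* XRIs in the single expedite pool */
    int pvt;        /* XRIs in the completing cpu's private pool */
};

/* Heartbeats draw from the expedite pool, everything else from pvt. */
static bool xri_get_io(struct xri_state *s, bool is_heartbeat)
{
    int *pool = is_heartbeat ? &s->expedite : &s->pvt;

    if (*pool == 0)
        return false;
    (*pool)--;
    return true;
}

/* Any completion tops up the expedite pool first, then the private pool. */
static void xri_complete(struct xri_state *s)
{
    if (s->expedite < EXPEDITE_MAX)
        s->expedite++;
    else
        s->pvt++;
}
```

The point of the ordering in `xri_complete()` is that a heartbeat can still get an XRI even when normal io has drained the private pool.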

Statistics are added to aid in understanding the XRI levels on each
cpu and their behaviors.

As indicated previously, once we are using embedded XRIs none of
this will be necessary ...

Cheers,
Hannes