Michael Reed wrote:
Mike Christie wrote:
On 09/29/2009 08:34 PM, Giridhar Malavali wrote:
3) From your previous mail, I understand that you don't require a
combined limit per target, i.e., that the total queue depth for all
LUNs on a particular target should not exceed some threshold.
James Smart had done this patch
http://marc.info/?l=linux-scsi&m=121070114018354&w=2
where it sets the starget->can_queue based on info we get from vendors.
The patch did not get merged. JamesB does not want the
starget->can_queue to be static, and wants code like the queue full
tracking code which dynamically ramps the device queue depth up and down.
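The kind of dynamic ramping being referred to can be sketched roughly as follows. The struct, function names, and halve/creep policy here are all invented stand-ins for illustration, not the actual queue-full tracking code:

```c
#include <assert.h>

/* Illustrative sketch of per-device queue-depth ramping of the kind
 * described above. Names and policy are hypothetical, not the actual
 * midlayer code. */
struct sdev_qd {
	int queue_depth;	/* currently allowed outstanding commands */
	int max_depth;		/* ceiling to ramp back up toward */
};

/* On a QUEUE_FULL: back off sharply, but never below 1. */
static void ramp_down(struct sdev_qd *q)
{
	if (q->queue_depth > 1)
		q->queue_depth /= 2;
}

/* After a stretch of clean completions: creep back up by one. */
static void ramp_up(struct sdev_qd *q)
{
	if (q->queue_depth < q->max_depth)
		q->queue_depth++;
}
```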
Agree. Some amount of dynamic management of queue full seems desirable.
I believe any such dynamic management needs to acknowledge that it
exists in a multi-initiator environment, i.e., might get a QUEUE_FULL
with no other commands outstanding.
Completely agree - but there are multiple levels to the problem, many of
which are at odds with each other....
I am not sure if JamesB meant that he wants to ramp down the
starget->can_queue based on getting a QUEUE_FULL though. I thought he
just meant he wants it to be dynamic.
What does "be dynamic" mean if not adjusted based upon a target's
response to scsi commands?
If I am right, then I think we
could use JamesS's patch to set an initial starget->can_queue and add
another field for a max value. Then we could add some code that ramps
down/up based on something like command completion time or throughput or
some other value.
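A rough sketch of that suggestion: keep the table-derived value as a starting point, carry a separate max field, and adjust the target-wide limit from observed completion times. Every name and threshold below is hypothetical:

```c
#include <assert.h>

/* Hypothetical sketch of the idea above; all names and thresholds are
 * invented for illustration. */
struct target_ramp {
	int can_queue;		/* current target-wide limit */
	int max_can_queue;	/* upper bound (table value, or unlimited) */
};

/* Called periodically with the recent average completion time (ms). */
static void adjust_can_queue(struct target_ramp *t, int avg_ms)
{
	if (avg_ms > 100 && t->can_queue > 1)
		t->can_queue--;		/* target looks congested: back off */
	else if (avg_ms < 20 && t->can_queue < t->max_can_queue)
		t->can_queue++;		/* headroom available: creep up */
}
```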
The desire of my patch was to aid single-initiator cases where multiple
luns share the target, and there is a per-target resource limit, such as
a maximum number of commands per target port. Thus, the can_queue caps
the number of commands allowed to be issued to the target.
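That cap amounts to a simple per-target admission check; the sketch below is a simplified illustration with invented names, not the patch code itself. A can_queue of 0 means "no cap", matching the backward-compatible default:

```c
#include <assert.h>

/* Illustrative sketch of the per-target cap: a command is dispatched
 * only while the target's outstanding count is below can_queue.
 * Accounting is simplified for illustration. */
struct target_budget {
	int can_queue;	/* per-target limit from the device record; 0 = none */
	int busy;	/* commands currently outstanding on the target */
};

/* Returns 1 and claims a slot if the command may be sent, else 0. */
static int target_may_queue(struct target_budget *t)
{
	if (t->can_queue && t->busy >= t->can_queue)
		return 0;	/* hold it back; avoid a target QUEUE_FULL */
	t->busy++;
	return 1;
}

static void target_complete(struct target_budget *t)
{
	t->busy--;
}
```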
In single-initiator, it would effectively stop QUEUE_FULLs from the
target, aiding two issues:
- If the lun queue levels overcommit the target, I have seen targets
that are so busy just handling the receive of the cmd frames that they
don't have enough cycles to send QUEUE_FULLs back, or they punt and drop
commands on the floor, or, in the worst case, are so consumed they stop
work on everything, including i/o they had already received. Note: these
behaviors play havoc with any backoff algorithm dependent upon target
response. The OEMs try to solve this by providing configuration
formulas. However, these have so many variables that they are very
complex to get right, and inexperienced admins may not know the formulas
at all. If we cap the outstanding i/o count, we never overcommit, and
never see these headaches.
- Assuming that backoff algorithms are done per-lun, and that per-lun
queue levels drop to the outstanding load and slowly ramp back up -
there's an implicit biasing that takes place toward the luns that
already have i/o outstanding or have yet to send i/o - meaning their
queue levels are higher than the lun that saw the QUEUE_FULL. As they
have more credit to submit i/o, they may always consume more of the
target than the backed-off lun, never allowing it to get back to a level
playing field. If we avoid the QUEUE_FULLs to begin with, this biasing
is lessened (though not removed, as it moves back into the i/o
scheduling area, which we always have).
In multi-initiator, it really doesn't change the problem, but it will
lessen the degree to which an over-committed target is overwhelmed,
which has to be goodness. Especially in cases where the target behaves
like I described above.
My intent is that the cap per-target is static. It would be initialized
from the device record at device detection. I have no problem allowing a
sysfs parameter to change its value. However, I do not believe we want
a ramp-up/ramp-down on this value.
My idea for the queuing algorithms is that we would have several
selectable policies - on both the target and lun. If we did do a
target-based ramp-up/ramp-down, I would have the device record value I
added be the "max" (and max can be unlimited), and when the algorithm
was selected, have the initial value (if necessary) and ramp up/down
parameters specified.
We don't necessarily need or want can_queue set by a value encoded into
a kernel table. Some of our raid devices' can_queue values vary based
upon the firmware they are running. A static table would, at best, be a
decent starting point. At worst, it could dramatically over-commit the
target. Our raid devices' max can_queue is either per raid controller
or per host port.
Whatever path we go down, I view having a user programmable upper bound
as a requirement.
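Such a bound would just clamp whatever depth a ramp algorithm (or a static table) proposes to a user-set maximum, e.g. one written via sysfs. A minimal sketch, with invented names; 0 means "no user bound":

```c
#include <assert.h>

/* Hypothetical clamp applied after any depth calculation; names are
 * invented for illustration. */
static int clamp_depth(int proposed, int user_max)
{
	if (user_max > 0 && proposed > user_max)
		proposed = user_max;
	return proposed < 1 ? 1 : proposed;	/* never below 1 */
}
```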
Agree. If your device behaves as you stated - then don't set a maximum,
which is the default backward-compatible part of my patch.
If JamesB did mean that he wanted to ramp down the starget->can_queue
based on QUEUE_FULLs then JamesS and JamesB do not agree on that and we
are stuck.
I don't consider ramp up/down of starget->can_queue a requirement.
But I also don't consider its presence a problem.
Agreed. My preference is a ramp up/down on a per-lun basis. However,
you may select an algorithm that manipulates all luns at the same time
for a target.
Our requirements are pretty simple: the ability to limit the number
of commands queued to a target or lun in a multi-initiator environment
such that no individual initiator can fully consume the resources
of the target/lun. I.e., we want a user programmable upper bound
on all queue_depth and can_queue adjustments. (Yes, I've stated this
a few times. :)
Easy to state, not so easy to truly do. But I'm in agreement.
-- james s