On 05/03/2016 09:22 AM, Jan Kara wrote:
> On Tue 03-05-16 08:23:27, Jens Axboe wrote:
>> On 05/03/2016 03:34 AM, Jan Kara wrote:
>>> On Thu 28-04-16 12:53:50, Jens Axboe wrote:
>>> 2) As far as I can see in patch 8/8, you have plugged the throttling above
>>> the IO scheduler. When there are e.g. multiple cgroups with different IO
>>> limits operating, this throttling can lead to strange results (like a
>>> cgroup with a low limit using up all available background "slots" and thus
>>> effectively stopping background writeback for other cgroups)? So won't
>>> it make more sense to plug this below the IO scheduler? Now I understand
>>> there may be other problems with this, but I think we should put more
>>> thought into that and provide some justification in the changelogs.
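
To make the concern concrete, here is a minimal userspace sketch of the
scenario described above; the slot pool size, the helper names, and the
cgroup behavior are all invented for illustration and are not the kernel's
actual accounting:

#include <stdio.h>
#include <stdbool.h>

#define WB_SLOTS 4   /* hypothetical global pool of background writeback slots */

static int slots_in_use;

/* The pool is global: it does not track which cgroup owns a slot. */
static bool wb_try_get_slot(void)
{
    if (slots_in_use >= WB_SLOTS)
        return false;
    slots_in_use++;
    return true;
}

int main(void)
{
    /* Cgroup A has a low IO limit but many dirty pages: it grabs every
     * slot first, then stalls on its own dispatch throttle while still
     * holding all of them. */
    for (int i = 0; i < WB_SLOTS; i++)
        printf("cgroup A slot %d: %s\n", i,
               wb_try_get_slot() ? "acquired" : "denied");

    /* Cgroup B has no limit at all, yet its background writeback is
     * effectively stopped: the shared pool is exhausted. */
    printf("cgroup B slot:   %s\n",
           wb_try_get_slot() ? "acquired" : "denied");
    return 0;
}
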
>> One complexity is that we have to do this early for blk-mq, since once you
>> get a request, you're already sitting on the hw tag. CoDel should actually
>> work fine at each hop, so hopefully this will as well.
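
The ordering constraint can be shown with a compilable toy model; the
names below (wbt_wait_if_needed(), toy_get_request(), HW_TAGS) are
invented for this sketch and are not the real blk-mq API:

#include <stdio.h>
#include <stdlib.h>

#define HW_TAGS 2   /* toy hardware tag space; real depths vary by device */

static int tags_free = HW_TAGS;

/* Invented stand-in for the writeback throttle: in the real patches this
 * would block or back off; here it only marks where the decision happens. */
static void wbt_wait_if_needed(void)
{
    printf("throttle check (no tag held yet)\n");
}

/* Toy request allocation: the request *is* the hw tag, which is why it
 * must come after throttling, not before. */
static int toy_get_request(void)
{
    if (tags_free == 0) {
        printf("out of hw tags\n");
        return -1;
    }
    return --tags_free;
}

int main(void)
{
    wbt_wait_if_needed();           /* 1. throttle first        */
    int tag = toy_get_request();    /* 2. only then sit on a tag */
    if (tag < 0)
        exit(1);
    printf("dispatching with tag %d\n", tag);
    return 0;
}

Under this model, sleeping between toy_get_request() and dispatch would pin
a tag for the entire wait, which is exactly what doing the throttling early
avoids.
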
> OK, I see. But then this suggests that any IO scheduling and/or
> cgroup-related throttling should happen before we get a request for blk-mq
> as well? And then we can still do writeback throttling below that layer?

Not necessarily. For IO scheduling, basically we care about two parts:

1) Are you allowed to allocate the resources to queue some IO
2) Are you allowed to dispatch
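
A made-up userspace model of these two gates; the allocation budget and
the token-style dispatch check below are illustrative stand-ins, not the
actual implementation:

#include <stdio.h>
#include <stdbool.h>

/* Gate #1: may this IO allocate queueing resources (request + tag)? */
static int queue_budget = 8;          /* made-up per-queue allocation budget */

static bool may_allocate(void)
{
    if (queue_budget == 0)
        return false;
    queue_budget--;
    return true;
}

/* Gate #2: may this already-queued IO be sent to the device right now?
 * Modeled as a trivial token bucket standing in for a bandwidth limit. */
static int dispatch_tokens = 1;       /* made-up per-cgroup dispatch tokens */

static bool may_dispatch(void)
{
    if (dispatch_tokens == 0)
        return false;
    dispatch_tokens--;
    return true;
}

int main(void)
{
    if (!may_allocate())              /* gate #1: queueing resources  */
        return 1;
    printf("queued\n");
    if (!may_dispatch())              /* gate #2: sending to device   */
        printf("queued, but held back from the device\n");
    else
        printf("dispatched\n");
    return 0;
}
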
> But then it seems suboptimal to waste a relatively scarce resource (which
> a HW tag is, AFAIU) just because you happen to run from a cgroup that is
> bandwidth limited and thus not allowed to dispatch?

For some cases, you are absolutely right, and #1 is the main one. For
your case of QD=1, that's obviously the case. For SATA it's a bit more of
a grey zone, and for others (nvme, scsi, etc.) the tags aren't really a
scarce resource, so #2 is the bigger part of it.
--
Jens Axboe