Vivek Goyal <vgoyal@xxxxxxxxxx> writes:

> So, IIUC, the only thing little different here is that throttling is
> implemented by flusher thread. But it is still per device per cgroup. I
> think that is just a implementation detail whether we implement it
> in block layer, or in writeback or somewhere else. We can very well
> implement it in block layer and provide per bdi/per_group congestion
> flag in bdi so that flusher will stop pushing more IO if group on
> a bdi is congested (because IO is throttled).
>
> I think first important thing is to figure out what is minimal set of
> requirement (As jan said in another mail), which will solve wide
> variety of cases. I am trying to list some of points.
>
>
> - Throttling for buffered writes
>   - Do we want per device throttling limits or global throttling
>     limits.
>
>   - Existing direct write limits are per device and implemented in
>     block layer.
>
>   - I personally think that both kind of limits might make sense.
>     But a global limit for async write might make more sense at
>     least for the workloads like backup which can run on a throttled
>     speed.
>
>   - Absolute throttling IO will make most sense on top level device
>     in the IO stack.
>
>   - For per device rate throttling, do we want a common limit for
>     direct write and buffered write or a separate limit just for
>     buffered writes.

Another aspect to this problem is 'dirty memory limiting'. First a
quick refresher on memory.soft_limit_in_bytes...

In memcg the soft_limit_in_bytes can be used as a way to overcommit a
machine's memory. The idea is that the memory.limit_in_bytes (aka hard
limit) specifies an absolute maximum amount of memory a memcg can use,
while the soft_limit_in_bytes indicates the working set of the
container. The simplified equation is that if the
sum(*/memory.soft_limit_in_bytes) < MemTotal, then all containers
should be guaranteed their working set. Jobs are allowed to allocate
more than soft_limit_in_bytes so long as they fit within
limit_in_bytes.
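To make the simplified equation concrete, here is a rough sketch of the
check in Python. It is only an illustration, not code from any posted
patch: the function names are made up for this example, and the default
path assumes a cgroup v1 memory controller mounted at
/sys/fs/cgroup/memory, which may differ on your system.

```python
import glob


def parse_mem_total(meminfo_text):
    """Return MemTotal in bytes, given the text of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # The line looks like "MemTotal:       16384256 kB".
            return int(line.split()[1]) * 1024
    raise ValueError("MemTotal not found")


def read_soft_limits(root="/sys/fs/cgroup/memory"):
    """Read the soft limit of each first-level memcg under root.

    The mount point is an assumption; pass the real one if it differs.
    """
    limits = []
    for path in glob.glob(root + "/*/memory.soft_limit_in_bytes"):
        with open(path) as f:
            limits.append(int(f.read()))
    return limits


def working_sets_guaranteed(soft_limits, mem_total):
    """Simplified check: sum(soft limits) < MemTotal."""
    return sum(soft_limits) < mem_total
```

On a live machine this would be called roughly as
working_sets_guaranteed(read_soft_limits(),
parse_mem_total(open("/proc/meminfo").read())). It only covers the
simplified equation; it says nothing about whether the memory above
each soft limit is actually reclaimable.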
The soft and hard limits together attempt to provide a min and max
amount of memory for a cgroup.

The soft_limit_in_bytes is related to this discussion because it is
desirable for all container memory above soft_limit_in_bytes to be
reclaimable (i.e. clean file cache). Using previously posted memcg
dirty limiting and memcg writeback logic we have been able to set a
container's dirty_limit to its soft_limit. While not perfect, this
approximates the goal of providing min guaranteed memory while allowing
for usage of best effort memory, so long as that best effort memory can
be quickly reclaimed to satisfy another container's min guarantee.

> - Proportional IO for async writes
>   - Will probably make most sense on bottom most devices in the IO
>     stack (If we are able to somehow retain the submitter's context).
>
>   - Logically it will make sense to keep sync and async writes in
>     same group and try to provide fair share of disk between groups.
>     Technically CFQ can do that but in practice I think it will be
>     problematic. Writes of one group will take precedence of reads
>     of another group. Currently any read is prioritized over
>     buffered writes. So by splitting buffered writes in their own
>     cgroups, they can severely impact the latency of reads in
>     another group. Not sure how many people really want to do
>     that in practice.
>
>   - Do we really need proportional IO for async writes. CFQ had
>     tried implementing ioprio for async writes but it does not
>     work. Should we just care about groups of sync IO and let
>     all the async IO on device go in a single queue and lets
>     make sure it is not starved while sync IO is going on.
>
>
>   - I thought that most of the people cared about not impacting
>     sync latencies badly while buffered writes are happening. Not
>     many complained that buffered writes of one application should
>     happen faster than other application.
>
>   - If we agree that not many people require service differentiation
>     between buffered writes, then we probably don't have to do
>     anything in this space and we can keep things simple. I
>     personally prefer this option. Trying to provide proportional
>     IO for async writes will make things complicated and we might
>     not achieve much.
>
>   - CFQ already does a very good job of prioritizing sync over async
>     (at the cost of reduced throughput on fast devices). So what's
>     the use case of proportion IO for async writes.
>
> Once we figure out what are the requirements, we can discuss the
> implementation details.
>
> Thanks
> Vivek