> On 12 Mar 2021, at 12:08, brookxu <brookxu.cn@xxxxxxxxx> wrote:
>
> From: Chunguang Xu <brookxu@xxxxxxxxxxx>
>

Hi Chunguang,

> Tasks in the production environment can be roughly divided into
> three categories: emergency tasks, ordinary tasks and offline
> tasks. Emergency tasks need to be scheduled in real time, such
> as system agents. Offline tasks do not need guaranteed QoS, but
> can improve system resource utilization during idle periods,
> such as background tasks. Meeting these requirements calls for
> IO preemption. At present, we can use weights to simulate IO
> preemption, but since weights are more of a sharing concept,
> preemption cannot be simulated well. For example, the weights
> of emergency tasks and ordinary tasks cannot be chosen well,
> offline tasks (with the same weight) actually occupy different
> amounts of resources on disks with different performance, and
> the tail latency caused by offline tasks cannot be well
> controlled. Using ioprio's concept of preemption, we can solve
> the above problems very well. Since ioprio will eventually be
> converted to a weight, using ioprio alone can also achieve
> weight isolation within the same class. But we can still use
> bfq.weight to control resources, achieving better IO QoS
> control.
>
> However, the class of a bfq_group is currently always be_class,
> and the ioprio class of a task is only reflected within a
> single cgroup. We cannot guarantee that real-time tasks in a
> cgroup are scheduled in time. Therefore, we introduce
> bfq.ioprio, which allows us to configure the ioprio class for a
> cgroup. In this way, we can ensure that the real-time tasks of
> a cgroup are scheduled in time. Similarly, the handling of
> offline task groups also becomes simpler.
>

I find this contribution very interesting. Anyway, given the
relevance of such a contribution, I'd like to hear from relevant
people (Jens, Tejun, ...?), before revising individual patches.

Yet I already have a general question. How does this mechanism
interact with per-process ioprios and ioprio classes? For example,
what happens if a process belongs to a BE-class group according to
your mechanism, but to the RT class according to its ioprio? Does
the per-group class dominate the per-process class? Is it all clean
and predictable?

> The bfq.ioprio interface is now available for cgroup v1 and
> cgroup v2. Users can configure the ioprio of a cgroup through
> this interface, as shown below:
>
> echo "1 2" > blkio.bfq.ioprio

Wouldn't it be nicer to have acronyms for classes (RT, BE, IDLE),
instead of numbers?

Thank you very much for this improvement proposal,
Paolo

>
> The above two values represent the ioprio class and the ioprio
> of the cgroup, respectively. The ioprio of tasks within the
> cgroup is uniformly equal to the ioprio of the cgroup. If the
> cgroup ioprio is disabled, the ioprio of each task is unchanged
> and usually comes from its io_context.
>
> In tests with fio and fio_generate_plots we can clearly see
> that task IO latency satisfies RT > BE > IDLE. While RT is
> running, BE and IDLE are guaranteed a minimum bandwidth. When
> used together with bfq.weight, we can also isolate resources
> within the same class.
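To make the question above about per-process vs. per-group
priorities concrete, here is a minimal shell sketch (not part of
the series) contrasting the two configuration points; the cgroup
path reuses the rt group created in the test setup below, and
ionice(1) is the existing per-process interface:

  # Per-cgroup ioprio as proposed in this series: "<class> <ioprio>",
  # where class 1 = RT, 2 = BE, 3 = IDLE.
  echo "1 2" > /sys/fs/cgroup/blkio/rt/blkio.bfq.ioprio

  # Existing per-process ioprio: the same class/level pair, set with
  # ionice(1) on the current shell (RT class usually needs root).
  ionice -c 1 -n 2 -p $$

Whether the per-group or the per-process setting wins when the two
disagree is exactly what the question above asks.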
>
> The test process is as follows:
> # prepare data disk
> mount /dev/sdb /data1
>
> # create cgroup v1 hierarchy
> cd /sys/fs/cgroup/blkio
> mkdir rt be idle
> echo "1 0" > rt/blkio.bfq.ioprio
> echo "2 0" > be/blkio.bfq.ioprio
> echo "3 0" > idle/blkio.bfq.ioprio
>
> # run fio test
> fio fio.ini
>
> # generate svg graph
> fio_generate_plots res
>
> The contents of fio.ini are as follows:
> [global]
> ioengine=libaio
> group_reporting=1
> log_avg_msec=500
> direct=1
> time_based=1
> iodepth=16
> size=100M
> rw=write
> bs=1M
> [rt]
> name=rt
> write_bw_log=rt
> write_lat_log=rt
> write_iops_log=rt
> filename=/data1/rt.bin
> cgroup=rt
> runtime=30s
> nice=-10
> [be]
> name=be
> new_group
> write_bw_log=be
> write_lat_log=be
> write_iops_log=be
> filename=/data1/be.bin
> cgroup=be
> runtime=60s
> [idle]
> name=idle
> new_group
> write_bw_log=idle
> write_lat_log=idle
> write_iops_log=idle
> filename=/data1/idle.bin
> cgroup=idle
> runtime=90s
>
> V2:
> 1. Optimise bfq_select_next_class().
> 2. Introduce bfq_group[] to track the number of groups in each class.
> 3. Optimise IO injection, EQM and the idle mechanism for CLASS_RT.
>
> Chunguang Xu (11):
>   bfq: introduce bfq_entity_to_bfqg helper method
>   bfq: limit the IO depth of idle_class to 1
>   bfq: keep the minimun bandwidth for be_class
>   bfq: expire other class if CLASS_RT is waiting
>   bfq: optimse IO injection for CLASS_RT
>   bfq: disallow idle if CLASS_RT waiting for service
>   bfq: disallow merge CLASS_RT with other class
>   bfq: introduce bfq.ioprio for cgroup
>   bfq: convert the type of bfq_group.bfqd to bfq_data*
>   bfq: remove unnecessary initialization logic
>   bfq: optimize the calculation of bfq_weight_to_ioprio()
>
>  block/bfq-cgroup.c  |  99 +++++++++++++++++++++++++++++++----
>  block/bfq-iosched.c |  47 ++++++++++++++---
>  block/bfq-iosched.h |  28 ++++++++--
>  block/bfq-wf2q.c    | 124 +++++++++++++++++++++++++++++++++-----------
>  4 files changed, 244 insertions(+), 54 deletions(-)
>
> --
> 2.30.0
>
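For completeness, the cgroup setup from the test sequence above can
be collected into one small script. This is only a sketch: it
assumes the same cgroup v1 blkio mount point as in the cover letter
and a kernel carrying this series (blkio.bfq.ioprio is not in
mainline), and the cgroup.procs attach step is extra, since the fio
jobs above attach themselves via their cgroup= option.

  #!/bin/sh
  # Recreate the three test groups and assign each its ioprio class
  # (class 1 = RT, 2 = BE, 3 = IDLE, ioprio 0), as in the cover letter.
  set -e
  cd /sys/fs/cgroup/blkio
  mkdir -p rt be idle
  echo "1 0" > rt/blkio.bfq.ioprio
  echo "2 0" > be/blkio.bfq.ioprio
  echo "3 0" > idle/blkio.bfq.ioprio
  # Optionally move the current shell into a group so that its I/O
  # inherits the group's class (standard cgroup v1 attach interface).
  echo $$ > rt/cgroup.procs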