On Mon, Aug 28, 2017 at 03:54:59PM -0700, Tejun Heo wrote:
> Hello, Shaohua.
>
> On Wed, Aug 23, 2017 at 11:15:15AM -0700, Shaohua Li wrote:
> > loop block device handles IO in a separate thread. The actual IO
> > dispatched isn't cloned from the IO loop device received, so the
> > dispatched IO loses the cgroup context.
>
> Ah, right. Thanks for spotting this.
>
> > I'm ignoring buffer IO case now, which is quite complicated. Making the
> > loop thread aware of cgroup context doesn't really help. The loop device
> > only writes to a single file. In current writeback cgroup
> > implementation, the file can only belong to one cgroup.
>
> Yeah, writeback tracks the most active cgroup and associates writeback
> ios with that cgroup. For buffered loop devices, I think it'd be fine
> to make the loop driver forward the cgroup association and let the
> writeback layer determine the matching association as usual.

Doing this means we must attach the cgroup info to the page rather than
the bio. I need to check whether we can make that mechanism work for
memcg.

> > For direct IO case, we could workaround the issue in theory. For
> > example, say we assign cgroup1 5M/s BW for loop device and cgroup2
> > 10M/s. We can create a special cgroup for loop thread and assign at
> > least 15M/s for the underlayer disk. In this way, we correctly throttle
> > the two cgroups. But this is tricky to setup.
> >
> > This patch tries to address the issue. When loop thread is handling IO,
> > it declares the IO is on behalf of the original task, then in block IO
> > we use the original task to find cgroup. The concept probably works for
> > other scenarios too, but right now I don't make it generic yet.
>
> The overall approach looks sound to me. Some comments below.
>
> > @@ -2058,7 +2058,10 @@ int bio_associate_current(struct bio *bio)
> >
> >  	get_io_context_active(ioc);
> >  	bio->bi_ioc = ioc;
> > -	bio->bi_css = task_get_css(current, io_cgrp_id);
> > +	if (current->cgroup_task)
> > +		bio->bi_css = task_get_css(current->cgroup_task, io_cgrp_id);
> > +	else
> > +		bio->bi_css = task_get_css(current, io_cgrp_id);
>
> Do we need a full pointer field for this? I think we should be able
> to mark lo writers with a flag (PF or whatever) and then to
> kthread_data() to get the lo struct which contains the target css.
> Oh, let's do csses instead of tasks for consistency, correctness
> (please see below) and performance (csses are cheaper to pin).

Forwarding the css sounds better; a rough sketch of what I have in mind
is further down.

> > @@ -502,11 +504,16 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
> >  	cmd->iocb.ki_complete = lo_rw_aio_complete;
> >  	cmd->iocb.ki_flags = IOCB_DIRECT;
> >
> > +	if (cmd->cgroup_task)
> > +		current->cgroup_task = cmd->cgroup_task;
> > +
> >  	if (rw == WRITE)
> >  		ret = call_write_iter(file, &cmd->iocb, &iter);
> >  	else
> >  		ret = call_read_iter(file, &cmd->iocb, &iter);
> >
> > +	current->cgroup_task = NULL;
> > +
> >  	if (ret != -EIOCBQUEUED)
> >  		cmd->iocb.ki_complete(&cmd->iocb, ret, 0);
> >  	return 0;
>
> Hmm... why do we need double forwarding (original issuer -> aio cmd ->
> ios) in loop? If we need this, doesn't this mean that we're missing
> ownership forwarding in aio paths and should make the forwarding
> happen there?

What do you mean by double forwarding?
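
For the css forwarding, this is roughly what I have in mind for
bio_associate_current(), completely untested; 'cgroup_css' is just a
placeholder name for a css-based replacement of the cgroup_task pointer
(or for whatever kthread_data() hands back):

	/*
	 * Untested sketch: if the loop worker forwarded a css, pin that
	 * css directly instead of resolving it from a forwarded task.
	 */
	if (current->cgroup_css) {
		css_get(current->cgroup_css);
		bio->bi_css = current->cgroup_css;
	} else {
		bio->bi_css = task_get_css(current, io_cgrp_id);
	}

lo_rw_aio() would then set/clear that css around call_read_iter() and
call_write_iter() the same way the current patch does with the task.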

> > @@ -1705,6 +1712,11 @@ static blk_status_t loop_queue_rq(struct blk_mq_hw_ctx *hctx,
> >  		break;
> >  	}
> >
> > +	if (cmd->use_aio) {
> > +		cmd->cgroup_task = current;
> > +		get_task_struct(current);
> > +	} else
> > +		cmd->cgroup_task = NULL;
>
> What if %current isn't the original issuer of the io? It could be a
> writeback worker trying to flush to a loop device and we don't want to
> attribute those ios to the writeback worker. We should forward the
> bi_css not the current task.

Makes sense, I'll take the css from the bio instead; rough sketch below.

Thanks,
Shaohua
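
Something like this on the loop_queue_rq() side, completely untested;
cmd->css would be a new field in struct loop_cmd replacing cgroup_task:

	/*
	 * Untested sketch: take the association from the bio itself so
	 * that IO issued on behalf of e.g. a writeback worker is still
	 * charged to the cgroup that owns the bio.
	 */
	if (cmd->use_aio && bd->rq->bio && bd->rq->bio->bi_css) {
		css_get(bd->rq->bio->bi_css);
		cmd->css = bd->rq->bio->bi_css;
	} else {
		cmd->css = NULL;
	}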