Re: [PATCH 10/10] blkcg: implement BPF_PROG_TYPE_IO_COST

Tejun Heo <tj@xxxxxxxxxx> · Fri, 14 Jun 2019 10:09:14 -0700

Hello, Alexei.

On Fri, Jun 14, 2019 at 04:35:35PM +0000, Alexei Starovoitov wrote:
> the example bpf prog looks flexible enough to allow some degree
> of experiments. The question is what kind of new algorithms you envision
> it will do? what other inputs it would need to make a decision?
> I think it's ok to start with what it does now and extend further
> when need arises.

I'm not sure right now.  The linear model worked a lot better than I
originally expected and looks like it can cover most of the current
use cases.  It could easily be that we just haven't seen enough
different cases yet.
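
For readers following along, here is a rough sketch of what a linear
cost model can look like.  It assumes the cost is a fixed base
(depending on whether the IO is sequential or random) plus a per-page
term.  The struct, field names and function name are hypothetical and
are not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-device coefficients; names and units are illustrative. */
struct lin_coefs {
	uint64_t seq_base;	/* fixed cost of a sequential IO */
	uint64_t rand_base;	/* fixed cost of a random IO */
	uint64_t per_page;	/* cost added per page transferred */
};

/* Linear model: cost = base(seq or rand) + per_page * nr_pages */
static uint64_t linear_io_cost(const struct lin_coefs *c, bool is_seq,
			       uint32_t nr_pages)
{
	assert(nr_pages > 0);	/* an IO transfers at least one page */
	return (is_seq ? c->seq_base : c->rand_base) +
	       c->per_page * (uint64_t)nr_pages;
}
```

With seq_base=100, rand_base=1000 and per_page=10, a one-page
sequential IO costs 110 and an eight-page random IO costs 1080.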

At one point, a quadratic model was on the table in case the linear
model wasn't good enough.  Also, one area which may need improvement
is taking the r/w mixture into consideration.  Some SSDs' performance
nose-dives when r/w commands are mixed in certain proportions.  Right
now, we just deal with that by adjusting the global performance ratio
(vrate) but I can imagine a model which considers the issue history
of the cgroup over the past X seconds and bumps the overall cost
according to the r/w mixture.
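
A minimal sketch of such a mixture-aware adjustment, under several
assumptions: the history is just a pair of read/write counters kept
per cgroup, the window is aged by the caller halving the counters
periodically, and the penalty scales linearly with how close the mix
is to 50/50.  All names (rw_hist, rw_mix_pct, mix_adjusted_cost) and
the penalty shape are made up for illustration.  Integer math only,
since bpf progs have no floating point.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-cgroup issue history for the recent window. */
struct rw_hist {
	uint32_t nr_reads;
	uint32_t nr_writes;
};

/* Record one issued IO. */
static void rw_hist_record(struct rw_hist *h, bool is_write)
{
	if (is_write)
		h->nr_writes++;
	else
		h->nr_reads++;
}

/* Age the window; the caller would invoke this once per period. */
static void rw_hist_decay(struct rw_hist *h)
{
	h->nr_reads /= 2;
	h->nr_writes /= 2;
}

/* 0 = all reads or all writes, 100 = an even 50/50 mix. */
static uint32_t rw_mix_pct(const struct rw_hist *h)
{
	uint64_t total = (uint64_t)h->nr_reads + h->nr_writes;
	uint64_t minor = h->nr_reads < h->nr_writes ?
			 h->nr_reads : h->nr_writes;

	if (!total)
		return 0;
	return (uint32_t)(minor * 200 / total);
}

/* Bump base_cost by up to max_penalty_pct percent at a 50/50 mix. */
static uint64_t mix_adjusted_cost(uint64_t base_cost,
				  const struct rw_hist *h,
				  uint32_t max_penalty_pct)
{
	return base_cost +
	       base_cost * rw_mix_pct(h) * max_penalty_pct / 10000;
}
```

For example, with three reads and one write in the window the mix is
50, so a base cost of 1000 with max_penalty_pct=50 becomes 1250.  A
real device would likely need a non-linear penalty curve; this only
shows where the history would plug in.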

> > * Is block ioctl the right mechanism to attach these programs?
> 
> imo ioctl is a bit weird, but since it's only one program per block
> device it's probably ok? Unless you see it being cgroup scoped in
> the future? Then cgroup-bpf style hooks will be more suitable
> and allow a chain of programs.

As this is a device property, I think there should only be one program
per block device.

> > * Are there more parameters that need to be exposed to the programs?
> > 
> > * It'd be great to have efficient access to per-blockdev and
> >    per-blockdev-cgroup-pair storages available to these programs so
> >    that they can keep track of history.  What'd be the best way of
> >    doing that considering the fact that these programs will be called
> >    per each IO and the overhead can add up quickly?
> 
> Martin's socket local storage solved that issue for sockets.
> Something very similar can work for per-blockdev-per-cgroup.

Cool, that sounds great in case we need to develop this further.  Andy
had this self-learning model which didn't need any external input and
could tune itself solely based on device saturation state.  If the
prog can remember states cheaply, it'd be pretty cool to experiment
with things like that in bpf.
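
To make the idea concrete, here is a toy feedback loop driven only by
a saturation signal.  This is not Andy's model, whose details aren't
covered in this thread.  It is just an assumed back-off/probe-up
scheme: shrink vrate when the device looks saturated, grow it slowly
otherwise.  The bounds, step sizes and the name vrate_adjust are all
hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bounds on vrate, in percent of the nominal model. */
#define VRATE_MIN_PCT	25
#define VRATE_MAX_PCT	400

/*
 * One adjustment step.  @saturated is whatever saturation signal the
 * caller derives from the device; how to derive it is out of scope.
 */
static uint32_t vrate_adjust(uint32_t vrate_pct, bool saturated)
{
	if (saturated)
		vrate_pct -= vrate_pct / 8;	/* back off by ~12.5% */
	else
		vrate_pct += 1 + vrate_pct / 64;	/* probe upward slowly */

	if (vrate_pct < VRATE_MIN_PCT)
		vrate_pct = VRATE_MIN_PCT;
	if (vrate_pct > VRATE_MAX_PCT)
		vrate_pct = VRATE_MAX_PCT;
	return vrate_pct;
}
```

The only state carried between calls is vrate_pct itself, which is
the kind of thing cheap per-blockdev storage would hold.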

Thanks.

-- 
tejun