Re: BLKSECDISCARD ioctl and hung tasks

Salman Qazi <sqazi@xxxxxxxxxx> · Fri, 14 Feb 2020 11:42:32 -0800

On Fri, Feb 14, 2020 at 1:23 AM Ming Lei <tom.leiming@xxxxxxxxx> wrote:
>
> On Fri, Feb 14, 2020 at 1:50 PM Bart Van Assche <bvanassche@xxxxxxx> wrote:
> >
> > On 2020-02-13 11:21, Salman Qazi wrote:
> > > AFAICT, this is not actually sufficient, because the issuer of the bio
> > > is waiting for the entire bio, regardless of how it is split later.
> > > There also isn't a good mapping between the size of a secure
> > > discard and how long it will take.  Given the geometry of a flash
> > > device, it is not hard to construct a scenario where a relatively
> > > small secure discard (a few thousand sectors) will take a very long time
> > > (multiple seconds).
> > >
> > > Having said that, I don't like neutering the hung task timer either.
> >
> > Hi Salman,
> >
> > How about modifying the block layer such that completions of bio
> > fragments are considered as task activity? I think that bio splitting is
> > rare enough for such a change not to affect performance of the hot path.
>
> Are you sure that the hung task warning won't be triggered in the
> non-splitting case?

I demonstrated a few emails ago that it doesn't take a very large
secure discard command to trigger this.  So, I am sceptical that we
will be able to use splitting to solve this.
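
To make that concrete, here is a deliberately simplified model (my own
Python sketch, not the actual hung-task detector, which samples
periodically): a blocked waiter is flagged if it goes longer than the
timeout without any observable activity. Counting fragment completions
as activity, as Bart suggests, only shortens the gaps between events;
one unsplit secure discard, or one slow fragment, that runs past the
timeout is still flagged. The function names and the 120 s timeout are
assumptions for illustration.

```python
def longest_silent_gap(start, end, activity_times):
    """Longest stretch (seconds) in [start, end] with no waiter activity.

    `activity_times` are the moments the blocked waiter counts as active,
    e.g. per-fragment completions under the proposal being discussed.
    """
    points = [start] + sorted(t for t in activity_times if start < t < end) + [end]
    return max(later - earlier for earlier, later in zip(points, points[1:]))


def would_warn(start, end, activity_times, timeout):
    """Simplified rule: flag the waiter if it is silent for over `timeout`."""
    return longest_silent_gap(start, end, activity_times) > timeout


if __name__ == "__main__":
    TIMEOUT = 120.0  # assumed hung-task timeout, in seconds
    # One 150 s secure discard with no intermediate completions.
    print(would_warn(0.0, 150.0, [], TIMEOUT))             # True
    # The same 150 s of work as three fragments finishing every 50 s.
    print(would_warn(0.0, 150.0, [50.0, 100.0], TIMEOUT))  # False
    # Split, but one fragment alone takes 140 s.
    print(would_warn(0.0, 150.0, [10.0], TIMEOUT))         # True
```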

>
> >
> > How about setting max_discard_segments such that a discard always
> > completes in less than half the hung task timeout? This may make
> > discards a bit slower for one particular block driver but I think that's
> > better than hung task complaints.
>
> I am afraid you can't find a golden max_discard_segments setting that works
> for every driver. Even if one is found, performance may be affected.
>
> So I'm just wondering: why not take the simple approach used in blk_execute_rq()?
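
For reference, my reading of blk_execute_rq() is that, when
sysctl_hung_task_timeout_secs is non-zero, it loops on
wait_for_completion_io_timeout() with a timeout of half that value
(hang_check * (HZ/2) jiffies). The waiter therefore wakes and goes back
to sleep periodically, so its context-switch count keeps changing, which
is what the detector checks. Below is a user-space Python model of that
loop, with threading.Event standing in for the completion;
wait_in_slices() and the durations are hypothetical. It hides the waiter
from the detector but does not make the I/O itself any shorter.

```python
import threading


def wait_in_slices(done: threading.Event, hung_timeout_s: float) -> int:
    """Block until `done` is set, waking at least every hung_timeout_s / 2.

    Models the blk_execute_rq() loop; returns how many slices expired
    before completion (each expiry is a periodic wake-up of the waiter).
    """
    if hung_timeout_s <= 0:
        # Detector disabled: one untimed wait, like wait_for_completion_io().
        done.wait()
        return 0
    expired = 0
    # Event.wait() returns False on timeout, True once the event is set.
    while not done.wait(timeout=hung_timeout_s / 2):
        expired += 1  # the waiter woke up and is about to sleep again
    return expired


if __name__ == "__main__":
    done = threading.Event()
    # Simulated long secure discard: completes after 0.35 s.
    threading.Timer(0.35, done.set).start()
    # Hypothetical 0.2 s "hung task timeout", so each slice is 0.1 s.
    print("slices expired:", wait_in_slices(done, 0.2))
```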

My colleague Gwendal pointed out another issue that I had missed:
secure discard is an exclusive command, so it monopolizes the device.
Even if we fix the warning via your approach, the problem will show up
somewhere else, because other I/O to the drive will not make progress
for that length of time.

For Chromium OS purposes, if we had a blank slate, this is how I would solve it:

* Under the assumption that the truly sensitive data is not very big:
    * Keep sensitive data on a separate partition, so that the history
      of those LBAs is controlled.
    * Treat the files in that partition as immutable (i.e. never
      overwrite a file's contents without first securely erasing the
      existing contents).
    * By never letting more than one version of a file accumulate,
      we can guarantee that the secure erase will always be fast for
      moderately sized files.
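
A toy model of why the immutability rule keeps the erase bounded. It
rests on an assumption that is mine, not a description of any real FTL:
every rewrite made without a prior secure erase leaves the old copy on
flash, and a later secure erase has to purge every copy.
SecretPartitionModel and its methods are hypothetical; nothing here
issues a real BLKSECDISCARD.

```python
class SecretPartitionModel:
    """Toy model of a partition that holds only sensitive files.

    Assumption: each rewrite without a prior secure erase leaves the old
    copy on flash, and a secure erase must purge all copies, so its cost
    is the total size of every copy still present.
    """

    def __init__(self, erase_before_write: bool = True):
        self.erase_before_write = erase_before_write
        self.copies = {}  # file name -> sizes (sectors) of copies on flash

    def write(self, name: str, sectors: int) -> None:
        if self.erase_before_write and name in self.copies:
            # Immutability rule: securely erase the existing contents
            # before writing the new version.
            self.secure_erase(name)
        self.copies.setdefault(name, []).append(sectors)

    def secure_erase(self, name: str) -> int:
        """Remove all copies of `name`; return sectors purged (the cost)."""
        return sum(self.copies.pop(name, []))


if __name__ == "__main__":
    for policy in (True, False):
        part = SecretPartitionModel(erase_before_write=policy)
        for _ in range(10):
            part.write("key.bin", 8)  # rewrite an 8-sector key file 10 times
        print(policy, part.secure_erase("key.bin"))  # "True 8", then "False 80"
```

With the rule in place, the erase cost stays at the size of one version
no matter how often the file is rewritten; without it, the cost grows
with the number of rewrites.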

But for all the existing machines with keys on them, we will need to
do something else.

>
> Thanks,
> Ming Lei