Re: [PATCH 1/5] block: rewrite blk_bvec_map_sg to avoid a nth_page call

Guenter Roeck <linux@xxxxxxxxxxxx> · Tue, 16 Apr 2019 10:08:47 -0700

On Tue, Apr 16, 2019 at 08:33:56AM +0200, Christoph Hellwig wrote:
> On Mon, Apr 15, 2019 at 02:07:31PM -0700, Guenter Roeck wrote:
> > On Mon, Apr 15, 2019 at 10:52:42PM +0200, Christoph Hellwig wrote:
> > > On Mon, Apr 15, 2019 at 12:44:35PM -0700, Guenter Roeck wrote:
> > > > This patch causes crashes with various boot tests. Most sparc tests crash, as
> > > > well as several arm tests. Bisect results in both cases point to this patch.
> > > 
> > > That just means we trigger an existing bug more easily now.  I'll see
> > > if I can help with the issues.
> > 
> > Code which previously worked reliably no longer does. I would be quite
> > hesitant to call this "trigger an existing bug more easily". "Regression"
> > seems to be a more appropriate term - even more so as it seems to cause
> > 'init' crashes, at least on arm.
> 
> Well, we have these sgls in the wild already, it just is that they

That is beside the point. Your code changes an internal API to be more
stringent and less forgiving. This causes failures, presumably because
callers of that API took advantage (on purpose or not) of its leniency.
When changing an API, you are responsible for both ends. You cannot claim
that the callers of that API are buggy. Taking advantage of a forgiving
API is not a bug. If you change an API, and that change causes a failure,
that is a regression, not a bug on the side of the caller.

On top of that, an API change causing roughly 4% of my boot tests to fail
is a serious regression. Those boot tests don't really do anything besides
trying to boot the system. If 4% of those tests fail, I don't even want to
know what else is going to fail when your patch (or patch series) hits
mainline. Your patch should be reverted until that is resolved. If making
the API more stringent / less forgiving indeed makes sense and improves code
quality and/or performance, the very least would be to change the code to
still accept what it used to accept before but generate a traceback.
That would let people fix the calling code without making systems unusable.
This is even more true with failures like the one I observed on arm,
where your patch causes init to crash without any clear indication of the
root cause of that crash.
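
To illustrate the "still accept it, but generate a traceback" idea, here is
a minimal userspace sketch. It is not the block layer code from this patch:
warn_once(), map_segment(), map_strict(), map_legacy(), struct seg and
PAGE_SZ are all hypothetical names, and warn_once() is only a rough stand-in
for the kernel's WARN_ON_ONCE(). The assumption is a stricter API that
expects the offset to lie within the first page, while the old behavior
tolerated larger offsets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define PAGE_SZ 4096UL	/* hypothetical page size for this sketch */

struct seg {
	unsigned long page;	/* page index */
	unsigned long offset;	/* byte offset within that page */
};

static int warn_count;	/* number of warnings actually printed */

/*
 * Rough stand-in for WARN_ON_ONCE(): print a warning the first time the
 * condition is true, and return the condition so callers can branch on it.
 * Unlike the kernel macro, the "once" state here is global rather than
 * per call site.
 */
static bool warn_once(bool cond, const char *what)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		warn_count++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/* Strict path: the new contract requires offset < PAGE_SZ. */
static struct seg map_strict(unsigned long page, unsigned long offset)
{
	struct seg s = { page, offset };

	assert(offset < PAGE_SZ);
	return s;
}

/* Legacy path: normalize an offset that reaches past the first page. */
static struct seg map_legacy(unsigned long page, unsigned long offset)
{
	struct seg s = { page + offset / PAGE_SZ, offset % PAGE_SZ };

	return s;
}

/*
 * Entry point: keep accepting what used to be accepted, but complain
 * loudly (once) so that the offending caller can be found and fixed.
 */
static struct seg map_segment(unsigned long page, unsigned long offset)
{
	if (warn_once(offset >= PAGE_SZ,
		      "offset beyond first page, caller should be fixed"))
		return map_legacy(page, offset);
	return map_strict(page, offset);
}
```

With something along these lines, a caller passing an out-of-contract offset
still gets a usable result plus a warning pointing at the problem, instead of
an unexplained crash later on.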

Guenter