Re: [PATCH 1/1] arm64: Accelerate Adler32 using arm64 SVE instructions.

Dave Martin <Dave.Martin@xxxxxxx> · Thu, 5 Nov 2020 17:56:48 +0000

On Wed, Nov 04, 2020 at 06:49:05PM +0000, Mark Brown wrote:
> On Wed, Nov 04, 2020 at 06:13:06PM +0000, Dave Martin wrote:
> > On Wed, Nov 04, 2020 at 05:50:33PM +0000, Mark Brown wrote:
> 
> > > I think at a minimum we'd want to handle the vector length explicitly
> > > for kernel mode SVE, vector length independent code will work most of
> > > the time but at the very least it feels like a landmine waiting to cause
> > > trouble.  If nothing else there's probably going to be cases where it
> > > makes a difference for performance.  Other than that I'm not currently
> 
> ...
> 
> > The main reasons for constraining the vector length are a) to hide
> > mismatches between CPUs in heterogeneous systems, b) to ensure that
> > validated software doesn't run with a vector length it wasn't validated
> > for, and c) testing.
> 
> > For kernel code, it's reasonable to say that all code should be vector-
> > length agnostic unless there's a really good reason not to be.  So we
> > may not care too much about (b).
> 
> > In that case, just setting ZCR_EL1.LEN to max in kernel_sve_begin() (or
> > whatever) probably makes sense.
> 
> I agree, that's most likely a good default.
> 
> > For (c), it might be useful to have a command-line parameter or debugfs
> > widget to constrain the vector length for kernel code; perhaps globally
> > or perhaps per driver or algo.
> 
> I think a global control would be good for testing; it seems simpler and
> easier all round.  Per-driver or per-algo tuning seems more useful for
> cases where we run into something like a performance reason to use a
> limited set of vector lengths, but I think we should only add that when
> we have at least one user for it.  Some examples of actual restrictions
> we want would probably be helpful for designing the interface.

Ack; note that an algo that wants to use a particular vector length can
do so by means of the special predicate patterns VLnnn, POW2, MUL3 etc.
So setting an explicit limit in ZCR_EL1.LEN should hopefully be an
uncommon requirement.
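
To illustrate what those patterns give you, here is a rough C model of
how many elements a predicate pattern activates for a given hardware
vector length.  It is only a sketch of the architectural behaviour as I
understand it, not kernel code: the enum and the function name
sve_pattern_count() are made up for this example.  Note that the number
in VLnnn is an element count, not a length in bits.

```c
#include <assert.h>

/* Hypothetical names: a plain-C model of SVE predicate patterns. */
enum sve_pattern { PAT_POW2, PAT_VL, PAT_MUL4, PAT_MUL3, PAT_ALL };

/*
 * Return how many elements (counted from element 0) the pattern makes
 * active.
 *
 * fixed_n:    element count for PAT_VL (the architecture offers 1..8,
 *             16, 32, 64, 128 and 256); ignored for other patterns
 * vl_bits:    hardware vector length in bits (multiple of 128)
 * esize_bits: element size in bits (8, 16, 32 or 64)
 */
static unsigned int sve_pattern_count(enum sve_pattern pat,
				      unsigned int fixed_n,
				      unsigned int vl_bits,
				      unsigned int esize_bits)
{
	unsigned int n = vl_bits / esize_bits;	/* elements per vector */
	unsigned int p;

	switch (pat) {
	case PAT_POW2:
		/* Largest power of two that fits in the vector. */
		for (p = 1; p * 2 <= n; p *= 2)
			;
		return p;
	case PAT_VL:
		/* Exactly fixed_n elements, or none if they don't fit. */
		return fixed_n <= n ? fixed_n : 0;
	case PAT_MUL4:
		return n - n % 4;	/* largest multiple of 4 */
	case PAT_MUL3:
		return n - n % 3;	/* largest multiple of 3 */
	case PAT_ALL:
	default:
		return n;
	}
}
```

So with a 384-bit vector and byte elements, POW2 yields 32 active
elements out of 48, and VL16 yields 16 on any implementation that has
at least 16 elements of that size.  An algo written against VL16 sees
the same working width everywhere without anyone touching ZCR_EL1.LEN.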

> 
> > Nonetheless, working up a candidate algorithm to help us see whether
> > there is a good use case seems like a worthwhile project, so I don't
> > want to discourage that too much.
> 
> Definitely worth exploring.

Cheers
---Dave