Re: [PATCH 2/2] crypto: xor - use ktime for template benchmarking

Hi,

On Thu, Sep 24, 2020 at 11:40 AM Ard Biesheuvel <ardb@xxxxxxxxxx> wrote:
>
> On Thu, 24 Sep 2020 at 20:22, Doug Anderson <dianders@xxxxxxxxxxxx> wrote:
> >
> > Hi,
> >
> > On Thu, Sep 24, 2020 at 8:36 AM Ard Biesheuvel <ardb@xxxxxxxxxx> wrote:
> > >
> > > On Thu, 24 Sep 2020 at 17:28, Doug Anderson <dianders@xxxxxxxxxxxx> wrote:
> > > >
> > > > On Thu, Sep 24, 2020 at 1:32 AM Ard Biesheuvel <ardb@xxxxxxxxxx> wrote:
> > > > >
> > > ...
> > > > > > > +#define REPS           100
> > > > > >
> > > > > > > Is this sufficient?  I'm not sure what the lower bound is on the
> > > > > > > resolution expected of ktime.  If I'm doing the math right, on your
> > > > > > > system running 100 loops took 38802 ns in one case, since:
> > > > > >
> > > > > > (4096 * 1000 * 100) / 10556 = 38802
> > > > > >
> > > > > > If you happen to have your timer backed by a 32 kHz clock, one tick of
> > > > > > ktime could be as much as 31250 ns, right?  Maybe on systems backed
> > > > > > with a 32kHz clock they'll take longer, but it still seems moderately
> > > > > > iffy?  I dunno, maybe I'm just being paranoid.
> > > > > >
> > > > >
> > > > > No, that is a good point - I didn't really consider that ktime could
> > > > > be that coarse.
> > > > >
> > > > > OTOH, we don't really need the full 5 digits of precision either, as
> > > > > long as we don't misidentify the fastest algorithm.
> > > > >
> > > > > So I think it should be sufficient to bump this to 800. If my
> > > > > calculations are correct, this would limit any potential
> > > > > misidentification of algorithms performing below 10 GB/s to ones that
> > > > > only deviate in performance up to 10%.
> > > > >
> > > > > 800 * 1000 * 4096 / (10 * 31250) = 10485
> > > > > 800 * 1000 * 4096 / (11 * 31250) = 9532
> > > > >
> > > > > (10485/9532) / 10485 = 10%
> > > >
> > > > > Seems OK to me.  Seems unlikely that super fast machines are going to
> > > > > have a 32 kHz backed ktime and the worst case is that we'll pick a
> > > > > slightly sub-optimal xor, I guess.  I assume your goal is to keep
> > > > > things fitting in a 32-bit unsigned integer?  Looks like if you use
> > > > > 1000 it also fits...
> > > >
> > >
> > > Yes, but the larger we make this number, the more time the test will
> > > take on such slow machines. Doing 1000 iterations of 4k on a low-end
> > > machine that only manages 500 MB/s (?) takes a couple of milliseconds,
> > > which is more than it does today when HZ=1000 I think.
> > >
> > > Not that 800 vs 1000 makes a great deal of difference in that regard,
> > > just illustrating that there is an upper bound as well.
> >
> > Would it make sense to use some type of hybrid approach?  I know
> > getting ktime itself has some overhead so you don't want to do it in a
> > tight loop, but maybe calling it every once in a while would be
> > acceptable and if it's been more than 500 us then stop early?
> >
>
> To be honest, I don't think we need complexity like this - if
> boot time is critical on such a slow system, you probably won't have
> XOR built in, assuming it even makes sense to do software XOR on such
> a system.
>
> It is indeed preferable to have a numerator that fits in a U32, and so
> 1000 would be equally suitable in that regard, but I think I will
> stick with 800 if you don't mind.

OK, fair enough.

-Doug


