Re: [PATCH] libfdt: Remove special handling for unaligned reads

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]



On Thu, Jan 30, 2020 at 02:18:21PM -0600, Rob Herring wrote:
> On Thu, Jan 30, 2020 at 8:44 AM Patrice CHOTARD <patrice.chotard@xxxxxx> wrote:
> >
> > Hi Rob, Tom, David
> >
> > On 1/28/20 3:08 PM, Rob Herring wrote:
> > > On Tue, Jan 28, 2020 at 7:43 AM Tom Rini <trini@xxxxxxxxxxxx> wrote:
> > >> On Tue, Jan 28, 2020 at 07:59:18PM +1100, David Gibson wrote:
> > >>> On Mon, Jan 27, 2020 at 10:04:34AM -0500, Tom Rini wrote:
> > >>>> On Mon, Jan 27, 2020 at 02:23:51PM +1100, David Gibson wrote:
> > >>>>> On Thu, Jan 23, 2020 at 07:16:50AM -0500, Tom Rini wrote:
> > >>>>>> On Thu, Jan 23, 2020 at 08:14:40PM +1100, David Gibson wrote:
> > >>>>>>> On Fri, Jan 17, 2020 at 10:31:06AM -0500, Tom Rini wrote:
[snip]
> > > Why not just fix the helpers for the aligned case and be done with it:
> > >
> > > static inline uint32_t fdt32_ld(const fdt32_t *p)
> > > {
> > >        const uint8_t *bp = (const uint8_t *)p;
> > >
> > > +       if (!((unsigned long)p & 0x3))
> > > +               return fdt32_to_cpu(*p);
> > > +
> > >        return ((uint32_t)bp[0] << 24)
> > >                | ((uint32_t)bp[1] << 16)
> > >                | ((uint32_t)bp[2] << 8)
> > >                | bp[3];
> > > }
> >
> >
> > Here are the results, before and after your proposed patch:
> >
> > tested tag: V2020.04-RC1
> >
> > STM32MP> bootstage report
> > Timer summary in microseconds (12 records):
> >        Mark    Elapsed  Stage
> >           0          0  reset
> >      84,268     84,268  SPL
> >     962,921    878,653  end SPL
> >     965,800      2,879  board_init_f
> >   4,314,348  3,348,548  board_init_r
> >   4,863,763    549,415  id=64
> >   4,908,759     44,996  id=65
> >   4,909,459        700  main_loop
> >   5,322,309    412,850  id=175
> >
> > Accumulated time:
> >                 83,284  dm_r
> >                 95,842  dm_spl
> >              1,502,020  dm_f
> >
> >
> > -------------------------------------------------------------------------------
> >
> > tested tag : V2020.04-RC1 + following fdt32_ld patch :
> >
> > static inline uint32_t fdt32_ld(const fdt32_t *p)
> > {
> >        const uint8_t *bp = (const uint8_t *)p;
> >
> > +       if (!((unsigned long)p & 0x3))
> > +               return fdt32_to_cpu(*p);
> > +
> >        return ((uint32_t)bp[0] << 24)
> >                | ((uint32_t)bp[1] << 16)
> >                | ((uint32_t)bp[2] << 8)
> >                | bp[3];
> > }
> >
> >
> > STM32MP> bootstage report
> > Timer summary in microseconds (12 records):
> >        Mark    Elapsed  Stage
> >           0          0  reset
> >      84,264     84,264  SPL
> >     959,300    875,036  end SPL
> >     962,192      2,892  board_init_f
> >   4,310,598  3,348,406  board_init_r
> >   4,860,074    549,476  id=64
> >   4,905,119     45,045  id=65
> >   4,905,819        700  main_loop
> >   5,098,636    192,817  id=175
> >
> > Accumulated time:
> >                 83,202  dm_r
> >                 95,252  dm_spl
> >              1,501,950  dm_f
> >
> >
> > There is no gain in board_init_r(), the added alignment test is expensive itself.

Drat.

> That's odd. It should be just an AND and a conditional branch added.
> Those should be a few cycles compared to tens to hundreds of cycles
> for 4 loads instead of 1.

The later loads are very likely to hit in cache though, right?  Or is
this done before the caches are activated?

> Doesn't u-boot build with -Os typically? Maybe it's not actually
> inlining.

Apparently it is.  And yet... it seems pretty suspicious that the
numbers are so close, despite doing something quite different here.
Can we somehow verify that the fast path is really being executed?

For comparison purposes, what are the numbers if we change these to
unconditionally do an aligned load?

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Device Tree]     [Device Tree Spec]     [Linux Driver Backports]     [Video for Linux]     [Linux USB Devel]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [Yosemite Backpacking]

  Powered by Linux