Re: [PATCH] libfdt: Remove special handling for unaligned reads

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]



On Fri, Jan 31, 2020 at 02:38:10PM +1100, David Gibson wrote:
> On Thu, Jan 30, 2020 at 02:18:21PM -0600, Rob Herring wrote:
> > On Thu, Jan 30, 2020 at 8:44 AM Patrice CHOTARD <patrice.chotard@xxxxxx> wrote:
> > >
> > > Hi Rob, Tom, David
> > >
> > > On 1/28/20 3:08 PM, Rob Herring wrote:
> > > > On Tue, Jan 28, 2020 at 7:43 AM Tom Rini <trini@xxxxxxxxxxxx> wrote:
> > > >> On Tue, Jan 28, 2020 at 07:59:18PM +1100, David Gibson wrote:
> > > >>> On Mon, Jan 27, 2020 at 10:04:34AM -0500, Tom Rini wrote:
> > > >>>> On Mon, Jan 27, 2020 at 02:23:51PM +1100, David Gibson wrote:
> > > >>>>> On Thu, Jan 23, 2020 at 07:16:50AM -0500, Tom Rini wrote:
> > > >>>>>> On Thu, Jan 23, 2020 at 08:14:40PM +1100, David Gibson wrote:
> > > >>>>>>> On Fri, Jan 17, 2020 at 10:31:06AM -0500, Tom Rini wrote:
> [snip]
> > > > Why not just fix the helpers for the aligned case and be done with it:
> > > >
> > > > static inline uint32_t fdt32_ld(const fdt32_t *p)
> > > > {
> > > >        const uint8_t *bp = (const uint8_t *)p;
> > > >
> > > > +       if (!((unsigned long)p & 0x3))
> > > > +               return fdt32_to_cpu(*p);
> > > > +
> > > >        return ((uint32_t)bp[0] << 24)
> > > >                | ((uint32_t)bp[1] << 16)
> > > >                | ((uint32_t)bp[2] << 8)
> > > >                | bp[3];
> > > > }
> > >
> > >
> > > Here are the results, before and after your proposed patch:
> > >
> > > tested tag: V2020.04-RC1
> > >
> > > STM32MP> bootstage report
> > > Timer summary in microseconds (12 records):
> > >        Mark    Elapsed  Stage
> > >           0          0  reset
> > >      84,268     84,268  SPL
> > >     962,921    878,653  end SPL
> > >     965,800      2,879  board_init_f
> > >   4,314,348  3,348,548  board_init_r
> > >   4,863,763    549,415  id=64
> > >   4,908,759     44,996  id=65
> > >   4,909,459        700  main_loop
> > >   5,322,309    412,850  id=175
> > >
> > > Accumulated time:
> > >                 83,284  dm_r
> > >                 95,842  dm_spl
> > >              1,502,020  dm_f
> > >
> > >
> > > -------------------------------------------------------------------------------
> > >
> > > tested tag : V2020.04-RC1 + following fdt32_ld patch :
> > >
> > > static inline uint32_t fdt32_ld(const fdt32_t *p)
> > > {
> > >        const uint8_t *bp = (const uint8_t *)p;
> > >
> > > +       if (!((unsigned long)p & 0x3))
> > > +               return fdt32_to_cpu(*p);
> > > +
> > >        return ((uint32_t)bp[0] << 24)
> > >                | ((uint32_t)bp[1] << 16)
> > >                | ((uint32_t)bp[2] << 8)
> > >                | bp[3];
> > > }
> > >
> > >
> > > STM32MP> bootstage report
> > > Timer summary in microseconds (12 records):
> > >        Mark    Elapsed  Stage
> > >           0          0  reset
> > >      84,264     84,264  SPL
> > >     959,300    875,036  end SPL
> > >     962,192      2,892  board_init_f
> > >   4,310,598  3,348,406  board_init_r
> > >   4,860,074    549,476  id=64
> > >   4,905,119     45,045  id=65
> > >   4,905,819        700  main_loop
> > >   5,098,636    192,817  id=175
> > >
> > > Accumulated time:
> > >                 83,202  dm_r
> > >                 95,252  dm_spl
> > >              1,501,950  dm_f
> > >
> > >
> > > There is no gain in board_init_r(), the added alignment test is expensive itself.
> 
> Drat.
> 
> > That's odd. It should be just an AND and a conditional branch added.
> > Those should be a few cycles compared to tens to hundreds of cycles
> > for 4 loads instead of 1.
> 
> The later loads are very likely to hit in cache though, right?  Or is
> this done before the caches are activated?
> 
> > Doesn't u-boot build with -Os typically? Maybe it's not actually
> > inlining.
> 
> Apparently it is.  And yet... it seems pretty suspicious that the
> numbers are so close, despite doing something quite different here.
> Can we somehow verify that the fast path is really being executed?
> 
> For comparison purposes, what are the numbers if we change these to
> unconditionally do an aligned load?

The before / after numbers can be seen at
https://lists.denx.de/pipermail/u-boot/2020-January/396707.html

-- 
Tom

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Device Tree]     [Device Tree Spec]     [Linux Driver Backports]     [Video for Linux]     [Linux USB Devel]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [Yosemite Backpacking]

  Powered by Linux