RE: RE: RE: Memory performance / Cache problem

"Woodruff, Richard" <r-woodruff2@xxxxxx> · Wed, 14 Oct 2009 10:25:12 -0500

> From: epsi@xxxxxx [mailto:epsi@xxxxxx]
> Sent: Wednesday, October 14, 2009 9:49 AM
> To: Woodruff, Richard; linux-omap@xxxxxxxxxxxxxxx; Premi, Sanjeev
> Subject: Re: RE: RE: Memory performance / Cache problem
>
> Mem clock is 166MHz in both cases. I don't know whether there are differences
> in cycle access and timing, but the memory clock is fine.

How did you physically verify this?

> Following Siarhei's hint to initialize the buffers (around 1.2 MByte each),
> I get different results in the 22 kernel depending on the use of
> malloc alone
> memcpy =   473.764, loop4 =   448.430, loop1 =   102.770, rand =    29.641
> calloc alone
> memcpy =   405.947, loop4 =   361.550, loop1 =    95.441, rand =    21.853
> malloc+memset:
> memcpy =   239.294, loop4 =   188.617, loop1 =    80.871, rand =     4.726
>
> In the 31 kernel all 3 measurements are at about the same (unfortunately low)
> level as malloc+memset in 22.

Yes, aligned buffers can make a difference, but probably more so for small copies.  Of course you must touch the memory or mprotect() it so it's faulted in, but indications are you have done this.
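
For reference, here is a minimal sketch of what I mean by touching the memory before measuring.  It is not the benchmark used in this thread; the helper names (alloc_touched, memcpy_mb_per_s) are made up for illustration, the compiler barrier assumes GCC or Clang, and clock() reports CPU time rather than wall time.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: allocate a buffer and write every byte so that all
 * of its pages are faulted in before any timing starts. */
static void *alloc_touched(size_t size, int fill)
{
    void *buf = malloc(size);
    if (buf != NULL)
        memset(buf, fill, size);
    return buf;
}

/* Hypothetical helper: run `iters` copies of `size` bytes and return the
 * throughput in MB/s (0.0 if the elapsed time is below clock resolution). */
static double memcpy_mb_per_s(void *dst, const void *src, size_t size, int iters)
{
    clock_t t0 = clock();
    for (int i = 0; i < iters; i++) {
        memcpy(dst, src, size);
        /* GCC/Clang-specific barrier so the copies are not optimized away. */
        __asm__ volatile("" : : "r"(dst) : "memory");
    }
    clock_t t1 = clock();

    double secs = (double)(t1 - t0) / CLOCKS_PER_SEC;
    return secs > 0.0 ? ((double)size * iters) / (secs * 1e6) : 0.0;
}
```

Allocating both source and destination with alloc_touched() and then calling memcpy_mb_per_s() on them keeps page-fault cost out of the timed loop, which is what the malloc+memset case above measures.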

> I used the standard memcpy (I think this is from glibc and hence not NEON-based)?
> To be NEON-based, I guess it has to be recompiled?

The version of glibc in use can make a difference.  CodeSourcery added PLD (preload) instructions to the memory operations in its 2009 release.  This can give a good benefit.  It might be that you have an optimized library in one case and a non-optimized one in the other.
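
To illustrate the idea only, here is a rough sketch of a copy loop that issues a preload ahead of the data it is about to read.  This is not the CodeSourcery implementation; copy_with_prefetch, CHUNK and PREFETCH_AHEAD are hypothetical names and values.  It relies on the GCC/Clang builtin __builtin_prefetch, which the compiler can lower to PLD on ARM targets that support it.

```c
#include <stddef.h>
#include <string.h>

#define CHUNK          32   /* bytes copied per iteration (assumed value) */
#define PREFETCH_AHEAD 128  /* preload distance in bytes (assumed value, tune per platform) */

/* Hypothetical example: copy n bytes in CHUNK-sized pieces, hinting the
 * hardware to preload source data PREFETCH_AHEAD bytes in front of the
 * current read position. */
static void *copy_with_prefetch(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n >= CHUNK) {
        /* Only preload addresses that are still inside the source buffer. */
        if (n > PREFETCH_AHEAD)
            __builtin_prefetch(s + PREFETCH_AHEAD, 0, 0);
        memcpy(d, s, CHUNK);
        d += CHUNK;
        s += CHUNK;
        n -= CHUNK;
    }
    if (n > 0)
        memcpy(d, s, n);  /* copy the tail shorter than one chunk */
    return dst;
}
```

Whether this helps, and what preload distance works best, depends on the core and memory system, so it should be measured rather than assumed.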

Regards,
Richard W.
