On 04/02/14 16:34, Michael Kenney wrote:
Hi Martyn,
On Tue, Feb 4, 2014 at 7:28 AM, Martyn Welch <martyn.welch@xxxxxx> wrote:
On 28/12/13 00:34, Michael Kenney wrote:
Hi Martyn,
On Fri, Dec 27, 2013 at 4:23 PM, Martyn Welch <martyn@xxxxxxxxxxxx> wrote:
On 27/12/13 20:15, Michael Kenney wrote:
We are using the vme_tsi148 bridge driver along with the vme_user driver to access the VME boards. The A/D board requires D32 bus cycles and the VME master window is configured accordingly. However, when monitoring the bus cycles with a logic analyzer, we noticed that the CPU is transferring one byte at a time (i.e. four D8 transfers rather than one D32).
Is this the expected behavior of the tsi148 driver?
Hi Mike,
This is certainly not the expected behaviour: if the window is configured for D32 then it should do 32-bit transfers where possible.
I've heard of this happening recently, but haven't yet been able to replicate it. Which VME board are you running Linux on, and which flavour of Linux?
I'm running Debian 7.2 with kernel 3.2 on a Fastwel CPC600 (Pentium M based CPU board).
I haven't forgotten about this; I'm still not sure exactly what is happening.
Is your install/kernel 32 or 64 bit?
Are you doing single 32-bit transfers, or are you seeing this on
longer transfers (i.e. copying a buffer full of data)?
Thanks for getting back to me.
I'm running a 32-bit kernel and I see this behavior on all transfers
regardless of buffer size.
Gah! Thought I could see what may be causing it in 64-bit kernels, but
not 32-bit (my x86 asm is not particularly hot).
I think we /may/ be hitting issues with how the memcpy function gets
implemented on specific architectures. The tsi148 is a PCI/X to VME
bridge: if it receives a series of 8-bit reads or writes, it translates
these to 8-bit reads or writes on the VME bus, which is not necessarily
what we want.
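To make that concrete, here is a toy user-space model. It is a sketch, not driver code, and `vme_cycle_bits`, `count_cycles` and `struct cycle_counts` are hypothetical names. It assumes, as described above, that the bridge emits exactly one VME cycle of the same width for every PCI access it receives:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the bridge behaviour described above: every
 * PCI access of `width` bytes becomes one VME data cycle of the same
 * width (1 -> D8, 2 -> D16, 4 -> D32). */
static int vme_cycle_bits(size_t width)
{
    assert(width == 1 || width == 2 || width == 4);
    return (int)(width * 8);
}

struct cycle_counts {
    size_t d8, d16, d32;
};

/* Tally the VME cycles produced by a sequence of CPU access widths,
 * roughly what a logic analyser on the VME bus would show. */
static struct cycle_counts count_cycles(const size_t *widths, size_t n)
{
    struct cycle_counts c = {0, 0, 0};
    for (size_t i = 0; i < n; i++) {
        switch (vme_cycle_bits(widths[i])) {
        case 8:  c.d8++;  break;
        case 16: c.d16++; break;
        case 32: c.d32++; break;
        }
    }
    return c;
}
```

Reading one 32-bit register as four byte accesses ({1, 1, 1, 1}) yields four D8 cycles. A single 32-bit access ({4}) yields one D32 cycle, which is the difference seen on the analyser.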
According to the data sheet, the card you are using has an Intel 6300ESB
ICH, which seems to be PCI/X (so we can rule out PCIe to PCI/X bridges
or something like that doing nasty things).
I think (if I'm following everything correctly) that the memcpy for 32-bit is
handled by __memcpy in arch/x86/include/asm/string_32.h:
static __always_inline void *__memcpy(void *to, const void *from, size_t n)
{
        int d0, d1, d2;
        asm volatile("rep ; movsl\n\t"      /* copy n / 4 longwords */
                     "movl %4,%%ecx\n\t"    /* reload the byte count n */
                     "andl $3,%%ecx\n\t"    /* ecx = n % 4 */
                     "jz 1f\n\t"            /* no trailing bytes: done */
                     "rep ; movsb\n\t"      /* copy the remaining 1-3 bytes */
                     "1:"
                     : "=&c" (d0), "=&D" (d1), "=&S" (d2)
                     : "0" (n / 4), "g" (n), "1" ((long)to), "2" ((long)from)
                     : "memory");
        return to;
}
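As a cross-check on that reading, here is a small portable model of the access widths the asm above would issue. It is a sketch, not kernel code, and `memcpy32_access_widths` is a hypothetical name. It assumes the reading that "rep movsl" moves n / 4 longwords and "rep movsb" then moves n % 4 bytes, with no alignment step:

```c
#include <assert.h>
#include <stddef.h>

/* Model (not the kernel code) of the access pattern of the 32-bit
 * __memcpy above: n / 4 longword moves, then n % 4 byte moves.
 * Writes the width of each access into `widths`, which must have room
 * for at least n entries, and returns the number of accesses. */
static size_t memcpy32_access_widths(size_t n, size_t *widths)
{
    size_t count = 0;
    assert(widths != NULL || n == 0);
    for (size_t i = 0; i < n / 4; i++)   /* rep ; movsl */
        widths[count++] = 4;
    for (size_t i = 0; i < n % 4; i++)   /* rep ; movsb */
        widths[count++] = 1;
    return count;
}
```

Under this reading, a 4-byte copy is a single 32-bit move and a 7-byte copy is one 32-bit move followed by three byte moves. This model alone would not explain four D8 cycles for a single 32-bit transfer.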
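I'd expected this function to use 32-bit moves (movsl) where possible,
and byte moves (movsb) only to reach naturally aligned addresses (which is
something that we already deal with in the VME code, to use 16-bit reads
where we can). As I read it, it copies n / 4 longwords with rep movsl and
then the remaining n % 4 bytes with rep movsb, so a 4-byte copy should be
a single 32-bit move.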
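The following sketch shows the kind of alignment-aware splitting referred to above, where each step picks the widest naturally aligned access, capped at 32 bits for a D32 window. It is not the actual VME code, and `split_aligned` is a hypothetical name; it assumes a simple greedy choice at each step:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: split a transfer of `len` bytes starting at bus
 * address `addr` into naturally aligned accesses no wider than 4 bytes.
 * Each access width is written to `widths` (room for at least `len`
 * entries); returns the number of accesses. */
static size_t split_aligned(uintptr_t addr, size_t len, size_t *widths)
{
    size_t count = 0;
    assert(widths != NULL || len == 0);
    while (len > 0) {
        size_t w;
        if ((addr & 3) == 0 && len >= 4)
            w = 4;          /* aligned longword: D32 */
        else if ((addr & 1) == 0 && len >= 2)
            w = 2;          /* aligned word: D16 */
        else
            w = 1;          /* odd address or single trailing byte: D8 */
        widths[count++] = w;
        addr += w;
        len -= w;
    }
    return count;
}
```

For example, 8 bytes starting at 0x1001 split into accesses of 1, 2, 4 and 1 bytes, while an aligned 4-byte transfer stays a single 32-bit access.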
Greg (as co-maintainer of the VME subsystem :-) ), am I reading this right?
On x86_64 I think we end up using memcpy_c_e() in
arch/x86/lib/memcpy_64.S at least some of the time:
/*
 * memcpy_c_e() - enhanced fast string memcpy. This is faster and simpler than
 * memcpy_c. Use memcpy_c_e when possible.
 *
 * This gets patched over the unrolled variant (below) via the
 * alternative instructions framework:
 */
        .section .altinstr_replacement, "ax", @progbits
.Lmemcpy_c_e:
        movq %rdi, %rax
        movq %rdx, %rcx
        rep movsb
        ret
.Lmemcpy_e_e:
        .previous
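As I read it, this does the whole copy with rep movsb (8-bit moves).
The two movq instructions only set up the return value (%rax) and the
byte count (%rcx). It's the unrolled variant it replaces that uses movq
(64-bit moves) where possible. So wherever this variant is in use, even
32-bit and other small transfers will be done as 8-bit moves, and we'll
see 8-bit transfers on the bus.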
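One possible way to sidestep memcpy's choice of move width is to copy the window explicitly, one longword at a time, through a volatile 32-bit pointer. This is only a sketch: neither driver is known to do this, and `copy_from_window32` is a hypothetical name. It assumes the source is 4-byte aligned and a whole number of longwords long. On x86, compilers typically emit exactly one 32-bit load per volatile uint32_t element. Inside the kernel, ioread32()/readl() are the usual accessors for this kind of access. The example runs on ordinary memory, so it can be tried without VME hardware:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy `nwords` longwords out of a (mapped) window
 * using one 32-bit read per element, instead of letting memcpy pick
 * the move width. Assumes `src` is 4-byte aligned. */
static void copy_from_window32(uint32_t *dst, const volatile uint32_t *src,
                               size_t nwords)
{
    assert(((uintptr_t)src & 3) == 0);
    for (size_t i = 0; i < nwords; i++)
        dst[i] = src[i];   /* one volatile 32-bit read per iteration */
}
```

The volatile qualifier stops the compiler from merging, splitting or eliding the reads. The price is giving up the block-copy instructions memcpy would otherwise use.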
So it seems that your issue may unfortunately be different from what
we've seen internally.
Greg, any ideas? I'm not sure which person or mailing list would be best to ask.
Martyn
--
Martyn Welch (Lead Software Engineer)
GE Intelligent Platforms
T +44(0)1327322748
E martyn.welch@xxxxxx
Registered in England and Wales (3828642) at 100 Barbirolli Square, Manchester, M2 3AB | VAT:GB 927559189
_______________________________________________
devel mailing list
devel@xxxxxxxxxxxxxxxxxxxxxx
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel