On Wed, Feb 5, 2014 at 1:38 PM, Michael Kenney <mfkenney@xxxxxxxxx> wrote: > On Wed, Feb 5, 2014 at 1:22 PM, Martyn Welch <martyn@xxxxxxxxxxxx> wrote: >> >> >> >> On 5 February 2014 17:41, Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote: >>> >>> On Tue, Feb 04, 2014 at 06:34:13PM +0000, Martyn Welch wrote: >>> > On 04/02/14 16:34, Michael Kenney wrote: >>> > >Hi Martyn, >>> > > >>> > >On Tue, Feb 4, 2014 at 7:28 AM, Martyn Welch <martyn.welch@xxxxxx >>> > ><mailto:martyn.welch@xxxxxx>> wrote: >>> > > >>> > > On 28/12/13 00:34, Michael Kenney wrote: >>> > > >>> > > Hi Martyn, >>> > > >>> > > On Fri, Dec 27, 2013 at 4:23 PM, Martyn Welch >>> > > <martyn@xxxxxxxxxxxx <mailto:martyn@xxxxxxxxxxxx>> wrote: >>> > > >>> > > On 27/12/13 20:15, Michael Kenney wrote: >>> > > >>> > > >>> > > We are using the vme_tsi148 bridge driver along with >>> > > the >>> > > vme_user >>> > > driver to access the VME boards. The A/D board requires >>> > > D32 bus cycles >>> > > and the VME master window is configured accordingly, >>> > > however, when >>> > > monitoring the bus cycles with a logic analyzer, we >>> > > noticed that the >>> > > CPU is transferring one byte at a time (i.e. four D8 >>> > > transfers rather >>> > > than one D32). >>> > > >>> > > Is this the expected behavior of the tsi148 driver? >>> > > >>> > > >>> > > Hi Mike, >>> > > >>> > > This is certainly not the expected behaviour - if the >>> > > window >>> > > is configured >>> > > for D32 then it should do 32 bit transfers where possible. >>> > > >>> > > I've heard of this happening recently, but haven't yet been >>> > > able to >>> > > replicate it. Which VME board are you running Linux on and >>> > > which flavour of >>> > > Linux? >>> > > >>> > > >>> > > I'm running Debian 7.2 with kernel 3.2 on a Fastwel CPC600 >>> > > (Pentium M >>> > > based CPU board). >>> > > >>> > > >>> > > I haven't forgotten about this, still not sure exactly what is >>> > > happening. >>> > > >>> > > Is your install/kernel 32 or 64 bit? >>> > > >>> > > Are you doing single 32-bit transfers, or are you seeing this on >>> > > longer transfers (i.e. copying a buffer full of data)? >>> > > >>> > > >>> > >Thanks for getting back to me. >>> > > >>> > >I'm running a 32-bit kernel and I see this behavior on all transfers >>> > >regardless of buffer size. >>> > > >>> > >>> > Gah! Thought I could see what may be causing it in 64-bit kernels, but >>> > not >>> > 32-bit (my x86 asm is not particularly hot). >>> > >>> > I think we /may/ be hitting issues with how the memcpy function gets >>> > implemented on specific architectures. The tsi148 is a PCI/X to VME >>> > bridge, >>> > if it receives a series of 8-bit reads or writes, it translates these to >>> > 8-bit reads or writes on the VME bus, which is not necessarily what we >>> > want. >>> > >>> > According to the data sheet, the card you are using has an Intel 6300ESB >>> > ICH, which seems to be PCI/X (so we can rule out PCIe to PCI/X bridges >>> > or >>> > something like that doing nasty things). >>> > >>> > I think (if I follow everything correctly) then the memcpy for 32-bit is >>> > handled by __memcpy in arch/x86/include/asm/string_32.h: >>> > >>> > static __always_inline void *__memcpy(void *to, const void *from, size_t >>> > n) >>> > { >>> > int d0, d1, d2; >>> > asm volatile("rep ; movsl\n\t" >>> > "movl %4,%%ecx\n\t" >>> > "andl $3,%%ecx\n\t" >>> > "jz 1f\n\t" >>> > "rep ; movsb\n\t" >>> > "1:" >>> > : "=&c" (d0), "=&D" (d1), "=&S" (d2) >>> > : "0" (n / 4), "g" (n), "1" ((long)to), "2" >>> > ((long)from) >>> > : "memory"); >>> > return to; >>> > } >>> > >>> > I'd expected this function to use movl (32-bit moves) where possible, >>> > but >>> > movsb to get to naturally aligned moves (which is something that we deal >>> > with in the VME code already to use 16-bit reads where we can). >>> > >>> > Greg (as co-maintainer of the VME subsystem :-) ), Am I reading this >>> > right? >>> >>> Yes, but why would it matter if we copy by bytes, bits, or dwords to a >>> memory-backed bus? By definition, that shouldn't matter. If it does >>> matter, then we shouldn't be using memcpy, or relying on the compiler to >>> do the copying, but rather we need to use a write() function instead as >>> this would be IO memory that does care about this type of thing. >>> >> >> The drivers are using the functions memcpy_toio and memcpy_fromio for the >> bulk of the transfer, these seem to equate to memcpy on x86. >> >> In this instance it does matter - some devices can only accept certain width >> accesses and the VME bridge will translate a 8-bit PCI transfer to a 8-bit >> transfer on the VME bus. >> >> So (and I haven't even compile tested this yet), this?: >> >> diff --git a/drivers/vme/bridges/vme_ca91cx42.c >> b/drivers/vme/bridges/vme_ca91cx42.c >> index 1425d22c..0d87ebd 100644 >> --- a/drivers/vme/bridges/vme_ca91cx42.c >> +++ b/drivers/vme/bridges/vme_ca91cx42.c >> @@ -894,9 +894,9 @@ static ssize_t ca91cx42_master_read(struct >> vme_master_resource *image, >> } >> >> count32 = (count - done) & ~0x3; >> - if (count32 > 0) { >> - memcpy_fromio(buf + done, addr + done, (unsigned int)count); >> - done += count32; >> + while (done < count32) { >> + *(u32 *)(buf + done) = ioread32(addr + done); >> + done += 4; >> } >> >> if ((count - done) & 0x2) { >> @@ -948,9 +948,9 @@ static ssize_t ca91cx42_master_write(struct >> vme_master_resource *image, >> } >> >> count32 = (count - done) & ~0x3; >> - if (count32 > 0) { >> - memcpy_toio(addr + done, buf + done, count32); >> - done += count32; >> + while (done < count32) { >> + iowrite32(*(u32 *)(buf + done), addr + done); >> + done += 4; >> } >> >> if ((count - done) & 0x2) { >> diff --git a/drivers/vme/bridges/vme_tsi148.c >> b/drivers/vme/bridges/vme_tsi148.c >> index 5fbd08f..bc82991 100644 >> --- a/drivers/vme/bridges/vme_tsi148.c >> +++ b/drivers/vme/bridges/vme_tsi148.c >> @@ -1297,9 +1297,9 @@ static ssize_t tsi148_master_read(struct >> vme_master_resource *image, void *buf, >> } >> >> count32 = (count - done) & ~0x3; >> - if (count32 > 0) { >> - memcpy_fromio(buf + done, addr + done, count32); >> - done += count32; >> + if (done < count32) { >> + *(u32 *)(buf + done) = ioread32(addr + done); >> + done += 4; >> } >> >> if ((count - done) & 0x2) { >> @@ -1379,9 +1379,9 @@ static ssize_t tsi148_master_write(struct >> vme_master_resource *image, void *buf, >> } >> >> count32 = (count - done) & ~0x3; >> - if (count32 > 0) { >> - memcpy_toio(addr + done, buf + done, count32); >> - done += count32; >> + if (done < count32) { >> + iowrite32(*(u32 *)(buf + done), addr + done); >> + done += 4; >> } >> >> if ((count - done) & 0x2) { >> >> >>> >>> Does that help? >>> >> >> Yep :-) >> >> Thanks Greg. > > That certainly looks promising. I can give that patch a try later this week. > > Thanks. Good news Martyn. I had a bit of time today to test your patch and am happy to say it worked! No more bus errors. Now I can work on getting the actual data acquisition working :-) Thanks again for all your help. --Mike _______________________________________________ devel mailing list devel@xxxxxxxxxxxxxxxxxxxxxx http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel