Hi Greg,

On Wed, Sep 14, 2016 at 6:14 PM, Greg KH <greg@xxxxxxxxx> wrote:
> On Wed, Sep 14, 2016 at 04:15:11PM +0100, Martin Townsend wrote:
>> Hi,
>>
>> We are seeing a very strange problem here with our embedded AM4378
>> (ARM Cortex A9) board. We have 2 UARTS configured as RS485 ports and
>> very occasionally we are seeing a byte missing from a received message
>> and it's always the 49th byte. By occasionally it could take around
>> 8000 or more received messages before we see it.
>>
>> I can reproduce the problem by configuring both TTY ports the same
>> (raw mode) using
>> stty -F /dev/ttyS3 intr undef quit undef erase undef kill undef eof
>> undef eol undef eol2 undef swtch undef start undef stop undef susp
>> undef rprnt undef werase undef lnext undef discard undef \
>> parenb -parodd -cmspar cs8 -hupcl -cstopb cread clocal -crtscts \
>> -ignbrk -brkint -ignpar -parmrk inpck -istrip -inlcr -igncr -icrnl
>> -ixon -ixoff -iuclc -ixany -imaxbel -iutf8 \
>> -opost -olcuc -ocrnl -onlcr -onocr -onlret -ofill -ofdel nl0 cr0 tab0
>> bs0 vt0 ff0 \
>> -isig -icanon -iexten -echo -echoe -echok -echonl -noflsh -xcase
>> -tostop -echoprt -echoctl -echoke -flusho -extproc 115200
>>
>> and then loopback the 2 UARTs on the board and use cat and echo.
>>
>> Here's an example of the missing byte.
>> ....
>> 123456789012345678901234567890123456789012345678x0
>> 123456789012345678901234567890123456789012345678x0
>> 1234567890123456789012345678901234567890123456780
>> 123456789012345678901234567890123456789012345678x0
>> 123456789012345678901234567890123456789012345678x0
>> ....
>>
>> By using the following stress-ng command
>>
>> stress-ng --cpu-load 50 --io 4 --vm 2 --vm-bytes 128M --fork 4
>>
>> It will happen more often but not straightaway you may have to leave
>> it for a while and then it starts happening more frequently for a
>> period and then it's fine again. Increasing the cpu-load didn't seem
>> to make it happen more frequently but that is just from observation.
>>
>> The serial driver is 8250, as it's a TI CPU the driver is the OMAP version.
>> The Linux kernel is 4.1 LTSI.
>
> Does this happen in 4.7? 4.8-rc?
>
> 4.1 is really old, loads of things have been fixed since then...

I could build a newer kernel, but we have committed to 4.1 LTSI, and I
have seen that a lot has changed in the newer kernels (including RS485
half-duplex support) that looks like it won't backport too easily.

>
>> I'm at a loss on how to debug this, I tried putting printk's in the
>> tty_buffer code to see if the actual message being passed in from the
>> serial driver contained the full message to try and work out if the
>> driver was the problem but I didn't get any output in dmesg.
>>
>> One thing I also noticed that maybe completely unrelated is that I
>> accidentally turned on the echo on the receiving end
>> stty -F /dev/ttyS2 echo
>> and then transmitted a long string
>> echo '123456789012345678901234567890123456789012345678x012345678' > /dev/ttyS3
>> With this I always get
>> 123456789012345678901234567890123456789012345678x
>> The x is the 49th byte.
>
> echo and cat are not good things to use for serial ports, they do not
> handle error handling properly or flow control. So you can't really
> know much about this.

A lot of people would agree with you on this :) cat and echo probably
aren't the best tools, but they were the easiest ones to hand that could
reproduce the failure. The original failure was detected using a Modbus
RTU stack, but it took ages to trigger.

I have managed to get printk working and have tracked the problem down
to the driver, so I should be able to find it now. If it turns out to be
caused by anything other than my half-duplex changes I'll report back,
but I suspect that is where the problem lies.

>
> thanks,
>
> greg k-h

Cheers,
Martin.
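
For reference, a loopback check that does not rely on cat and echo could
be sketched roughly as below. This is only an illustration under stated
assumptions, not the test that was actually run: the helper names
(set_raw, read_exact, first_mismatch, loopback_test) and the one-second
read timeout are made up for the example, and tty.setraw() gives a plain
raw mode without the parenb/inpck parity settings from the stty line
quoted above. It sends the same 50-character line plus a newline,
reads it back with a timeout, and reports the 1-based position of the
first byte that differs.

```python
import os
import select
import termios
import tty

# Same test line as in the report, plus a newline terminator.
MESSAGE = b"123456789012345678901234567890123456789012345678x0\n"


def set_raw(fd, baud=termios.B115200):
    """Put a tty into raw mode at the given speed (hypothetical helper)."""
    tty.setraw(fd)                 # no echo, no canonical mode, no signals
    attrs = termios.tcgetattr(fd)
    attrs[4] = baud                # input speed
    attrs[5] = baud                # output speed
    termios.tcsetattr(fd, termios.TCSANOW, attrs)


def read_exact(fd, n, timeout=1.0):
    """Read up to n bytes, giving up if nothing arrives within timeout."""
    buf = bytearray()
    while len(buf) < n:
        ready, _, _ = select.select([fd], [], [], timeout)
        if not ready:
            break                  # timed out: return the short read
        chunk = os.read(fd, n - len(buf))
        if not chunk:
            break
        buf.extend(chunk)
    return bytes(buf)


def first_mismatch(expected, received):
    """Return the 1-based position of the first differing byte, or None."""
    for i, (a, b) in enumerate(zip(expected, received)):
        if a != b:
            return i + 1
    if len(expected) != len(received):
        return min(len(expected), len(received)) + 1
    return None


def loopback_test(tx_fd, rx_fd, count):
    """Send MESSAGE count times; return (iteration, position) per failure."""
    failures = []
    for n in range(count):
        os.write(tx_fd, MESSAGE)
        got = read_exact(rx_fd, len(MESSAGE))
        pos = first_mismatch(MESSAGE, got)
        if pos is not None:
            failures.append((n, pos))
    return failures
```

On the board, the two descriptors would come from something like
os.open("/dev/ttyS3", os.O_RDWR | os.O_NOCTTY) and the same for
/dev/ttyS2, with set_raw() called on both before loopback_test(). A line
that lost its 49th byte would show up as a failure at position 49.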