Re: 8250 RS485 TTY occasional 49th byte being dropped

On Wed, Sep 14, 2016 at 7:43 PM, Martin Townsend
<mtownsend1973@xxxxxxxxx> wrote:
> Hi Greg,
>
>
> On Wed, Sep 14, 2016 at 6:14 PM, Greg KH <greg@xxxxxxxxx> wrote:
>> On Wed, Sep 14, 2016 at 04:15:11PM +0100, Martin Townsend wrote:
>>> Hi,
>>>
>>> We are seeing a very strange problem here with our embedded AM4378
>>> (ARM Cortex A9) board.  We have 2 UARTS configured as RS485 ports and
>>> very occasionally we are seeing a byte missing from a received message
>>> and it's always the 49th byte.  By occasionally it could take around
>>> 8000 or more received messages before we see it.
>>>
>>> I can reproduce the problem by configuring both TTY ports the same
>>> (raw mode) using
>>> stty -F /dev/ttyS3 intr undef quit undef erase undef kill undef eof
>>> undef eol undef eol2 undef swtch undef start undef stop undef susp
>>> undef rprnt undef werase undef lnext undef discard undef \
>>> parenb -parodd -cmspar cs8 -hupcl -cstopb cread clocal -crtscts \
>>> -ignbrk -brkint -ignpar -parmrk inpck -istrip -inlcr -igncr -icrnl
>>> -ixon -ixoff -iuclc -ixany -imaxbel -iutf8 \
>>> -opost -olcuc -ocrnl -onlcr -onocr -onlret -ofill -ofdel nl0 cr0 tab0
>>> bs0 vt0 ff0 \
>>> -isig -icanon -iexten -echo -echoe -echok -echonl -noflsh -xcase
>>> -tostop -echoprt -echoctl -echoke -flusho -extproc 115200
>>>
>>> and then loopback the 2 UARTs on the board and use cat and echo.
>>>
>>> Here's an example of the missing byte.
>>> ....
>>> 123456789012345678901234567890123456789012345678x0
>>> 123456789012345678901234567890123456789012345678x0
>>> 1234567890123456789012345678901234567890123456780
>>> 123456789012345678901234567890123456789012345678x0
>>> 123456789012345678901234567890123456789012345678x0
>>> ....
>>>
>>> By using the following stress-ng command
>>>
>>> stress-ng --cpu-load 50 --io 4 --vm 2 --vm-bytes 128M --fork 4
>>>
>>> It will happen more often, but not straight away: you may have to leave
>>> it for a while, and then it starts happening more frequently for a
>>> period, and then it's fine again. Increasing the cpu-load didn't seem
>>> to make it happen more frequently, but that is just from observation.
>>>
>>> The serial driver is 8250; as it's a TI CPU, the driver is the OMAP version.
>>> The Linux kernel is 4.1 LTSI.
>>
>> Does this happen in 4.7?  4.8-rc?
>>
>> 4.1 is really old, loads of things have been fixed since then...
>
> I could build a newer kernel, but we have committed to 4.1 LTSI and I
> have seen that a lot has changed in the newer kernels (including RS485
> half-duplex support), which looks like it won't backport too easily.
>>
>>> I'm at a loss on how to debug this, I tried putting printk's in the
>>> tty_buffer code to see if the actual message being passed in from the
>>> serial driver contained the full message to try and work out if the
>>> driver was the problem but I didn't get any output in dmesg.
>>>
>>> One thing I also noticed, which may be completely unrelated, is that I
>>> accidentally turned on echo on the receiving end
>>> stty -F /dev/ttyS2 echo
>>> and then transmitted a long string
>>>  echo '123456789012345678901234567890123456789012345678x012345678' > /dev/ttyS3
>>> With this I always get
>>> 123456789012345678901234567890123456789012345678x
>>> The x is the 49th byte.
>>
>> echo and cat are not good things to use for serial ports, they do not
>> handle errors or flow control properly.  So you can't really
>> know much about this.
>
> A lot of people would agree with you on this :)  cat and echo probably
> aren't the best tools, but they were the easiest ones to hand that could
> reproduce the failure.  The original failure was detected using a
> Modbus RTU stack, but it took ages to fire.  I have managed to get
> printk working and have tracked the problem down to the driver, so I
> should be able to find the problem now. If it turns out to be something
> other than my half-duplex changes I'll report back, but I
> suspect that is where the problem lies.
>
>>
>> thanks,
>>
>> greg k-h
>
> Cheers,
> Martin.
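
Since cat and echo give no indication of where a received line went wrong,
a small helper that compares each received line against the line that was
sent can locate the failure automatically.  The following is only a sketch
under my own assumptions: the function names first_mismatch() and
is_single_drop() are hypothetical, it works on NUL-terminated lines already
read from the port, and it is not part of any driver or of the test used in
this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return the 0-based index of the first byte at
 * which 'received' diverges from 'expected', or -1 if they are identical.
 * If one string is a strict prefix of the other, the index of the first
 * missing/extra byte (the shorter length) is returned.
 */
long first_mismatch(const char *expected, const char *received)
{
    assert(expected != NULL && received != NULL);

    size_t elen = strlen(expected);
    size_t rlen = strlen(received);
    size_t n = elen < rlen ? elen : rlen;

    for (size_t i = 0; i < n; i++) {
        if (expected[i] != received[i])
            return (long)i;
    }
    return elen == rlen ? -1 : (long)n;
}

/*
 * Hypothetical helper: return 1 if 'received' is exactly 'expected' with
 * one byte removed (a single dropped byte), 0 otherwise.  This separates
 * a dropped byte from a corrupted one.
 */
int is_single_drop(const char *expected, const char *received)
{
    assert(expected != NULL && received != NULL);

    if (strlen(received) + 1 != strlen(expected))
        return 0;

    /* Lengths differ, so first_mismatch() cannot return -1 here. */
    long i = first_mismatch(expected, received);

    /* Everything after the dropped byte must line up again. */
    return strcmp(expected + i + 1, received + i) == 0;
}
```

Applied to the good and bad lines quoted above, first_mismatch() gives
index 48 (the 49th byte) and is_single_drop() confirms that exactly one
byte is missing rather than corrupted.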

The logic analyser traces showed that the byte wasn't lost on
transmission, so it must be getting lost on reception.  Looking through
the OMAP 8250 driver I saw nothing obvious, but I did see that DMA was
supported (CONFIG_SERIAL_8250_DMA), so I enabled it and it has been
running fine under CPU load for a while now.  In case this is useful to
anyone else who comes across the same problem: for the AM4378 device we
are using, I had to force the OMAP_DMA_TX_KICK habit in the probe
function by commenting out the SoC check:

// if (of_machine_is_compatible("ti,am33xx"))
    priv->habit |= OMAP_DMA_TX_KICK;

You also have to add the relevant DMA properties to the device tree.
For me it was:
&uart2 {
...
    dmas = <&edma 30 &edma 31>;
    dma-names = "tx", "rx";
};

Cheers, Martin.