Re: [PATCH v4] usbserial: cp210x - icount support for parity error checking

Johan Hovold <johan@xxxxxxxxxx> · Fri, 3 Jul 2020 09:45:39 +0200

On Wed, Jul 01, 2020 at 09:28:25PM +0200, Jerry wrote:
> Johan Hovold wrote on 7/1/20 5:42 PM:
> > It would be better if could detect both types of overrun.
> >
> > Did you try moving the purge command at close to after disabling the
> > uart?
> >
> > Or perhaps we can add a "dummy" comm-status command after disabling the
> > uart?

> I try to describe more details about this overrun problem:
> 1) I tried only CP2102 because our company uses it, I have no idea whether 
> the same apply for CP2104,2105... or not, I don't have another chip.
> 2) Maybe I should note I'm always using even parity (because of STM32 
> bootloader protocol).
> 3) I thought the problem is created by unreaded data when closing because 
> overrun was reported after closing if GET_COMM_STATUS shown positive 
> ulAmountInInQueue before closing. Later I discovered that if I close the 
> port, wait, send some data from outside, then open it, I will also get 
> HW_OVERRUN flag.
> 4) So currently I suppose that problem is usually created by any incoming 
> data after disabling interface. If I remove UART_DISABLE from close method, 
> no overrun ever reported.
> 5) Unfortunately this overrun is not reported immediately after port 
> opening but later after receiving first byte. I didn't find any way how to 
> purge nor dummy read this flag before transmitting data.
> 6) I didn't find this problem in a chip errata and nobody complaining about it.
> 7) I successfully reproduced the same problem in MS Windows 10. If some 
> data arrived to closed port, then I open it, everything is OK but after 
> sending request and receiving reply I often get overrun indication from 
> Win32 API ClearCommError()
> 8) I removed HW_OVERRUN checking because I don't want to break anything but 
> maybe Linux driver should work the same way as Windows and don't hide this 
> problem?
> 9) I needed just to detect parity error from user space and things 
> complicate.  :-)

Heh, yeah, it tends to be that way. :) But thanks for the great summary
of your findings!

I think it probably makes most sense to keep the error in as it's a
quirk of the device rather than risk missing an actual overrun.

The problem here is that we're sort of violating the API and turning
TIOCGICOUNT into a polling interface instead of just returning our error
and interrupt counters. This means will always undercount any errors for
a start.

The chip appears to have a mechanism for reporting errors in-band, but
that would amount to processing every character received to look for the
escape char, which adds overhead. I'm guessing that interface would also
insert an overrun error when returning the first character.

But perhaps that it was we should be using as it allows parity the
errors to be reported using the normal in-band methods. If we only
enable it when parity checking is enabled then the overhead seems
justified.

I did a quick test here and the event insertion appears to work well for
parity errors. Didn't manage to get it to report break events yet, and
modem-status changes are being buffered and not reported until the next
character. But in theory we could expand the implementation to provide
more features later.

I'll see if I can cook something up quickly.

> >>    /*
> >> - * Read how many bytes are waiting in the TX queue.
> >> + * Read how many bytes are waiting in the TX queue and update error counters.
> >>     */
> >> -static int cp210x_get_tx_queue_byte_count(struct usb_serial_port *port,
> >> -		u32 *count)
> >> +static int cp210x_get_comm_status(struct usb_serial_port *port,
> >> +		u32 *tx_count)
> >>    {
> >>    	struct usb_serial *serial = port->serial;
> >>    	struct cp210x_port_private *port_priv = usb_get_serial_port_data(port);
> >> @@ -855,7 +865,17 @@ static int cp210x_get_tx_queue_byte_coun
> >>    			0, port_priv->bInterfaceNumber, sts, sizeof(*sts),
> >>    			USB_CTRL_GET_TIMEOUT);
> >>    	if (result == sizeof(*sts)) {
> >> -		*count = le32_to_cpu(sts->ulAmountInOutQueue);
> >> +		u32 flags = le32_to_cpu(sts->ulErrors);
> >> +		if (flags & CP210X_SERIAL_ERR_BREAK)
> >> +			port->icount.brk++;
> >> +		if (flags & CP210X_SERIAL_ERR_FRAME)
> >> +			port->icount.frame++;
> >> +		if (flags & CP210X_SERIAL_ERR_QUEUE_OVERRUN)
> >> +			port->icount.overrun++;
> >> +		if (flags & CP210X_SERIAL_ERR_PARITY)
> >> +			port->icount.parity++;
> >> +		if (tx_count)
> >> +			*tx_count = le32_to_cpu(sts->ulAmountInOutQueue);
> >>    		result = 0;
> >>    	} else {
> >>    		dev_err(&port->dev, "failed to get comm status: %d\n", result);
> >> @@ -873,13 +893,26 @@ static bool cp210x_tx_empty(struct usb_s
> >>    	int err;
> >>    	u32 count;
> >>    
> >> -	err = cp210x_get_tx_queue_byte_count(port, &count);
> >> +	err = cp210x_get_comm_status(port, &count);
> >>    	if (err)
> >>    		return true;
> >>    
> >>    	return !count;
> >>    }
> > It seems a bit weird to be updating the error counters while polling
> > for tx-empty during close. How about having cp210x_get_comm_status()
> > take two u32 pointers for tx_count and errors and only pass in one or
> > the other from cp210x_tx_empty() and cp210x_get_icount() respectively?

> I suppose that GET_COMM_STATUS reads and CLEAR pending error flags (not 
> explained in datasheet but behaves that way). So if some errors are 
> reported during cp210x_tx_empty() it can be either counted immediately or 
> lost. I'm not sure if cp210x_tx_empty() is called ONLY during close but 
> seems be more consistent to count every detected error regardless who calls 
> GET_COMM_STATUS.

tx_empty() is called when waiting for data to be drained for example
during close but also on when changing terminal settings with TCSADRAIN
or on tcdrain().

Since we wouldn't be counting errors during normal operation it seems
arbitrary to be counting during these seemingly unrelated calls. Better
to only poll from TIOCGICOUNT, if we decide to go with that approach at
all.

> >> +static int cp210x_get_icount(struct tty_struct *tty,
> >> +		struct serial_icounter_struct *icount)
> >> +{
> >> +	struct usb_serial_port *port = tty->driver_data;
> >> +	int result;
> >> +
> >> +	result = cp210x_get_comm_status(port, NULL);
> >> +	if (result)
> >> +		return result;
> > And then you update the error counters here, preferably under the port
> > lock.

> I wasn't sure about port lock. I looked in several usb drivers (ftdi, 
> pl2303) and it seems that port->icount.xxx increments are usually done out 
> of lock. But if you wish I put increments into 
> spin_lock_irqsave(&port->lock, flags), correct?

Correct.

It's quite possible that that's missing elsewhere, but at least those
counters are updated in completion callbacks which do not run in
parallel unlike ioctls(), which are not serialised.

Johan