Bug in Au1x00 UART or USB drivers for 2.6 kernels?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



All,

After some thorough testing and exploration, I believe I have uncovered a bug in the Au1x00 driver code of either the UART or the USB host in the Linux 2.6 kernels.  In short, this bug causes corruption of the data being returned from the UART whenever both it and the USB are in use.

The issue I’m seeing is pretty simple to describe.  When I have a data stream running over a UART, and I introduce traffic on the USB, the data returning from my UART is corrupted.  (I will describe what the corruption looks like shortly.)  Even simple events such as hotplugging a device can and will create the corruption in the UART’s serial stream.  As expected, the amount of corruption seems dependent upon the amount of traffic on the USB.
 
I discovered this issue while working with a DBAu1500 running a buildroot package that contains a Linux 2.6.21 kernel (patched for the MIPS from the linux-mips.org site) and Busybox 1.7.3.  To keep YAMON from bombing during builds, I’m still using a trusty gcc 3.4.5 compiler to build it all.

To determine if the issue was particular to just that board or that kernel source, I ran the same tests on two other DBAu1500s I have lying around here.  I tested two 2.6 kernels (a 2.6.17 and the 2.6.21) and one 2.4 kernel I used several years ago.  Both 2.6 kernels displayed the same issue.  The 2.4.26 kernel, on the other hand, worked flawlessly.  This does not really surprise me, as I’m presently tempted to believe that the issue somehow has to do with the interplay between the USB and the Au1x00 UART support that is now integrated into the standard 8250 driver.

To really monitor the byte sequences returning, I wrote a simple, multi-threaded test app designed to test the loopback of the UART.  It works by generating a 4k packet containing a repeating alphabet sequence [ABCDEF…XYZAB…], sending it over the UART, and then reading it back again.  (The loopback is achieved by tying the TX and RX leads of the UART together.)  On the receive side, the receive buffer is initialized with (0x20) characters, so I know whether bytes were skipped or misread.  Once an entire 4k packet has been sent and received, the TX and RX threads exit and the main application compares their results byte-for-byte.  Any discrepancies are reported, and then the process repeats.

(To aid in testing, I’ve included the code from this test app, as well as instructions for its use, at the conclusion of this email.)

Now, the corruption I’m seeing looks like this:  (All of this data was taken by running my test code on ttyS1(tts/1) and simply plugging and unplugging a USB 802.11 wireless device and connecting it to a WAP.)

<<<<<<<<<<< start: code output>>>>>>>>>>>>>
Iteration: 197...done, cross-checking...done, MATCHED!
- RX ERROR: '' found in read return.  Byte 81 of the 113 bytes read does not fit between 'A' & 'Z'!
done, cross-checking...

Index:  TX:     RX:
3568    'G'[47] ''[00]
3569    'H'[48] 'G'[47]
3570    'I'[49] 'H'[48]
3571    'J'[4a] 'I'[49]
3572    'K'[4b] 'J'[4a]
3573    'L'[4c] 'K'[4b]
3574    'M'[4d] 'L'[4c]
3575    'N'[4e] 'M'[4d]
3576    'O'[4f] 'N'[4e]
3577    'P'[50] 'O'[4f]
3578    'Q'[51] 'P'[50]
3579    'R'[52] 'Q'[51]
3580    'S'[53] 'R'[52]
3581    'T'[54] 'S'[53]
3582    'U'[55] 'T'[54]
3583    'V'[56] 'U'[55]

        *** 16 Errors detected @ iteration 197! ***
Iteration: 200...done, cross-checking...done, MATCHED!
- RX ERROR: '' found in read return.  Byte 57 of the 105 bytes read does not fit between 'A' & 'Z'!
done, cross-checking...

Index:  TX:     RX:
2176    'S'[53] ''[00]
2177    'T'[54] 'S'[53]
2178    'U'[55] 'T'[54]
2179    'V'[56] 'U'[55]
2180    'W'[57] 'V'[56]
2181    'X'[58] 'W'[57]
2182    'Y'[59] 'X'[58]
2183    'Z'[5a] 'Y'[59]
2184    'A'[41] 'Z'[5a]
2185    'B'[42] 'A'[41]
2186    'C'[43] 'B'[42]
2187    'D'[44] 'C'[43]
2188    'E'[45] 'D'[44]
2189    'F'[46] 'E'[45]
2190    'G'[47] 'F'[46]
2191    'H'[48] 'G'[47]

        *** 16 Errors detected @ iteration 200! ***
Iteration: 203...done, cross-checking...done, MATCHED!
- RX ERROR: '' found in read return.  Byte 81 of the 129 bytes read does not fit between 'A' & 'Z'!

- RX ERROR: '' found in read return.  Byte 113 of the 129 bytes read does not fit between 'A' & 'Z'!
done, cross-checking...

Index:  TX:     RX:
0832    'A'[41] ''[00]
0833    'B'[42] 'A'[41]
0834    'C'[43] 'B'[42]
0835    'D'[44] 'C'[43]
0836    'E'[45] 'D'[44]
0837    'F'[46] 'E'[45]
0838    'G'[47] 'F'[46]
0839    'H'[48] 'G'[47]
0840    'I'[49] 'H'[48]
0841    'J'[4a] 'I'[49]
0842    'K'[4b] 'J'[4a]
0843    'L'[4c] 'K'[4b]
0844    'M'[4d] 'L'[4c]
0845    'N'[4e] 'M'[4d]
0846    'O'[4f] 'N'[4e]
0847    'P'[50] 'O'[4f]
0864    'G'[47] ''[00]
0865    'H'[48] 'G'[47]
0866    'I'[49] 'H'[48]
0867    'J'[4a] 'I'[49]
0868    'K'[4b] 'J'[4a]
0869    'L'[4c] 'K'[4b]
0870    'M'[4d] 'L'[4c]
0871    'N'[4e] 'M'[4d]
0872    'O'[4f] 'N'[4e]
0873    'P'[50] 'O'[4f]
0874    'Q'[51] 'P'[50]
0875    'R'[52] 'Q'[51]
0876    'S'[53] 'R'[52]
0877    'T'[54] 'S'[53]
0878    'U'[55] 'T'[54]
0879    'V'[56] 'U'[55]

        *** 32 Errors detected @ iteration 203! ***
Iteration: 204...
<<<<<<<<<<< end: code output>>>>>>>>>>>>>

As you can see, when the error occurs it effects an entire 16-byte block, and only that 16-byte block.  If you look at the byte index numbers above, you will notice that the stream always resyncs after taking a hit.  So, the error is not just a dropped or inserted byte.  If either of those events occurred, I would expect the entire subsequent serial stream to be off by the number of characters dropped or added.  That does not seem to be happening here.

Moreover, as you can see, there is a definite pattern to the corruption within the block.  It is not random.  In a corrupted block, the first byte is cleared to “0” (a value my test code is not permitted to use.)  All subsequent bytes are then shifted 1 location to the right with the last byte being lost.

This is where my current work has brought me.  I am beginning to dive into the kernel source to see if I can track down the issue, but before doing so I figured I would pass this around to the gurus here to see what you thought.  By providing a solid description of the issue, as well as instructions for easily reproducing it, I’m hopeful that a resolution to it can be found soon.

Thanks for your help!
Nathan Eggan



PS - Here is the test code I’ve been using.  Please review it, and feel free to comment on it.  To compile it, I’ve been issuing “mipsel-linux-g++  -static –lpthread”.  (I ran -static against it to make it library independent when I was switching between 2.4 and 2.6 kernels.)

Test Code:
<<<<<<<<<<< start: test code>>>>>>>>>>>>>
// standard includes
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

// custom includes


/////////////////////////////////////////////////////////////////////////
/////////////////////////////////////////////////////////////////////////

enum PARITY_TYPE
{
	NO_PARITY,
	PARITY
};


/******************************************************************************
 *
 * Name: OpenSerialPort
 *
 * Purpose: This function opens a standard serial port
 *
 *****************************************************************************/
int OpenSerialPort (
    char *dev_name,
    int baud_rate,
    int num_bytes,
    PARITY_TYPE parity
    )
{
    struct termios options;
    int ret_val = 0, fd = -1;

    // open the proper port
	char full_dev_name[256];
	ret_val = sprintf(full_dev_name, "/dev/%s", dev_name);
	if ( (ret_val <= 0) || (full_dev_name == NULL) )
	{
		printf("OpenSerialPort(): Failed to open serial port: '%s': errno = '%s'\n", full_dev_name, strerror(errno));
		// will return fd = -1
	}
	else
	{
		fd = open(full_dev_name, O_RDWR | O_NOCTTY| O_NDELAY);

		// error checking
		if (fd < 0)
		{
			printf ("OpenSerialPort(): Failed to open serial port: '%s': errno = '%s'\n", full_dev_name, strerror(errno));
		}
		else // success, now configure the port
		{
			// set the port to block on read
			fcntl(fd, F_SETFL, 0);

			// get the port's current attribute set
			tcgetattr(fd, &options);
			// set up the baud rate - lock at 115200 for now
			int desired_baud_rate = 0;
			switch (baud_rate)
			{
				// put the most likely first
				case 57600:
						desired_baud_rate = B57600;
						break;

				case 115200:
						desired_baud_rate = B115200;
						break;

				case 230400:
						desired_baud_rate = B230400;
						break;

#if !defined(HOST)
				case 460800:
						desired_baud_rate = B460800;
						break;

				case 500000:
						desired_baud_rate = B500000;
						break;

				case 576000:
						desired_baud_rate = B576000;
						break;

				case 921600:
						desired_baud_rate = B921600;
						break;

				case 1000000:
						desired_baud_rate = B1000000;
						break;

				case 1152000:
						desired_baud_rate = B1152000;
						break;

				case 1500000:
						desired_baud_rate = B1500000;
						break;

				case 2000000:
						desired_baud_rate = B2000000;
						break;

				case 2500000:
						desired_baud_rate = B2500000;
						break;

				case 3000000:
						desired_baud_rate = B3000000;
						break;

				case 3500000:
						desired_baud_rate = B3500000;
						break;

				case 4000000:
						desired_baud_rate = B4000000;
						break;
#endif

				case 50:
						desired_baud_rate = B50;
						break;

				case 75:
						desired_baud_rate = B75;
						break;

				case 110:
						desired_baud_rate = B110;
						break;

				case 134:
						desired_baud_rate = B134;
						break;

				case 150:
						desired_baud_rate = B150;
						break;

				case 200:
						desired_baud_rate = B200;
						break;

				case 300:
						desired_baud_rate = B300;
						break;

				case 600:
						desired_baud_rate = B600;
						break;

				case 1200:
						desired_baud_rate = B1200;
						break;

				case 1800:
						desired_baud_rate = B1800;
						break;

				case 2400:
						desired_baud_rate = B2400;
						break;

				case 4800:
						desired_baud_rate = B4800;
						break;

				case 9600:
						desired_baud_rate = B9600;
						break;

				case 19200:
						desired_baud_rate = B19200;
						break;

				case 38400:
						desired_baud_rate = B38400;
						break;

			}

			if (desired_baud_rate)
			{
				cfsetispeed(&options, desired_baud_rate);
				cfsetospeed(&options, desired_baud_rate);
			}
			else
			{
				/* set baud to something that is defined */
				cfsetispeed(&options, B115200);
				cfsetospeed(&options, B115200);
			}

			// enable the receiver and set local mode
			options.c_cflag |= (CLOCAL | CREAD); // NEVER set this directly, use "|="
			// set the 8N1
			options.c_cflag &= ~(PARENB); // disable parity checking [N]
			options.c_cflag &= ~(CSTOPB); // disable 2 stop bits (means 1 stop bit) [1]
			options.c_cflag &= ~(CSIZE);  // mask the character bits
			options.c_cflag |= num_bytes;     // select 8 data bits [8]

			// output options -- post process output and map new lines to
			// carriage return-new lines
			options.c_oflag |= OPOST | ONLCR;

			// lock in the options
			tcsetattr(fd, TCSANOW, &options);


		}
	}

    return(fd);
}

/******************************************************************************
 *
 * Name: OpenRawSerialPort
 *
 * Purpose: This function opens a raw serial port
 *
 *****************************************************************************/
int OpenRawSerialPort (
    char *dev_name,
    int baud_rate,
    int num_bytes,
    PARITY_TYPE parity,
	unsigned char vmin,
	unsigned char vtime
    )
{

#define ALT_SERIAL_SETUP	(1)

    int fd = OpenSerialPort (dev_name, baud_rate, num_bytes, parity);

    if (fd>= 0)
    {
        struct termios options;

        // get the port's current attribute set
        tcgetattr(fd, &options);

#if ALT_SERIAL_SETUP
		options.c_lflag = 0;
		// ignore framing and parity errors
		options.c_iflag = IGNPAR;
        // disable output post-processing
		options.c_oflag = 0;
#else
        
		// disable H/W flow control
        options.c_cflag &= ~(CRTSCTS);
        // use RAW input - i.e. do NOT process the stream, simply pump the bytes in/out
        options.c_lflag &= ~(ICANON | ECHO | ECHOE | ISIG);
        // disable input parity check
        options.c_iflag &= ~(INPCK);
        // disable S/W flow control
        options.c_iflag &= ~(IXON | IXOFF | IXANY);
        // disable a bunch of stuff
        options.c_iflag &= ~(IGNBRK | BRKINT | INLCR | IGNCR | ICRNL | IUCLC | IMAXBEL);
        // disable output post-processing
        options.c_oflag &= ~(OPOST);
#endif
        // set the return rules for read()
        options.c_cc[VMIN] = vmin; // defaults to MAX_INPUT; at least MAX_INPUT bytes must be read;
        options.c_cc[VTIME] = vtime; // defaults to 1 => inter-byte delay = 1/10 sec

		// flush read or written data
		tcflush(fd, TCIFLUSH);
        // lock in the options
        tcsetattr(fd, TCSANOW, &options);
    }

    return (fd);
}


/////////////////////////////////////////////////////////////////////////
/////////////////////////////////////////////////////////////////////////

//#define DEBUG


#define STRING_SIZE (4096)
#define MAX_TIMEOUTS (10)

#ifdef DEBUG
#define DEBUG_PRINT		(printf)
#else
#define DEBUG_PRINT(...)
#endif

// globals
int fd;
pthread_t tx_thread, rx_thread;
unsigned char *send_string = NULL;
unsigned char *recv_string = NULL;

/******************************************************************************
 *
 * Name:	catchControlC
 *
 * Purpose:	Catches the control-c and triggers a shutdown event so the
 *			application shuts down gracefully.
 *
 *****************************************************************************/
void catchControlC(int)
{
	printf("\n- killing tx_thread...\n");fflush(stdout);
	pthread_kill(tx_thread, 0);
	printf("\n- killing rx_thread...\n");fflush(stdout);
	pthread_kill(rx_thread, 0);
	printf("\n- closing fd...\n");fflush(stdout);
	close(fd);

	if (send_string)
	{
		delete (send_string);
	}
	if (recv_string)
	{
		delete (recv_string);
	}

	exit(1);
}


void *tx(void *params)
{
	DEBUG_PRINT("TX thread\n");

	// unsigned char *send_string = (char *)params;
	if (send_string == NULL)
	{
		printf("ERROR: send_string = NULL\n");
		pthread_exit(NULL);
	}

	// scan for non-alphabet characters in received string
	for (int x=0; x < STRING_SIZE; x++)
	{
		if ( (send_string[x] < 'A') || (send_string[x]> 'Z') )
		{
			printf("\n- TX ERROR: '%c' found in read return!\n", send_string[x]);
		}
	}

	int num_bytes_written = 0, num_timeouts = 0;
	while ( (num_bytes_written < STRING_SIZE) && (num_timeouts < MAX_TIMEOUTS) )
	{
		int z = write(fd, &send_string[num_bytes_written], (STRING_SIZE - num_bytes_written));
		if (z == 0) // timed out
		{
			num_timeouts++;
			printf("- TX: %d of %d sent, timeouts: %d\n", num_bytes_written, STRING_SIZE, num_timeouts);fflush(stdout);
		}
		else if ( num_bytes_written < 0 )
		{
			printf("*** ERROR: Failed to send!\n");fflush(stdout);
			pthread_exit(NULL);
		}
		num_bytes_written += z;
		DEBUG_PRINT("- TX: %d of %d sent, timeouts: %d\n", num_bytes_written, STRING_SIZE, num_timeouts);fflush(stdout);
	}
	
	if (num_timeouts> MAX_TIMEOUTS)
	{
		printf("-- TX: Exceeded timeout allotment (%d)\n", MAX_TIMEOUTS);
	}

	pthread_exit(NULL);
}


void *rx(void *params)
{
	DEBUG_PRINT("RX thread\n");

	// unsigned char *recv_string = (char *)params;
	if (recv_string == NULL)
	{
		printf("ERROR: recv_string = NULL\n");
		pthread_exit(NULL);
	}

	int num_bytes_read = 0, num_timeouts = 0;
	while ( (num_bytes_read < STRING_SIZE) && (num_timeouts < MAX_TIMEOUTS) )
	{
		int y = read(fd, &recv_string[num_bytes_read], (STRING_SIZE - num_bytes_read));
		if (y == 0) // timed out
		{
			num_timeouts++;
			printf("- RX: %d of %d received, timeouts: %d\n", num_bytes_read, STRING_SIZE, num_timeouts);fflush(stdout);
		}
		else if ( y < 0 ) // error
		{
			printf("*** ERROR: Read Error!\n");fflush(stdout);
			pthread_exit(NULL);
		}

		// scan for non-alphabet characters in received string
		for (int x=0; x < y; x++)
		{
			if ( (recv_string[num_bytes_read + x] < 'A') || (recv_string[num_bytes_read + x]> 'Z') )
			{
				printf("\n- RX ERROR: '%c' found in read return.  Byte %d of the %d bytes read does not fit between 'A' & 'Z'!\n", recv_string[num_bytes_read + x], x, y);
			}
		}

		num_bytes_read += y;
		DEBUG_PRINT("- RX: %d of %d received, timeouts: %d\n", num_bytes_read, STRING_SIZE, num_timeouts);fflush(stdout);
	}
	
	if (num_timeouts> MAX_TIMEOUTS)
	{
		printf("-- TX: Exceeded timeout allotment (%d)\n", MAX_TIMEOUTS);
	}

	pthread_exit(NULL);
}


int main (int argc, char *argv[])
{
	int ret_val = -1;

	unsigned char c;
	send_string = (unsigned char*) new unsigned char[STRING_SIZE];
	recv_string = (unsigned char*) new unsigned char[STRING_SIZE];

	// random seed
	srand(time(NULL));

	// -- set up signal controls --
	if(signal(SIGINT, SIG_IGN) != SIG_IGN)
	{
		signal(SIGINT, catchControlC);
	}

	if (argc == 2) // app name + serial port
	{
		int num_loops = 0;

		DEBUG_PRINT("starting...\n");fflush(stdout);

		fd = OpenRawSerialPort(argv[1], 115200, CS8, NO_PARITY, 0, 10);
		if (fd < 0)
		{
			printf("*** ERROR: Failed to open serial port: '%s',(%d)\n", strerror(errno), fd);fflush(stdout);
			exit(-1);
		}

		while(1)
		{
			printf("Iteration: %d...", num_loops);fflush(stdout);

			/* setup send string */
// cap alphabet
			c = 'A';
			for (int x = 0; x < STRING_SIZE; x++)
			{
				send_string[x] = c++;
				if (c == '[') // 1 step past 'Z'
				{
					c = 'A';
				}
			}
	
			/* clear recv string */
			// fill with  characters
			memset(recv_string, ' ', STRING_SIZE);

	        /* create both threads - receiver first since its the consumer */
			if ( pthread_create(&rx_thread, NULL, rx, (void *)recv_string) != 0 )
			{
				exit(-1);
			}
			if ( pthread_create(&tx_thread, NULL, tx, (void *)send_string) != 0 )
			{
				exit(-1);
			}

			/* wait for threads to complete */
			pthread_join(tx_thread, NULL);
			pthread_join(rx_thread, NULL);
	
			printf("done, cross-checking...");fflush(stdout);
			
			int num_errors = 0, offset = 0;

			for (unsigned int x = 0; x < STRING_SIZE; x++)
			{
				if ( send_string[x] != recv_string[x] )
				{
					if (num_errors == 0)
					{
						printf("\n\nIndex:\tTX:\tRX:\n");fflush(stdout);
					}
					printf("%04d\t'%c'[%02x]\t'%c'[%02x]\n", x, send_string[x], send_string[x], recv_string[x], recv_string[x]);fflush(stdout);
					num_errors++;
				}
			}

			if (num_errors == 0)
			{
				printf("done, MATCHED!");
			}
			else
			{
				printf("\n\t*** %d Errors detected @ iteration %d! ***\n", num_errors, num_loops);
			}

			printf("\r");
			usleep(10);
			num_loops++;
		}
	}
	else // show usage
	{
		printf("Error: Usage: '%s ttySx'\n", argv[0]);
	}

	return(ret_val);
}

<<<<<<<<<<< end: test code>>>>>>>>>>>>>

_________________________________________________________________
Your smile counts. The more smiles you share, the more we donate.  Join in.
www.windowslive.com/smile?ocid=TXT_TAGLM_Wave2_oprsmilewlhmtagline

[Index of Archives]     [Linux MIPS Home]     [LKML Archive]     [Linux ARM Kernel]     [Linux ARM]     [Linux]     [Git]     [Yosemite News]     [Linux SCSI]     [Linux Hams]

  Powered by Linux