EDAC messages about corrected errors affect realtime response

The EDAC code currently prints numerous lines on the console when corrected
errors occur.  This can be a problem for realtime systems: each printk() call
takes time that may be critical to processing the realtime workload, even
though a corrected error lets the system proceed normally.

I'm working on a solution that allows a user space logger program to collect
corrected error information without disrupting the system. This relies on
reporting such errors via sysfs instead of the console.

1.	The directories in which the files would reside use the existing
	sysfs hierarchy. For example, L2 cache files would be in:

		/sys/devices/system/edac/cpu/L2

2.	Device-dependent error data files would be added in an appropriate
	sysfs directory.  So, L2 cache-specific error data files might be:

	o	data capture (32-bit or 64-bit data items)
	o	address capture (as many bits as the physical address)
	o	syndrome (format is device dependent)
	o	attributes (format is device dependent)

	The idea is that reading each file once will retrieve the tuple of
	error data items for a single correctable error.

3.	Each file added is backed by a small queue so that information for
	multiple errors can be retrieved. Reading a datum discards that
	item.

4.	A sequence number file is added that should be read at the
	same time as the error data files. The sequence number is incremented
	for each error, even if the error data had to be discarded to avoid
	queue overflow. This allows detection of queue overflow by the
	logger program.

5.	If a logger dies partway through reading the error data files, the
	files will no longer be synchronized with each other. To address this,
	writing to the sequence number file will cause any out-of-sync error
	data items to be discarded. This will allow the next read of all files
	to obtain the next complete tuple of error data.
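
To make the semantics of points 3 through 5 concrete, here is a rough
user-space model in Python. It is only a sketch of the behavior I have in
mind, not kernel code. The names (ErrorQueues, record, read, resync), the
queue depth, and the choice to drop the newest error on overflow are all
assumptions made for illustration.

```python
from collections import deque

# One queue per proposed sysfs file; "sequence" is the sequence number file.
FIELDS = ("sequence", "data", "address", "syndrome", "attributes")


class ErrorQueues:
    """Hypothetical model of the per-file queues behind the sysfs files."""

    def __init__(self, depth=4):
        self.depth = depth      # assumed small fixed queue depth
        self.next_seq = 0       # incremented for every corrected error
        self.queues = {name: deque() for name in FIELDS}

    def record(self, data, address, syndrome, attributes):
        """Called when a corrected error occurs."""
        seq = self.next_seq
        self.next_seq += 1      # counts even when the data is dropped
        if any(len(q) >= self.depth for q in self.queues.values()):
            return False        # assumption: the newest error is discarded
        values = (seq, data, address, syndrome, attributes)
        for name, value in zip(FIELDS, values):
            self.queues[name].append(value)
        return True

    def read(self, name):
        """Reading a file returns its oldest datum and discards it."""
        q = self.queues[name]
        return q.popleft() if q else None

    def resync(self):
        """Models a write to the sequence number file.

        Every error appends to all queues at once, so a queue that is
        longer than the shortest one holds leftovers of partly read
        tuples at its head. Dropping those realigns the files.
        """
        keep = min(len(q) for q in self.queues.values())
        for q in self.queues.values():
            while len(q) > keep:
                q.popleft()

    def read_tuple(self):
        """What a logger does: read each file once, sequence included."""
        if not self.queues["sequence"]:
            return None
        return {name: self.read(name) for name in FIELDS}
```

A logger built on this would resync once at startup, then read tuples and
compare consecutive sequence numbers. A jump of more than one between two
tuples means that errors occurred in between but their data was dropped.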

I would expect to keep the current console output as the default, but to be
able to select console output, sysfs output, or both.

Things I'd like feedback on:
1.	Is sysfs even a reasonable place for this?
2.	Is this a workable interface for this information? Note that, unlike
	the console, this is a lossy reporting mechanism.
3.	Other suggestions?

Note: There may be other subsystems that also use printk() to report on
corrected errors. These are also likely to pose an issue for realtime systems
and this may become a model for handling non-EDAC situations.
-- 
David VL