Let me back off slightly and say that in the SMP case in particular I don't have high confidence. For single-processor I thought we were in better shape. But where is the 'busy' code you refer to?

Mark Studebaker wrote:
>
> I don't have any confidence in any of the code WRT locking / reentrancy /
> simultaneous access.
>
> I don't know of specific problems but I haven't seen anything
> that describes how the design avoids these problems either.
>
> merlin wrote:
> >
> > On this note, I was wondering if there are perhaps some
> > reentrancy problems with the sensors code. It may just
> > be that the AMD 756 is 'delicate', but the problem I
> > see is that sensors will run fine for months on end, as
> > long as there is just one process periodically reading
> > the sensors. As soon as I introduce a second process
> > (e.g., sensord with cron-based rrd logging), things
> > eventually fall over (collisions, timeouts, smbus death,
> > need to reboot for sensors to work again).
> >
> > Currently, our locking seems to be based around the
> > fact that the chip will report itself 'busy' if another
> > process is using it and in schedule_timeout(). Is
> > there a higher-level lock somewhere else? If not,
> > should there be?
> >
> > Another reproducible problem, that may just be a
> > side-effect of the abort code I introduced, is that if
> > I ^C a process reading the sensors (e.g., cat
> > /proc/sys/dev/sensors/*/*) then it is not uncommon for
> > timeout/smbus death to occur. I presume that
> > schedule_timeout() will return early (interrupt) and
> > things may go downhill from there. Could other
> > interrupts cause similar problems?
> >
> > As I say, it may be a local problem; I'm just seeking
> > input.
> >
> > Thanks, Merlin
> >
> > r/author at mordac.netroedge.com/2002.04.30/08:55:30
> > >Update of /home/cvs/lm_sensors2/kernel/busses
> > >In directory mordac.netroedge.com:/tmp/cvs-serv22535
> > >
> > >Modified Files:
> > >	i2c-amd756.c
> > >Log Message:
> > >mh: Increase AMD 7x6 timeout to 500ms from 100ms, add code to
> > >abort/clear the bus on timeout. Appears? to improve reliability.
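For Merlin's question about a higher-level lock, the idea would be to serialize whole transactions so a second reader never touches the host controller while another transaction is still in flight, instead of relying on the chip's 'busy' status alone. The following is only a hypothetical user-space model, not the i2c-amd756 driver or the i2c core: two pthreads stand in for the two reading processes, and `bus_lock`, `host_busy`, `collisions`, `smbus_transaction()`, `locked_transaction()` and `run_readers()` are invented names. `host_busy` just models the controller's busy bit.

```c
/* Hypothetical user-space model of serializing SMBus transactions.
 * Threads stand in for processes; nothing here is real driver code. */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static pthread_mutex_t bus_lock = PTHREAD_MUTEX_INITIALIZER; /* hypothetical per-adapter lock */
static atomic_int host_busy = 0;   /* models the controller's busy status bit */
static atomic_int collisions = 0;  /* transactions that found the host already busy */

/* Models one complete transaction: claim the host, wait, release it. */
static void smbus_transaction(void)
{
    /* Finding the host already busy is what the thread calls a collision. */
    if (atomic_exchange(&host_busy, 1) != 0)
        atomic_fetch_add(&collisions, 1);
    for (volatile int i = 0; i < 1000; i++)
        ; /* stand-in for waiting on the hardware to finish */
    atomic_store(&host_busy, 0);
}

/* The lock is held across the entire transaction, including the wait,
 * so a second caller blocks here instead of poking the controller. */
static void locked_transaction(void)
{
    pthread_mutex_lock(&bus_lock);
    smbus_transaction();
    pthread_mutex_unlock(&bus_lock);
}

/* One "process" periodically reading the sensors. */
static void *reader(void *arg)
{
    (void)arg;
    for (int i = 0; i < 10000; i++)
        locked_transaction();
    return NULL;
}

/* Runs two concurrent readers and returns the number of collisions seen. */
int run_readers(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, reader, NULL);
    pthread_create(&b, NULL, reader, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    assert(atomic_load(&host_busy) == 0); /* bus is idle once both readers finish */
    return atomic_load(&collisions);
}
```

With the lock in place `run_readers()` returns 0; dropping the lock/unlock pair in `locked_transaction()` can produce a nonzero count, which is the two-process failure mode described above. In a driver the analogous lock would presumably also have to cover the timeout wait and any abort/clear step, so that an early return from the wait (e.g. on ^C) cannot leave the controller mid-transaction for the next caller; whether the existing code already has such a lock is exactly the open question here.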