On Tue, Dec 18, 2012 at 08:25:06AM -0600, Alex Elder wrote:
> I was running xfstests on a 3.6-derived kernel and injecting
> some errors. At some point a few of these surfaced as I/O
> errors, which the generic buffer code complained about.
> That's all fine (well, I think). An example:
>
> Buffer I/O error on device rbd2, logical block 3072
> Buffer I/O error on device rbd2, logical block 3073
> ...
>
> However, after a string of these, I got this:
>
> BUG: workqueue leaked lock or atomic: kworker/0:1/0x00000000/17554
> last function: xfs_end_io+0x0/0x110 [xfs]

What are the errors leading up to this, and the full stack of the
oops?

> I haven't looked very hard at this yet because I wanted to
> see if anyone had some quick info that would avoid me going
> off in the wrong direction.
>
> The I/O error messages are generated in two spots (sadly,
> identical error messages):
>
> end_buffer_write_sync()
> end_buffer_async_write()
>
> The workqueue leaked message comes from process_one_work(), so the
> xfs_end_io() is being called by the ioend work queue (not from
> xfs_finish_ioend_sync()).
>
> So... I want to report this in case it's not been seen before.

No, I haven't seen it before. Do you know what test is triggering
it? If it's direct IO, I'm wondering if it might be caused by the
nested transaction problem I recently fixed leaving an elevated
freeze count behind....

> But I'm also trying to figure out whether the problem is likely
> to lie in XFS, the generic buffer, code, or in the underlying
> block device code. The latter is (of course) my assumption...
> And any useful insights or suggestions how to proceed?

I'd start by finding out which workqueue and work item had just
finished being processed when the error occurs, e.g. is it an
unwritten conversion, a buffered IO append transaction or a direct
IO size update?

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
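
As a starting point for the triage suggested above, the BUG line quoted in the report can be split into its fields. This is only a minimal sketch: the field order (task name, preempt count in hex, pid) is an assumption based on how the message appears to be formatted, and parse_leak_report() is a hypothetical helper, not part of any kernel tooling. If that reading is right, a preempt count of zero would hint at a lock still recorded as held rather than a leaked atomic context, but that needs checking against the full trace.

```python
import re

# Assumed layout of the report line: comm/preempt_count/pid.
# The task name itself may contain '/', e.g. "kworker/0:1", so the two
# numeric fields are anchored at the right-hand end of the line.
BUG_RE = re.compile(
    r"BUG: workqueue leaked lock or atomic: "
    r"(?P<comm>.+)/0x(?P<preempt>[0-9a-fA-F]{8})/(?P<pid>\d+)\s*$"
)


def parse_leak_report(line):
    """Return the fields of a 'workqueue leaked' line, or None if no match."""
    m = BUG_RE.search(line)
    if m is None:
        return None
    return {
        "comm": m.group("comm"),
        "preempt_count": int(m.group("preempt"), 16),
        "pid": int(m.group("pid")),
    }


# The line from the report in this thread.
report = parse_leak_report(
    "BUG: workqueue leaked lock or atomic: kworker/0:1/0x00000000/17554"
)
print(report)
```

Run over a saved dmesg log, the same helper would let the leak reports be lined up against the preceding Buffer I/O error lines and the test that was running at the time.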