On Wed, Sep 28, 2011 at 12:58:35PM -0700, Simon Kirby wrote:

> On Tue, Sep 27, 2011 at 01:04:15PM -0400, Trond Myklebust wrote:
>
> > On Tue, 2011-09-27 at 09:49 -0700, Simon Kirby wrote:
> > > On Tue, Sep 27, 2011 at 07:42:53AM -0400, Trond Myklebust wrote:
> > >
> > > > On Mon, 2011-09-26 at 17:39 -0700, Simon Kirby wrote:
> > > > > Hello!
> > > > >
> > > > > Following up on "System CPU increasing on idle 2.6.36", this issue is
> > > > > still happening even on 3.1-rc7. So, since it has been 9 months since I
> > > > > reported this, I figured I'd bisect this issue. The first bisection ended
> > > > > in an IPMI regression that looked like the problem, so I had to start
> > > > > again. Eventually, I got commit b80c3cb628f0ebc241b02e38dd028969fb8026a2
> > > > > which made it into 2.6.34-rc4.
> > > > >
> > > > > With this commit, system CPU keeps rising as the log crunch box runs
> > > > > (reads log files via NFS and spews out HTML files into NFS-mounted report
> > > > > directories). When it finishes the daily run, the system time stays
> > > > > non-zero and continues to be higher and higher after each run, until the
> > > > > box never completes a run within a day due to all of the wasted cycles.
> > > >
> > > > So reverting that commit fixes the problem on 3.1-rc7?
> > > >
> > > > As far as I can see, doing so should be safe thanks to commit
> > > > 5547e8aac6f71505d621a612de2fca0dd988b439 (writeback: Update dirty flags
> > > > in two steps) which fixes the original problem at the VFS level.
> > >
> > > Hmm, I went to git revert b80c3cb628f0ebc241b02e38dd028969fb8026a2, but
> > > for some reason git left the nfs_mark_request_dirty(req); line in
> > > nfs_writepage_setup(), even though the original commit had that. Is that
> > > OK or should I remove that as well?
> > >
> > > Once that is sorted, I'll build it and let it run for a day and let you
> > > know. Thanks!
> >
> > It shouldn't make any difference whether you leave it or remove it. The
> > resulting second call to __set_page_dirty_nobuffers() will always be a
> > no-op since the page will already be marked as dirty.
>
> Ok, confirmed, git revert b80c3cb628f0ebc241b02e38dd028969fb8026a2 on
> 3.1-rc7 fixes the problem for me. Does this make sense, then, or do we
> need further investigation and/or testing?

Just to clear up what I said before, it seems that on plain 3.1-rc8, I am
actually able to clear the endless CPU use in nfs_writepages by just
running "sync". I am not sure when this changed, but I'm pretty sure that
some versions between 2.6.34 and 3.1-rc used to not be affected by just
"sync" unless it was paired with drop_caches. Maybe this makes the
problem more obvious...

Simon
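
For anyone wanting to check the "sync" observation on their own box, here
is a rough sketch (not something that was used in this thread) that
samples the aggregate system-time counter from /proc/stat before and
after a sync. The helper names system_jiffies and system_rate and the
5-second interval are made up for illustration; it assumes Linux and
Python 3.3+ for os.sync().

```python
import os
import time


def system_jiffies(stat_text):
    """Return the aggregate 'system' counter from /proc/stat contents."""
    for line in stat_text.splitlines():
        fields = line.split()
        # The aggregate line is labelled "cpu"; per-CPU lines are "cpu0", "cpu1", ...
        if fields and fields[0] == "cpu":
            # Value order after the label: user, nice, system, idle, ...
            return int(fields[3])
    raise ValueError("no aggregate cpu line found")


def system_rate(interval=5.0, path="/proc/stat"):
    """System-time jiffies consumed per second over `interval` seconds."""
    with open(path) as f:
        before = system_jiffies(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = system_jiffies(f.read())
    return (after - before) / interval


if __name__ == "__main__":
    print("system jiffies/sec before sync:", system_rate())
    os.sync()  # asks the kernel to flush dirty data, like running sync(1)
    print("system jiffies/sec after sync: ", system_rate())
```

On an otherwise idle machine, a system-time rate that stays high before
the sync and falls afterwards would match the behaviour described above.
This counts all system time, not only time spent in nfs_writepages, so
treat it as a coarse indicator only.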