On Wed, 10 Nov 2010 00:56:32 +0100 Jan Kara <jack@xxxxxxx> wrote: > On Tue 09-11-10 15:00:06, Andrew Morton wrote: > > On Tue, 9 Nov 2010 23:28:27 +0100 > > Jan Kara <jack@xxxxxxx> wrote: > > > New description which should address above questions: > > > Background writeback is easily livelockable in a loop in wb_writeback() by > > > a process continuously re-dirtying pages (or continuously appending to a > > > file). This is in fact intended as the target of background writeback is to > > > write dirty pages it can find as long as we are over > > > dirty_background_threshold. > > > > Well. The objective of the kupdate function is utterly different. > > > > > But the above behavior gets inconvenient at times because no other work > > > queued in the flusher thread's queue gets processed. In particular, > > > since e.g. sync(1) relies on flusher thread to do all the IO for it, > > > > That's fixable by doing the work synchronously within sync_inodes_sb(), > > rather than twiddling thumbs wasting a thread resource while waiting > > for kernel threads to do it. As an added bonus, this even makes cpu > > time accounting more accurate ;) > > > > Please remind me why we decided to hand the sync_inodes_sb() work off > > to other threads? > Because when sync(1) does IO on it's own, it competes for the device with > the flusher thread running in parallel thus resulting in more seeks. Skeptical. Has that effect been demonstrated? Has it been shown to be a significant problem? A worse problem than livelocking the machine? ;) If this _is_ a problem then it's also a problem for fsync/msync. But see below. > > > sync(1) can hang forever waiting for flusher thread to do the work. > > > > > > Generally, when a flusher thread has some work queued, someone submitted > > > the work to achieve a goal more specific than what background writeback > > > does. Moreover by working on the specific work, we also reduce amount of > > > dirty pages which is exactly the target of background writeout. So it makes > > > sense to give specific work a priority over a generic page cleaning. > > > > > > Thus we interrupt background writeback if there is some other work to do. We > > > return to the background writeback after completing all the queued work. > > > > ... > > > > So... what prevents higher priority works (eg, sync(1)) from > > > > livelocking or seriously retarding background or kudate writeout? > > > If other work than background or kupdate writeout livelocks, it's a bug > > > which should be fixed (either by setting sensible nr_to_write or by tagging > > > like we do it for WB_SYNC_ALL writeback). Of course, higher priority work > > > can be running when background or kupdate writeout would need to run as > > > well. But the idea here is that the purpose of background/kupdate types of > > > writeout is to get rid of dirty data and any type of writeout does this so > > > working on it we also work on background/kupdate writeout only possibly > > > less efficiently. > > > > The kupdate function is a data-integrity/quality-of-service sort of > > thing. > > > > And what I'm asking is whether this change enables scenarios in which > > these threads can be kept so busy that the kupdate function gets > > interrupted so frequently that we can have dirty memory not being > > written back for arbitrarily long periods of time? > So let me compare: > What kupdate writeback does: > queue inodes older than dirty_expire_centisecs > while some inode in the queue > write MAX_WRITEBACK_PAGES from each inode queued > break if nr_to_write <= 0 > > What any other WB_SYNC_NONE writeback (let me call it "normal WB_SYNC_NONE > writeback") does: > queue all dirty inodes > while some inode in the queue > write MAX_WRITEBACK_PAGES from each inode queued > break if nr_to_write <= 0 > > > There only one kind of WB_SYNC_ALL writeback - the one which writes > everything. fsync() uses WB_SYNC_ALL and it doesn't write "everything". > So after WB_SYNC_ALL writeback (provided all livelocks are fixed ;) > obviously no old data should be unwritten in memory. Normal WB_SYNC_NONE > writeback differs from a kupdate one *only* in the fact that we queue all > inodes instead of only the old ones. whoa. That's only true for sync(), or sync_filesystem(). It isn't true for, say, fsync() and msync(). Now when someone comes along and changes fsync/msync to use flusher threads ("resulting in less seeks") then we could get into a situation where fsync-serving flusher threads never serve kupdate? > We start writing old inodes first and OT, but: your faith in those time-ordered inode lists is touching ;) Put a debug function in there which checks that the lists _are_ time-ordered, and call that function from every site in the kernel which modifies the lists. I bet there are still gremlins. > go inode by inode writing MAX_WRITEBACK_PAGES from each. Now because the > queue can be longer for normal WB_SYNC_NONE writeback, it can take longer > before we return to the old inodes. So if normal writeback interrupts > kupdate one, it can take longer before all data of old inodes get to disk. > But we always get the old data to disk - essentially at the same time at > which kupdate writeback would get them to disk if dirty_expire_centisecs > was 0. > > Is this enough? Do you want any of this in the changelog? > > Thanks for the inquiry btw. It made me cleanup my thoughts on the subject ;) > -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html