On Wed 10-11-10 15:37:29, Andrew Morton wrote:
> On Wed, 10 Nov 2010 00:56:32 +0100
> Jan Kara <jack@xxxxxxx> wrote:
> > On Tue 09-11-10 15:00:06, Andrew Morton wrote:
> > > On Tue, 9 Nov 2010 23:28:27 +0100
> > > Jan Kara <jack@xxxxxxx> wrote:
> > > > New description which should address the above questions:
> > > >
> > > > Background writeback is easily livelockable in a loop in
> > > > wb_writeback() by a process continuously re-dirtying pages (or
> > > > continuously appending to a file). This is in fact intended, as the
> > > > target of background writeback is to write dirty pages it can find
> > > > as long as we are over dirty_background_threshold.
> > >
> > > Well. The objective of the kupdate function is utterly different.
> > >
> > > > But the above behavior gets inconvenient at times because no other
> > > > work queued in the flusher thread's queue gets processed. In
> > > > particular, since e.g. sync(1) relies on the flusher thread to do
> > > > all the IO for it,
> > >
> > > That's fixable by doing the work synchronously within sync_inodes_sb(),
> > > rather than twiddling thumbs wasting a thread resource while waiting
> > > for kernel threads to do it. As an added bonus, this even makes cpu
> > > time accounting more accurate ;)
> > >
> > > Please remind me why we decided to hand the sync_inodes_sb() work off
> > > to other threads?
> > Because when sync(1) does IO on its own, it competes for the device with
> > the flusher thread running in parallel, thus resulting in more seeks.
>
> Skeptical. Has that effect been demonstrated? Has it been shown to be
> a significant problem? A worse problem than livelocking the machine? ;)
  I don't think Jens has tested this when merging the per-bdi writeback
patches. The argument just makes sense to me (I've seen some loads where
an application doing IO from balance_dirty_pages() together with pdflush
doing writeback seriously harmed performance, and this looks like a
similar scenario). But I'd also be interested in whether theory and
practice match here, so I'll try to run some tests to compare the two
approaches.

> If this _is_ a problem then it's also a problem for fsync/msync. But
> see below.
>
> > > > sync(1) can hang forever waiting for the flusher thread to do the
> > > > work.
> > > >
> > > > Generally, when a flusher thread has some work queued, someone
> > > > submitted the work to achieve a goal more specific than what
> > > > background writeback does. Moreover, by working on the specific
> > > > work we also reduce the amount of dirty pages, which is exactly the
> > > > target of background writeout. So it makes sense to give specific
> > > > work priority over generic page cleaning.
> > > >
> > > > Thus we interrupt background writeback if there is some other work
> > > > to do. We return to the background writeback after completing all
> > > > the queued work.
> > >
> > > ...
> > >
> > > > > So... what prevents higher priority works (eg, sync(1)) from
> > > > > livelocking or seriously retarding background or kupdate writeout?
> > > > If work other than background or kupdate writeout livelocks, it's a
> > > > bug which should be fixed (either by setting a sensible nr_to_write
> > > > or by tagging like we do for WB_SYNC_ALL writeback). Of course,
> > > > higher priority work can be running when background or kupdate
> > > > writeout would need to run as well. But the idea here is that the
> > > > purpose of background/kupdate types of writeout is to get rid of
> > > > dirty data, and any type of writeout does this, so by working on it
> > > > we also work on background/kupdate writeout, only possibly less
> > > > efficiently.
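  To make the behavior under discussion concrete, here is a minimal
sketch of a background writeback loop that yields to queued work.
wb_has_queued_work() and write_a_chunk() are illustrative placeholders,
not the actual kernel interfaces; over_bground_thresh() and
MAX_WRITEBACK_PAGES are meant in the spirit of the 2.6.36-era
fs/fs-writeback.c:

	/* Illustrative placeholders, not the real kernel API: */
	static bool wb_has_queued_work(struct bdi_writeback *wb);
	static long write_a_chunk(struct bdi_writeback *wb, long nr_pages);

	/*
	 * Sketch: keep cleaning pages while we are over the background
	 * dirty threshold, but bail out as soon as a more specific work
	 * item is queued for the flusher thread.
	 */
	static long background_writeback_sketch(struct bdi_writeback *wb)
	{
		long written = 0;

		while (over_bground_thresh()) {
			/*
			 * A queued work item (e.g. for sync(1)) writes out
			 * dirty pages too, so deferring to it still makes
			 * progress towards the background goal.
			 */
			if (wb_has_queued_work(wb))
				break;
			written += write_a_chunk(wb, MAX_WRITEBACK_PAGES);
		}
		return written;
	}

  After the queued work is completed, the flusher thread can pick the
background writeback up again from where it left off.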
> > > The kupdate function is a data-integrity/quality-of-service sort of
> > > thing.
> > >
> > > And what I'm asking is whether this change enables scenarios in which
> > > these threads can be kept so busy that the kupdate function gets
> > > interrupted so frequently that we can have dirty memory not being
> > > written back for arbitrarily long periods of time?
> > So let me compare:
> >
> > What kupdate writeback does:
> >   queue inodes older than dirty_expire_centisecs
> >   while some inode in the queue
> >     write MAX_WRITEBACK_PAGES from each inode queued
> >     break if nr_to_write <= 0
> >
> > What any other WB_SYNC_NONE writeback (let me call it "normal
> > WB_SYNC_NONE writeback") does:
> >   queue all dirty inodes
> >   while some inode in the queue
> >     write MAX_WRITEBACK_PAGES from each inode queued
> >     break if nr_to_write <= 0
> >
> > There is only one kind of WB_SYNC_ALL writeback - the one which writes
> > everything.
>
> fsync() uses WB_SYNC_ALL and it doesn't write "everything".
  Yes, but nobody submits fsync() work to a flusher thread (currently,
there's no way to tell a flusher thread to operate only on a single
inode). But, as you write below, *if* somebody started doing that and
fsync() were called often enough, we would have a problem.

> > So after WB_SYNC_ALL writeback (provided all livelocks are fixed ;)
> > obviously no old data should be unwritten in memory. Normal
> > WB_SYNC_NONE writeback differs from a kupdate one *only* in the fact
> > that we queue all inodes instead of only the old ones.
>
> whoa.
>
> That's only true for sync(), or sync_filesystem(). It isn't true for,
> say, fsync() and msync(). Now when someone comes along and changes
> fsync/msync to use flusher threads ("resulting in less seeks") then we
> could get into a situation where fsync-serving flusher threads never
> serve kupdate?
>
> > We start writing old inodes first and
>
> OT, but: your faith in those time-ordered inode lists is touching ;)
> Put a debug function in there which checks that the lists _are_
> time-ordered, and call that function from every site in the kernel
> which modifies the lists. I bet there are still gremlins.
  ;-)

								Honza
--
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR
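  For reference, the debug check suggested above might look roughly like
the sketch below. It assumes the dirty list is kept in reverse time
order (most recently dirtied inode first), as the comments in
fs/fs-writeback.c describe, and that inodes are linked through i_list as
in 2.6.36-era kernels; the function name is made up for illustration:

	/*
	 * Sketch: verify that a dirty inode list (e.g. wb->b_dirty) is in
	 * reverse time order.  Meant to be called under the relevant lock
	 * from every site that modifies the list.
	 */
	static void check_dirty_list_time_ordered(struct list_head *head)
	{
		struct inode *inode, *prev = NULL;

		list_for_each_entry(inode, head, i_list) {
			/* No entry may be newer than its predecessor. */
			if (prev)
				WARN_ON(time_after(inode->dirtied_when,
						   prev->dirtied_when));
			prev = inode;
		}
	}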