Thinko with per-bdi flusher threads

Jan Kara <jack@xxxxxxx> · Fri, 23 Jul 2010 18:12:39 +0200

  Hi,

  I had a look at the bug https://bugzilla.kernel.org/show_bug.cgi?id=16312
(because we got some reports against our distro kernels as well ;). The
culprit is that when a device inode gets dirty, we try to file it on the
per-bdi queues of the bdi that the device inode describes. I.e., if
/dev/zero gets dirty because someone does "touch /dev/zero", we try to file
the dirty inode on /dev/zero's bdi, which obviously complains.
  So a trivial reproducer is:
cd /tmp; mknod devzero c 1 5; touch devzero
(provided /tmp is on some normal filesystem such as ext3).

  The question is how to solve this problem. Adding /dev/zero to the lists
of the "zero" bdi seems silly (we'd have to create a writeback thread, write
that single inode and kill the thread) and conceptually wrong (the inode
write has to happen against the filesystem carrying the device node, not
against mapping->backing_dev of the inode).
  But there are more complicated cases. Think for example what should
happen if a filesystem on /dev/sda carries a device inode for /dev/sdb.
Then dirty pages of the device inode should be written by a per-bdi thread
for /dev/sdb but inode metadata should be written by a thread for /dev/sda. 
Not too nice either because the device inode would have to be in two queues
- one for data and one for metadata writeback. OTOH checks like
bdi_has_dirty_io() would correctly report whether there is some
modification pending against a bdi or not.
  A reasonable, mildly hacky solution would be to file the inode against the
parent filesystem's bdi if mapping->backing_dev isn't capable of having
dirty pages and doing writeback, and against mapping->backing_dev
otherwise. This would mean that we would have to properly mark bdis like
the one of /dev/zero as not capable of writeback.
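  To make the proposed rule concrete, here is a rough userspace sketch of
the selection logic. It is not kernel code: struct bdi_model, struct
inode_model, the can_writeback flag and dirty_list_bdi() are all
hypothetical names made up for illustration, and the real structures and
capability checks look different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a backing device (not the kernel struct). */
struct bdi_model {
	const char *name;
	bool can_writeback;	/* assumed flag: false for bdis like the one of /dev/zero */
};

/* Hypothetical stand-in for a device inode living on some filesystem. */
struct inode_model {
	const struct bdi_model *sb_bdi;		/* bdi of the filesystem carrying the node */
	const struct bdi_model *mapping_bdi;	/* bdi the device inode's mapping points to */
};

/*
 * Pick the bdi whose dirty queues the inode should be filed on:
 * the parent filesystem's bdi if the mapping's bdi cannot do writeback,
 * the mapping's bdi otherwise.
 */
static const struct bdi_model *dirty_list_bdi(const struct inode_model *inode)
{
	assert(inode != NULL);
	assert(inode->sb_bdi != NULL && inode->mapping_bdi != NULL);

	if (!inode->mapping_bdi->can_writeback)
		return inode->sb_bdi;
	return inode->mapping_bdi;
}
```

  With this rule the devzero node from the reproducer would be filed on the
bdi of the filesystem holding /tmp, while a node for /dev/sdb stored on
/dev/sda would still be filed on the bdi of /dev/sdb. Note that the sketch
does not address the metadata side of the /dev/sda vs. /dev/sdb case.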
  Any opinions?

								Honza

-- 
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR