On Mon, Jul 11, 2011 at 12:48 PM, Jan Kara <jack@xxxxxxx> wrote: > On Mon 11-07-11 10:11:17, Curt Wohlgemuth wrote: >> >> One other issue I have with sync as it's structured is that we don't >> >> do a WB_SYNC_ALL pass on any inode that's only associated with a block >> >> device, and not on a mounted filesystem. Blockdev mounts are >> >> pseudo-mounts, and are explicitly skipped in __sync_filesystem(). So >> >> if you've written directly to a block device and do a sync, the only >> >> pass over the pages for this inode are via the >> >> wakeup_flusher_threads() -- which operates on a BDI, regardless of the >> >> superblock, and uses WB_SYNC_NONE. >> >> >> >> All the sync_filesystem() calls are per-sb, not per-BDI, and they'll >> >> exclude pseudo-superblocks. >> >> >> >> I've seen cases in our modified kernels here at Google in which >> >> lilo/shutdown failed because of a lack of WB_SYNC_ALL writeback for >> >> /dev/sda (though I haven't been able to come up with a consistent test >> >> case, nor reproduce this on an upstream kernel). >> > Ho hum, I wonder what would be the cleanest way to fix this. I guess we >> > could make an exception for 'bdev' superblock in the code but it's ugly. >> >> Yes, it's ugly. >> >> You could declare that sync(2) is just doing it's job, that it's not >> designed to write dirty pages from devices that don't have filesystems >> mounted on them; that's what Christoph seems to be saying, and it's >> what the man pages for sync(2) that I've seen say as well. But >> everybody I've talked to about this is surprised, so you could declare >> it a bug as well :-) . >> >> It seems to me that sys_sync *could* just dispense with iterating over >> the superblocks, and just iterate over the BDIs, just as >> wakeup_flusher_threads() does. I.e., just do two passes over all BDIs >> -- one with WB_SYNC_NONE, and one with WB_SYNC_ALL. Well, not quite. >> It would still have to go to each SB and call the quota_sync and >> sync_fs callbacks, but those should be cheap. > Well, they're not exactly cheap (journaling filesystems have to force > transaction commits and wait for them) but that does not really matter. > The real problem is that to wait for IO completion you need some list of > inodes you want to wait for and you can currently get such list only from > superblock. Ah, of course, I realized that some days ago but forgot it -- something that seems to happen more than I'd like to admit to myself. Thanks, Curt > >> And yes, this would make sys_sync and sys_syncfs more different than >> they are today. > > Honza > -- > Jan Kara <jack@xxxxxxx> > SUSE Labs, CR > -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html