On Fri, 17 Aug 2007 06:24:47 +0400 Alex Tomas <alex@xxxxxxxxxxxxx> wrote:

> Andrew Morton wrote:
> > On Thu, 16 Aug 2007 22:20:06 +0400
> > Alex Tomas <alex@xxxxxxxxxxxxx> wrote:
> >
> >> Andrew Morton wrote:
> >>>>> But under this proposal, t_sync_datalist just gets removed: the new
> >>>>> ordered-data mode _only_ need to do the sb->inode->page walk. So if I'm
> >>>>> understanding you, the way in which we'd handle any such race is to make
> >>>>> kjournald's writeback of the dirty pages block in lock_page(). Once it
> >>>>> gets the page lock it can look to see if some other thread has mapped the
> >>>>> page to disk.
> >>>> if I'm right holding number of pages locked, then they won't be locked, but
> >>>> writeback. of course kjournald can block on writeback as well, but how does
> >>>> it find pages with *newly allocated* blocks only?
> >>> I don't think we'd want kjournald to do that. Even if a page was dirtied
> >>> by an overwrite, we'd want to write it back during commit, just from a
> >>> quality-of-implementation point of view. If we were to leave these pages
> >>> unwritten during commit then a post-recovery file could have a mix of
> >>> up-to-five-second-old data and up-to-30-seconds-old data.
> >> trying to implement this I've got to think that there is one significant
> >> difference between t_sync_datalist and sb->inode->page walk: t_sync_datalist
> >> is per-transaction. IOW, it doesn't change once transaction is closed. in
> >> contrast, nothing (currently) would prevent others to modify pages while
> >> commit is in progress.
> >
> > That can happen at present - there's nothing to stop a process from modifying
> > a page which is undergoing ordered-data commit-time writeout.
>
> I tend to think it's still a bit different: set of pages doesn't change with
> t_sync_datalist. with sb->inode->page approach even silly dd will be able to
> *add* a bunch of new pages while we're syncing first ones. why shouldn't we
> fix this?
>

Sort-of.
But the per-superblock, per-inode writeback code is pretty careful to
avoid livelocks.

The per-inode writeback is a strict single linear sweep across the file.
It'll basically write out anything which was dirty when it was called.

The per-superblock inode walk isn't as accurate as that, because of the
difficulties of juggling list_heads. But we're slowly working on that, and
I suspect it'll be good enough for ext3 purposes already.
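
To make the livelock argument concrete, here is a rough toy model of a
single linear sweep. It is a sketch in Python, not kernel code: the
function name linear_sweep, the on_page_written hook, and the idea of
fixing end_index when the sweep starts are all assumptions made for
illustration. It only shows why a pass that moves strictly forward and
never revisits an index terminates, even if a writer keeps dirtying
pages behind the cursor or appending past the end.

```python
def linear_sweep(dirty, end_index, on_page_written=None):
    """Toy model: write back dirty page indices in one ascending pass.

    dirty           -- set of dirty page indices, mutated in place
    end_index       -- last index the pass will look at; fixed when the
                       sweep starts (an assumption of this sketch)
    on_page_written -- hypothetical hook that simulates a concurrent
                       writer dirtying pages while the sweep runs
    """
    written = []
    cursor = 0
    while cursor <= end_index:
        if cursor in dirty:
            # "Write" the page: it is clean again from the sweep's view.
            dirty.discard(cursor)
            written.append(cursor)
            if on_page_written is not None:
                on_page_written(cursor, dirty)
        # The cursor only moves forward, so the pass is bounded.
        cursor += 1
    return written


def busy_writer(index, dirty_pages):
    # Re-dirty the page just written (now behind the cursor) and
    # append a brand-new dirty page far past the end of the sweep.
    dirty_pages.add(index)
    dirty_pages.add(index + 100)


dirty = {0, 1, 2, 3}
written = linear_sweep(dirty, end_index=3, on_page_written=busy_writer)
```

In this model the pass writes pages 0 through 3 exactly once and then
stops. Everything the simulated writer dirtied in the meantime is still
dirty afterwards and is left for a later pass, which is the "anything
which was dirty when it was called" behaviour in miniature.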