On Mon, 2007-02-19 at 21:31 +0000, Jörn Engel wrote:
> Looks like I am really writing the first log-structured filesystem for Linux.
> At least I ran into a fairly arcane race that seems to be generic to all
> of them.
>
> Writing when space is tight may involve calling the garbage collector.
> The garbage collector will iget() random inodes, either to verify if a
> block is valid or to copy the block around.  At this point, all writes
> to LogFS are serialized.
>
> __sync_single_inode() will first lock a random inode, then call
> write_inode(), then unlock the inode.  So we can get this:
>
>
> __sync_single_inode()                garbage collector
> ---------------------------------------------------------------------
> inode->i_state |= I_LOCK;            ...
> ...                                  mutex_lock(&super->s_w_mutex);
> write_inode(inode, wait);            ...
> ...                                  iget(sb, ino);
> mutex_lock(&super->s_w_mutex);       ...
> ...                                  wait_on_inode(inode);
> mutex_unlock(&super->s_w_mutex);
> ...
> ...
> inode->i_state &= ~I_LOCK;
>
>
> And once in a blue moon, those two will race for the same inode.  As far
> as I can see, the race can only be fixed in two ways:
> 1. Never iget() inside the garbage collector.  That would require having
>    a private inode cache for LogFS.
> 2. Synchronize __sync_single_inode() and the garbage collector somehow.
>
> Variant 1 would result in double caching for the same object, something
> I would like to avoid.  So does anyone have suggestions how variant 2
> could be achieved?  Essentially what I need is a way to say "don't sync
> any inodes right now, I'll be back in 5 milliseconds or so".

It'd be nice if you could drop s_w_mutex when the garbage collector calls
iget().  Otherwise, you may be able to call ilookup5_nowait() in the
garbage collector, and skip that inode if I_LOCK is set.
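To make the second suggestion concrete, here is a minimal userspace model in C of a garbage collector that looks an inode up without waiting and skips it if it is locked for writeback. This is not kernel code: `model_inode`, `MODEL_I_LOCK`, `model_ilookup_nowait()`, `gc_visit()` and `gc_pass()` are hypothetical stand-ins. In the kernel, the lookup would be ilookup5_nowait(), the flag would be I_LOCK in inode->i_state (checked under the appropriate locking), and the reference returned by the lookup would have to be dropped with iput(). Loading uncached inodes is left out of the model.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace model only; every name here is hypothetical.
 * MODEL_I_LOCK plays the role of I_LOCK: set while the inode
 * is being written back by the sync path.
 */
#define MODEL_I_LOCK 0x1u

struct model_inode {
	unsigned long ino;
	unsigned int state;
};

enum gc_result {
	GC_NOT_CACHED, /* not in the cache; loading it is outside this model */
	GC_SKIPPED,    /* locked for writeback: defer instead of waiting */
	GC_PROCESSED   /* unlocked: safe to verify or copy its block */
};

/* Return the cached inode for ino, or NULL, and never wait for
 * MODEL_I_LOCK to clear (the non-blocking property the reply relies on). */
struct model_inode *model_ilookup_nowait(struct model_inode *cache,
					 size_t n, unsigned long ino)
{
	assert(cache != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (cache[i].ino == ino)
			return &cache[i];
	return NULL;
}

/* One GC step, assumed to run with the (modelled) s_w_mutex held.
 * It never waits on a locked inode while holding the mutex, so a sync
 * thread blocked on s_w_mutex gets it once the GC drops the mutex and
 * the wait cycle from the diagram cannot form. */
enum gc_result gc_visit(struct model_inode *cache, size_t n,
			unsigned long ino)
{
	struct model_inode *inode = model_ilookup_nowait(cache, n, ino);

	if (inode == NULL)
		return GC_NOT_CACHED;
	if (inode->state & MODEL_I_LOCK)
		return GC_SKIPPED;
	/* ... verify or copy the block owned by this inode here ... */
	return GC_PROCESSED;
}

/* Visit every victim inode and record the skipped ones in retry[]
 * (which must hold nvictims entries). Returns how many were skipped. */
size_t gc_pass(struct model_inode *cache, size_t n,
	       const unsigned long *victims, size_t nvictims,
	       unsigned long *retry)
{
	size_t skipped = 0;

	for (size_t i = 0; i < nvictims; i++)
		if (gc_visit(cache, n, victims[i]) == GC_SKIPPED)
			retry[skipped++] = victims[i];
	return skipped;
}
```

In this sketch a skipped inode is simply queued for a later pass, after the mutex has been dropped and retaken. Whether LogFS can defer a block that way when space is tight is an assumption the thread does not settle; the first suggestion (dropping s_w_mutex around iget()) is not modelled here.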
-- 
David Kleikamp
IBM Linux Technology Center