On Tue 02-10-12 11:16:38, Dmitry Monakhov wrote: ... > > > +static int ext4_do_flush_completed_IO(struct inode *inode, > > > + while (!list_empty(&unwritten)) { > > > + io = list_entry(unwritten.next, ext4_io_end_t, list); > > > + BUG_ON(!(io->flag & EXT4_IO_END_UNWRITTEN)); > > > + list_del_init(&io->list); > > > + > > > + err = ext4_end_io(io); > > > + if (unlikely(!ret && err)) > > > + ret = err; > > > + > > > + list_add_tail(&io->list, &complete); > > > + } > > > + /* It is important to update all flags for all end_io in one shot w/o > > > + * dropping the lock.*/ > > Why? It would seem that once io structures are moved from > > i_completed_io_list, they are unreachable by anyone else? You seem to have missed this comment? > > > + spin_lock_irqsave(&ei->i_completed_io_lock, flags); > > > + while (!list_empty(&complete)) { > > > + io = list_entry(complete.next, ext4_io_end_t, list); > > > + io->flag &= ~EXT4_IO_END_UNWRITTEN; > > > + /* end_io context can not be destroyed now because it still > > > + * used by queued worker. Worker thread will destroy it later */ > > > + if (io->flag & EXT4_IO_END_QUEUED) > > > + list_del_init(&io->list); > > > + else > > > + list_move(&io->list, &to_free); > > > + } > > > + /* If we are called from worker context, it is time to clear queued > > > + * flag, and destroy it's end_io if it was converted already */ > > > + if (work_io) { > > > + work_io->flag &= ~EXT4_IO_END_QUEUED; > > > + if (!(work_io->flag & EXT4_IO_END_UNWRITTEN)) > > > + list_add_tail(&work_io->list, &to_free); > > > } > > > - list_del_init(&io->list); > > > spin_unlock_irqrestore(&ei->i_completed_io_lock, flags); > > > - (void) ext4_end_io_nolock(io); > > > - mutex_unlock(&inode->i_mutex); > > > -free: > > > - ext4_free_io_end(io); > > > + > > > + while (!list_empty(&to_free)) { > > > + io = list_entry(to_free.next, ext4_io_end_t, list); > > > + list_del_init(&io->list); > > > + ext4_free_io_end(io); > > > + } > > > + return ret; > > > +} > > > + > > > +/* > > > + * work on completed aio dio IO, to convert unwritten extents to extents > > > + */ > > > +static void ext4_end_io_work(struct work_struct *work) > > > +{ > > > + ext4_io_end_t *io = container_of(work, ext4_io_end_t, work); > > > + ext4_do_flush_completed_IO(io->inode, io); > > > +} > > > + > > > +int ext4_flush_completed_IO(struct inode *inode) > > > +{ > > > + return ext4_do_flush_completed_IO(inode, NULL); > > > } > > Also it seems that when ext4_flush_completed_IO() is called, workqueue > > can have several IO structures queued in its local lists thus we miss them > > here and don't properly wait for all conversions? > No it is not. Because list drained atomically, and > add_complete_io will queue work only if list is empty. > > Race between conversion and dequeue-process is not possible because > we hold lock during entire walk of complete_list, so from external > point of view we mark list as conversed(clear unwritten flag) > happens atomically. I've drawn all possible situations and race not > happen. If you know any please let me know. I guess I'm missing something obvious. So let's go step by step: 1) ext4_flush_completed_IO() must make sure there is no outstanding conversion for the inode. 2) Now assume we have non-empty i_completed_io_list - thus work is queued. 3) The following situation seems to be possible: CPU1 CPU2 (worker thread) (truncate) ext4_end_io_work() ext4_do_flush_completed_IO() spin_lock_irqsave(&ei->i_completed_io_lock, flags); dump_completed_IO(inode); list_replace_init(&ei->i_completed_io_list, &unwritten); spin_unlock_irqrestore(&ei->i_completed_io_lock, flags); ext4_flush_completed_IO() ext4_do_flush_completed_IO() - sees empty i_completed_io_list => exits But we still have some conversions pending in 'unwritten' list. What am I missing? Honza -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html