On Mon 31-12-12 17:36:21, Jan Kara wrote: > On Mon 24-12-12 15:55:41, Zheng Liu wrote: > > From: Zheng Liu <wenqing.lz@xxxxxxxxxx> > > > > Currently all unwritten extent conversion work is pushed into a workqueue to be > > done because DIO end_io is in a irq context and this conversion needs to take > > i_data_sem locking. But we couldn't take a semaphore in a irq context. After > > tracking all extent status, we can first convert this unwritten extent in extent > > status tree, and call aio_complete() and inode_dio_done() to notify upper level > > that this dio has done. We don't need to be worried about exposing a stale data > > because we first try to lookup in extent status tree. So it makes us to see the > > latest extent status. Meanwhile we queue a work to convert this unwritten > > extent in extent tree. After this improvement, reader also needn't wait this > > conversion to be done when dioread_nolock is enabled. > > > > CC: Jan Kara <jack@xxxxxxx> > > CC: "Darrick J. Wong" <darrick.wong@xxxxxxxxxx> > > CC: Christoph Hellwig <hch@xxxxxxxxxxxxx> > > Signed-off-by: Zheng Liu <wenqing.lz@xxxxxxxxxx> > > --- > > [Cc' to Jan, Darrick, and Christoph because Christoph is trying to handle > > O_(D)SYNC for AIO] > > > > Hi Jan, Darrick, and Christoph, > > > > This patch refines the unwritten extent conversion in ext4. Now we can call > > aio_complete() and inode_dio_done() in end_io function. I believe Christoph's > > patch also can work well after applied this patch. Could you please review > > this patch? > Umm, I don't understand one thing (please bear with me, I've not followed > extent status tree work in detail): After you report IO completion, you > must make sure subsequent read returns data you wrote. Thus you would need > to track also physical location of all extents that are written but not yet > converted in the extent status tree. I'm not sure which patches are in > flight but it's definitely not happening right now and it seems to me it > would complicate the extent status tree (effectively making a full extent > cache out of it, making it considerably heavier etc.). If extent tree is > really going that way, then what you propose is probably a good idea. I'm > still somewhat uneasy about completing the IO before it's really on disk > (we still need flushing of conversions in various places) but doing the > conversion before completing the IO has its own (locking) issues especially > for writeback path. So the solution using extent status tree is fine. Ah, the patch is part of a series which changes the extent tree :) OK, I'm looking into the patches... Honza -- Jan Kara <jack@xxxxxxx> SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html