On Wed, Aug 17, 2011 at 11:49 PM, Michael Tokarev <mjt@xxxxxxxxxx> wrote:
> 17.08.2011 21:02, Ted Ts'o wrote:
> []
>> What I'd like to do long-term here is to change things so that (a)
>> instead of instantiating the extent as uninitialized, writing the
>> data, and then doing the uninit->init conversion to writing the data
>> and then instantiating the extent as initialized.  This would also
>> allow us to get rid of data=ordered mode.  And we should make it work
>> for fs block size != page size.
>>
>> It means that we need a way of adding this sort of information into an
>> in-memory extent cache but which isn't saved to disk until the data is
>> written.  We've also talked about adding the information about whether
>> an extent is subject to delalloc as well, so we don't have to grovel
>> through the page cache and look at individual buffers attached to the
>> pages.  And there are folks who have been experimenting with an
>> in-memory extent tree cache to speed access to fast PCIe-attached
>> flash.
>>
>> It seems to me that if we're careful a single solution should be able
>> to solve all of these problems...
>
> What about current situation, how do you think - should it be ignored
> for now, having in mind that dioread_nolock isn't used often (but it
> gives _serious_ difference in read speed), or, short term, fix this
> very case which has real-life impact already, while implementing a
> long-term solution?

I plan to send my patch as a band-aid fix. It doesn't solve the
fundamental problem, but I think it helps close the race you saw in
your test. In the long term, I agree that we should think about
implementing an extent tree cache and using it to hold pending
uninitialized-to-initialized extent conversions.

Jiaying

>
> Thank you!
>
> /mjt
>
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
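
To make the long-term idea discussed above a bit more concrete, here is a
rough user-space sketch in C of an in-memory extent cache that remembers
which extents still have data writes in flight. It is only an illustration
under stated assumptions, not ext4 code: every name in it (extent_cache,
cached_extent, cache_add_pending, cache_write_done, cache_block_readable)
is hypothetical, the fixed-size array stands in for whatever tree the real
cache would use, and locking is ignored entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory states; this is not ext4's on-disk extent format. */
enum ext_state {
    EXT_PENDING_WRITE, /* blocks allocated, data I/O in flight, nothing on disk yet */
    EXT_INITIALIZED    /* data written; extent may be recorded as initialized */
};

struct cached_extent {
    uint64_t lblk;        /* first logical block of the extent */
    uint32_t len;         /* length in blocks */
    enum ext_state state;
};

#define CACHE_MAX 16      /* arbitrary size for the sketch */

struct extent_cache {
    struct cached_extent ext[CACHE_MAX];
    size_t count;
};

/* Remember a newly allocated extent whose data has not reached disk yet.
 * Returns false if the cache is full. */
bool cache_add_pending(struct extent_cache *c, uint64_t lblk, uint32_t len)
{
    assert(len > 0);
    if (c->count == CACHE_MAX)
        return false;
    c->ext[c->count++] = (struct cached_extent){ lblk, len, EXT_PENDING_WRITE };
    return true;
}

/* Called when the data write completes: flip the pending extent that
 * starts at lblk to initialized. Returns false if no such extent exists. */
bool cache_write_done(struct extent_cache *c, uint64_t lblk)
{
    for (size_t i = 0; i < c->count; i++) {
        if (c->ext[i].lblk == lblk && c->ext[i].state == EXT_PENDING_WRITE) {
            c->ext[i].state = EXT_INITIALIZED;
            return true;
        }
    }
    return false;
}

/* A reader may see a block's data only once the covering extent is
 * initialized; pending or unmapped blocks must be treated as zeros. */
bool cache_block_readable(const struct extent_cache *c, uint64_t blk)
{
    for (size_t i = 0; i < c->count; i++) {
        const struct cached_extent *e = &c->ext[i];
        if (blk >= e->lblk && blk - e->lblk < e->len)
            return e->state == EXT_INITIALIZED;
    }
    return false;
}
```

In this sketch a reader consults cache_block_readable() before trusting a
block, so stale data is never exposed while the write is in flight, and
the extent only becomes visible as initialized after cache_write_done()
runs at I/O completion.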