Hi Ted and Jiaying,

On 08/19/2011 02:54 AM, Jiaying Zhang wrote:
> On Wed, Aug 17, 2011 at 11:49 PM, Michael Tokarev <mjt@xxxxxxxxxx> wrote:
>> 17.08.2011 21:02, Ted Ts'o wrote:
>> []
>>> What I'd like to do long-term here is to change things so that (a)
>>> instead of instantiating the extent as uninitialized, writing the
>>> data, and then doing the uninit->init conversion to writing the data
>>> and then instantiated the extent as initialzied. This would also
>>> allow us to get rid of data=ordered mode. And we should make it work
>>> for fs block size != page size.
>>>
>>> It means that we need a way of adding this sort of information into an
>>> in-memory extent cache but which isn't saved to disk until the data is
>>> written. We've also talked about adding the information about whether
>>> an extent is subject to delalloc as well, so we don't have to grovel
>>> through the page cache and look at individual buffers attached to the
>>> pages. And there are folks who have been experimenting with an
>>> in-memory extent tree cache to speed access to fast PCIe-attached
>>> flash.
>>>
>>> It seems to me that if we're careful a single solution should be able
>>> to solve all of these problems...
>>
>> What about current situation, how do you think - should it be ignored
>> for now, having in mind that dioread_nolock isn't used often (but it
>> gives _serious_ difference in read speed), or, short term, fix this
>> very case which have real-life impact already, while implementing a
>> long-term solution?
> I plan to send my patch as a bandaid fix. It doesn't solve the fundamental
> problem but I think it helps close the race you saw on your test. In the long
> term, I agree that we should think about implementing an extent tree cache
> and use it to hold pending uninitialized-to-initialized extent conversions.

Does Google have any plans to do this in the near future? We use a large
number of direct reads, and we can arrange some resources to try to work
it out.

Thanks
Tao