On Wed, Nov 01, 2006 at 07:41:53AM -0500, Chris Mason wrote: > On Wed, Nov 01, 2006 at 03:53:50PM +0530, Suparna Bhattacharya wrote: > > > > Really appreciate your biting the bullet on this one. I have a few > > minor comments/questions as I'm trying to compare this to what we > > have discussed in the past and the last attempt approach outlined in > > http://www.kernel.org/pub/linux/kernel/people/suparna/DIO-simplify.txt > > back in Feb this year. > > The main difference from DIO-simplify.txt is that I'm making truncate > wait on the placeholder pages instead of using i_alloc_sem. I haven't Ah OK, that is true. > coded the TAG_LOCKED part, but more on that below. As you > noticed, I'm using expanding truncates to update i_size for extending > the file, which has pluses and minuses. > > > > > I recall that the first time you experimented with dummy struct pages > > (placeholder pages as they are now called), you ended up putting it on > > hold due to regressions in performance which we attributed to radix > > tree insertion overheads. I think you mention in an earlier note that > > you've managed to optimize some of that -- I'm just wondering what helped > > there - radix_tree_preload or something more than that ? > > My original code was fairly dumb and didn't use gang lookups. It also > dropped the radix tree lock after every insert/delete. The code posted > here is about 2x as fast (or 1/2 the regression, depending on how you > look at it). For smaller read ios, the performance hit is about zero. > The cost of radix tree insertion is offset by less mutex and semaphore > locking. OK, so you think the main overheads had to do with radix-tree locking, rather than with the additional node insertions ? > > > > > This is sort of why we had discussed the alternative of tagged locking > > in the radix tree. > > Lookup and lock pages in the range by tagging the radix tree > > slots as "TAG_LOCKED", and locking pages that were present. > > Modify find_get_page et al to return a dummy locked cache page > > if TAG_LOCKED. (Alternatively put the check in add_to_page_cache, > > instead) > > Putting the check in add_to_page_cache is more consistent with what you > > do now. > > > > Where I think we were stuck at that point was that even this helps only upto > > a point in saving on leaf nodes, but did not avoid intermediate levels of > > nodes from being created. With Nick's changes to the radix tree .. I have > > been thinking that some kind of sliding height scheme could become possible. > > > > Anyway, the other reason for bringing this up is because I am wondering if > > some of the complexity in find_and_insert_placeholder_pages and a few other > > places could be reduced with a tagged locking approach where we do not > > actually stuff placeholder pages explicitly in the page cache and hence > > possibly can avoid some of the checks you have below. I need to take a closer > > look to be completely sure, but just a thought ... > > > > I guess the important thing at this point is to get the interfaces right > > in a way that allows for future optimization. > > Yes, my goal was to get the code fast enough for most machines, so that > we could test the result and see what impact it would have on the rest > of the page cache. From here we can decide if additional radix tuning > is a good idea, or if we just want to stick with the mutex/semaphore > combo in use now. Yup, that makes sense. The complexity in find_and_insert_placeholder_pages does worry me though, wouldn't want to end up debugging tricky problems in that code after all this. Do you have any ideas on that ? > > I suspect that we won't be able to tune the radix tree enough to make it > suitable for the huge SGI 512 cpu test cases. So we might want to stop > tuning now (letting XFS keep their current code) or switch away from > radix to something that can lock down an extent. > > Radix tree gang lookups, insertions and deletions all have room for > optimization though (even without the LOCKED tag). A new tag in the > radix tree isn't free, Andrew might prefer not to increase the space > usage across the board just for O_DIRECT. Yes that was a consideration on my mind as well. There are costs both ways. Regards Suparna > > -chris -- Suparna Bhattacharya (suparna@xxxxxxxxxx) Linux Technology Center IBM Software Lab, India - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html