On 5/7/14, 8:42 PM, Dave Chinner wrote:
> On Wed, May 07, 2014 at 06:29:15PM -0500, Eric Sandeen wrote:
>> The way discontiguous buffers are currently handled in
>> prefetch is by unlocking the prefetch tree and reading
>> them one at a time in pf_read_discontig(), inside the
>> normal loop of searching for buffers to read in a more
>> optimized fashion.
>>
>> But by unlocking the tree, we allow other threads to come
>> in and find buffers which we've already stashed locally
>> on our bplist[]. If 2 threads think they own the same
>> set of buffers, they may both try to delete them from
>> the prefetch btree, and the second one to arrive will not
>> find it, resulting in:
>>
>> fatal error -- prefetch corruption
>>
>> Fix this by maintaining 2 lists; the original bplist,
>> and a new one containing only discontiguous buffers.
>>
>> The original list can be seek-optimized as before,
>> and the discontiguous list can be read one by one
>> before we do the seek-optimized reads, after all of the
>> tree manipulation has been completed.
>
> Nice job finding the problem, Eric! It looks like your patch solves
> the problem, but after considering this approach for a while I think
> it's overkill. ;)

Well, that's how it goes. :)

> What the loop is trying to do is linearise all the IO and turn lots
> of small IO into a single large IO, so if we grab all the discontig
> buffers in the range, then do IO on them, then do the large IO, we
> are effectively seeking all over that range, including backwards.
> This is exactly the sort of problem the prefetch loop is trying to
> avoid.

mmmhm... OTOH, discontig buffers are ... fairly rare? And they do have
to be read sometime.

> So what I think is best is that we simply abort the pulling of new
> buffers off the list when we hit a discontiguous buffer. Leave the
> discontig buffer as the last on the list, and process the list as
> per normal.
> Remove all the remaining buffers from the btree, then
> drop the lock and do the pread64 call.

I kind of half considered something like that, but the optimizing trims
back num based on a few criteria, some involving the last buffer in the
bplist. So it's going to require a bit more awareness, I think.

> Then, check the last buffer on the bplist - if it's the discontig
> buffer (i.e. wasn't dropped during list processing), then issue the
> discontig buffer IO. It should at least start as either sequential IO
> or with a small forwards seek, so should be as close to seek
> optimised as we can get for such buffers. Then it can be removed
> from the bplist, num decremented, the lock picked back up and the
> large buffer read in via pread64() can be sliced and diced
> appropriately...
>
> i.e. much less code, no need for a separate list, and the seeks
> should be minimised as much as possible....

I'll give something like that a shot. Yeah, it felt a bit brute-force-y.

-eric

> Cheers,
>
> Dave.