On 5/8/14, 8:17 PM, Dave Chinner wrote: > From: Dave Chinner <dchinner@xxxxxxxxxx> > > The way discontiguous buffers are currently handled in prefetch is > by unlocking the prefetch tree and reading them one at a time in > pf_read_discontig(), inside the normal loop of searching for buffers > to read in a more optimized fashion. > > But by unlocking the tree, we allow other threads to come in and > find buffers which we've already stashed locally on our bplist[]. > If 2 threads think they own the same set of buffers, they may both > try to delete them from the prefetch btree, and the second one to > arrive will not find it, resulting in: > > fatal error -- prefetch corruption > > To fix this, simply abort the buffer gathering loop when we come > across a discontiguous buffer, process the gathered list as per > normal, and then after running the large optimised read, check to > see if the last buffer on the list is a discontiguous buffer. > If is is discontiguous, then issue the discontiguous buffer read > while the locks are not held. We only ever have one discontiguous > buffer per read loop, so it is safe just to check the last buffer in > the list. > > The fix is loosely based on a a patch provided by Eric Sandeen, who > did all the hard work of finding the bug and demonstrating how to > fix it. Ok, this makes sense to me. The comment above the discontig read seems a bit confusing; you say it's safe to read while unlocked, but I wouldn't have expected it not to be - the lock is just for btree manipulation, and that's not being done. So I think the comment adds a little confusion rather than clarification. But the code looks fine. Reviewed-by: Eric Sandeen <sandeen@xxxxxxxxxx> > Reported-by:Eric Sandeen <sandeen@xxxxxxxxxx> > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx> > --- > repair/prefetch.c | 53 +++++++++++++++++++++++++++-------------------------- > 1 file changed, 27 insertions(+), 26 deletions(-) > > diff --git a/repair/prefetch.c b/repair/prefetch.c > index 65fedf5..e269f1f 100644 > --- a/repair/prefetch.c > +++ b/repair/prefetch.c > @@ -444,27 +444,6 @@ pf_read_inode_dirs( > } > > /* > - * Discontiguous buffers require multiple IOs to fill, so we can't use any > - * linearising, hole filling algorithms on them to avoid seeks. Just remove them > - * for the prefetch queue and read them straight into the cache and release > - * them. > - */ > -static void > -pf_read_discontig( > - struct prefetch_args *args, > - struct xfs_buf *bp) > -{ > - if (!btree_delete(args->io_queue, XFS_DADDR_TO_FSB(mp, bp->b_bn))) > - do_error(_("prefetch corruption\n")); > - > - pthread_mutex_unlock(&args->lock); > - libxfs_readbufr_map(mp->m_ddev_targp, bp, 0); > - bp->b_flags |= LIBXFS_B_UNCHECKED; > - libxfs_putbuf(bp); > - pthread_mutex_lock(&args->lock); > -} > - > -/* > * pf_batch_read must be called with the lock locked. > */ > static void > @@ -496,13 +475,19 @@ pf_batch_read( > } > while (bplist[num] && num < MAX_BUFS && fsbno < max_fsbno) { > /* > - * Handle discontiguous buffers outside the seek > - * optimised IO loop below. > + * Discontiguous buffers need special handling, so stop > + * gathering new buffers and process the list and this > + * discontigous buffer immediately. This avoids the > + * complexity of keeping a separate discontigous buffer > + * list and seeking back over ranges we've already done > + * optimised reads for. > */ > if ((bplist[num]->b_flags & LIBXFS_B_DISCONTIG)) { > - pf_read_discontig(args, bplist[num]); > - bplist[num] = NULL; > - } else if (which != PF_META_ONLY || > + num++; > + break; > + } > + > + if (which != PF_META_ONLY || > !B_IS_INODE(XFS_BUF_PRIORITY(bplist[num]))) > num++; > if (num == MAX_BUFS) > @@ -570,6 +555,22 @@ pf_batch_read( > * now read the data and put into the xfs_but_t's > */ > len = pread64(mp_fd, buf, (int)(last_off - first_off), first_off); > + > + /* > + * Check the last buffer on the list to see if we need to > + * process a discontiguous buffer. The gather loop guarantees > + * we only ever have a single discontig buffer on the list, > + * and that it is the last buffer, so it is safe to do this > + * check and read here like this while we aren't holding any > + * locks. > + */ > + if ((bplist[num - 1]->b_flags & LIBXFS_B_DISCONTIG)) { > + libxfs_readbufr_map(mp->m_ddev_targp, bplist[num - 1], 0); > + bplist[num - 1]->b_flags |= LIBXFS_B_UNCHECKED; > + libxfs_putbuf(bplist[num - 1]); > + num--; > + } > + > if (len > 0) { > /* > * go through the xfs_buf_t list copying from the > _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs