On June 14, 2016 9:58:59 AM GMT-03:00, Brian Foster <bfoster@xxxxxxxxxx> wrote: >xfs_reserve_blocks() is responsible to update the XFS reserved block >pool count at mount time or based on user request. When the caller >requests to increase the reserve pool, blocks must be allocated from >the >global counters such that they are no longer available for general >purpose use. If the requested reserve pool size is too large, XFS >reserves what blocks are available. The implementation requires looking >at the percpu counters and making an educated guess as to how many >blocks to try and allocate from xfs_mod_fdblocks(), which can return >-ENOSPC if the guess was not accurate due to counters being modified in >parallel. > >xfs_reserve_blocks() retries the guess in this scenario until the >allocation succeeds or it is determined that there is no space >available >in the fs. While not easily reproducible in the current form, the retry >code doesn't actually work correctly if xfs_mod_fdblocks() actually >fails. The problem is that the percpu calculations use the m_resblks >counter to determine how many blocks to allocate, but unconditionally >update m_resblks before the block allocation has actually succeeded. >Therefore, if xfs_mod_fdblocks() fails, the code jumps to the retry >label and uses the already updated m_resblks value to determine how >many >blocks to try and allocate. If the percpu counters previously suggested >that the entire request was available, fdblocks_delta could end up set >to 0. In that case, m_resblks is updated to the requested value, yet no >blocks have been reserved at all. > >Refactor xfs_reserve_blocks() to use an explicit loop and make the code >easier to follow. Since we have to drop the spinlock across the >xfs_mod_fdblocks() call, use a delta value for m_resblks as well and >only apply the delta once allocation succeeds. > >Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx> >--- > >This is something I had laying around from the thin block device >reservation stuff. That work introduced a more common >xfs_mod_fdblocks() >failure scenario that isn't as much of a problem with the current code, >but the current xfs_reserve_blocks() retry code is clearly broken and >so >should probably be fixed up. > >Brian > >fs/xfs/xfs_fsops.c | 105 >++++++++++++++++++++++++++++++----------------------- > 1 file changed, 60 insertions(+), 45 deletions(-) > >diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c >index b4d7582..003d180 100644 >--- a/fs/xfs/xfs_fsops.c >+++ b/fs/xfs/xfs_fsops.c >@@ -667,8 +667,11 @@ xfs_reserve_blocks( > __uint64_t *inval, > xfs_fsop_resblks_t *outval) > { >- __int64_t lcounter, delta, fdblks_delta; >+ __int64_t lcounter, delta; >+ __int64_t fdblks_delta = 0; > __uint64_t request; >+ __int64_t free; >+ int error = 0; > > /* If inval is null, report current values and return */ > if (inval == (__uint64_t *)NULL) { >@@ -682,24 +685,23 @@ xfs_reserve_blocks( > request = *inval; > > /* >- * With per-cpu counters, this becomes an interesting >- * problem. we needto work out if we are freeing or allocation >- * blocks first, then we can do the modification as necessary. >+ * With per-cpu counters, this becomes an interesting problem. we >need >+ * to work out if we are freeing or allocation blocks first, then we >can >+ * do the modification as necessary. > * >- * We do this under the m_sb_lock so that if we are near >- * ENOSPC, we will hold out any changes while we work out >- * what to do. This means that the amount of free space can >- * change while we do this, so we need to retry if we end up >- * trying to reserve more space than is available. >+ * We do this under the m_sb_lock so that if we are near ENOSPC, we >will >+ * hold out any changes while we work out what to do. This means that >+ * the amount of free space can change while we do this, so we need >to >+ * retry if we end up trying to reserve more space than is available. > */ >-retry: > spin_lock(&mp->m_sb_lock); > > /* > * If our previous reservation was larger than the current value, >- * then move any unused blocks back to the free pool. >+ * then move any unused blocks back to the free pool. Modify the >resblks >+ * counters directly since we shouldn't have any problems unreserving >+ * space. > */ >- fdblks_delta = 0; > if (mp->m_resblks > request) { > lcounter = mp->m_resblks_avail - request; > if (lcounter > 0) { /* release unused blocks */ >@@ -707,54 +709,67 @@ retry: > mp->m_resblks_avail -= lcounter; > } > mp->m_resblks = request; >- } else { >- __int64_t free; >+ if (fdblks_delta) { >+ spin_unlock(&mp->m_sb_lock); >+ error = xfs_mod_fdblocks(mp, fdblks_delta, 0); >+ spin_lock(&mp->m_sb_lock); >+ } >+ >+ goto out; >+ } > >+ /* >+ * If the request is larger than the current reservation, reserve the >+ * blocks before we update the reserve counters. Sample m_fdblocks >and >+ * perform a partial reservation if the request exceeds free space. >+ */ >+ error = -ENOSPC; >+ while (error == -ENOSPC) { Why don't you make this a "do { } while (error == -ENOSPC)"? xfs_mod_fdblocks() will already set the error at the end of that loop. Paulo -- Paulo Alcantara, HP Speaking for myself only. _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs