On Thu, Feb 04, 2021 at 03:02:17AM +0800, Gao Xiang wrote:
> Hi Darrick,
>
> On Wed, Feb 03, 2021 at 10:12:11AM -0800, Darrick J. Wong wrote:
> > On Wed, Feb 03, 2021 at 10:51:46PM +0800, Gao Xiang wrote:
>
> ...
>
> > > > >
> > > > > +		}
> > > > > +
> > > > >  		if (error)
> > > > >  			goto out_trans_cancel;
> > > > >  	}
> > > > > @@ -137,15 +157,15 @@ xfs_growfs_data_private(
> > > > >  	 */
> > > > >  	if (nagcount > oagcount)
> > > > >  		xfs_trans_mod_sb(tp, XFS_TRANS_SB_AGCOUNT, nagcount - oagcount);
> > > > > -	if (nb > mp->m_sb.sb_dblocks)
> > > > > +	if (nb != mp->m_sb.sb_dblocks)
> > > > >  		xfs_trans_mod_sb(tp, XFS_TRANS_SB_DBLOCKS,
> > > > >  				 nb - mp->m_sb.sb_dblocks);
> > > >
> > > > Maybe use delta here?
> > >
> > > The reason is the same as above, `delta' here was changed due to
> > > xfs_resizefs_init_new_ags(), which is not nb - mp->m_sb.sb_dblocks
> > > anymore. so `extend` boolean is used (rather than just use delta > 0)
> >
> > Long question:
> >
> > The reason why we use (nb - dblocks) is because growfs is an all or
> > nothing operation -- either we succeed in writing new empty AGs and
> > inflating the (former) last AG of the fs, or we don't do anything at
> > all.  We don't allow partial growing; if we did, then delta would be
> > relevant here.  I think we get away with not needing to run transactions
> > for each AG because those new AGs are inaccessible until we commit the
> > new agcount/dblocks, right?
> >
> > In your design for the fs shrinker, do you anticipate being able to
> > eliminate all the eligible AGs in a single transaction?  Or do you
> > envision only tackling one AG at a time?  And can we be partially
> > successful with a shrink?  e.g. we succeed at eliminating the last AG,
> > but then the one before that isn't empty and so we bail out, but by that
> > point we did actually make the fs a little bit smaller.
>
> Thanks for your question. I'm about to sleep, I might try to answer
> your question here.
>
> As for my current experiement / understanding, I think eliminating all
> the empty AGs + shrinking the tail AG in a single transaction is possible,
> that is what I'm done for now;
>  1) check the rest AGs are empty (from the nagcount AG to the oagcount - 1
>     AG) and mark them all inactive (AGs freezed);

To add a few words: step 1) might require some additional helper
transactions. For example, if we'd like to confirm that the bmbt has
only one extent, rather than just doing some basic math to confirm the
whole AG is empty, we might need to move all free blocks from the AGFL
to the bmbt as well. Yet that process is independent of the main
shrinking transaction, and in principle it has no visible impact on
users.

I'll reply to the rest of the suggestions tomorrow. Thanks for the
review again!

Thanks,
Gao Xiang

>  2) consume an extent from the (nagcount - 1) AG;
>  3) decrease the number of agcount from oagcount to nagcount.
>
> Both 2) and 3) can be done in the same transaction, and after 1) the state
> of such empty AGs is fixed as well. So on-disk fs and runtime states are
> all in atomic.
>
> >
> > There's this comment at the bottom of xfs_growfs_data() that says that
> > we can return error codes if the secondary sb update fails, even if the
> > new size is already live.  This convinces me that it's always been the
> > case that callers of the growfs ioctl are supposed to re-query the fs
> > geometry afterwards to find out if the fs size changed, even if the
> > ioctl itself returns an error... which implies that partial grow/shrink
> > are a possibility.
> >
>
> I didn't realize that possibility but if my understanding is correct
> the above process is described as above so no need to use incremental
> shrinking by its design. But it also support incremental shrinking if
> users try to use the ioctl for multiple times.
>
> If I'm wrong, kindly point out, many thanks in advance!
>
> Thanks,
> Gao Xiang
>
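
P.S. For anyone following the thread, the three steps quoted above can be
modelled in userspace roughly as below. This is only a toy sketch under
stated assumptions, not the patch: struct toy_fs, toy_shrink(),
toy_reactivate() and the ag_high_used[] bookkeeping are hypothetical names
that do not exist in XFS, the error codes are arbitrary choices, and
locking, AGFL handling, minimum AG size rules and real transactions are
all left out. It only illustrates the all-or-nothing shape: either both
agcount and dblocks change, or nothing does.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_AGS 16

/* Hypothetical in-memory model; none of these names exist in XFS. */
struct toy_fs {
	uint32_t agcount;			/* number of AGs */
	uint64_t agblocks;			/* blocks in a full AG */
	uint64_t dblocks;			/* total data blocks */
	uint64_t ag_high_used[TOY_MAX_AGS];	/* blocks up to the last used one, per AG */
	bool	 ag_inactive[TOY_MAX_AGS];	/* AG frozen for removal */
};

/* Undo step 1 for AGs in [from, to) when the shrink cannot proceed. */
void toy_reactivate(struct toy_fs *fs, uint32_t from, uint32_t to)
{
	for (uint32_t agno = from; agno < to; agno++)
		fs->ag_inactive[agno] = false;
}

/* Shrink the toy fs to nb blocks; returns 0 or a negative errno. */
int toy_shrink(struct toy_fs *fs, uint64_t nb)
{
	uint32_t oagcount = fs->agcount;
	uint32_t nagcount, agno;
	uint64_t tail_len;

	if (fs->agblocks == 0 || nb == 0 || nb >= fs->dblocks)
		return -EINVAL;

	/* New AG count and the length of the new tail AG. */
	nagcount = (uint32_t)((nb + fs->agblocks - 1) / fs->agblocks);
	tail_len = nb - (uint64_t)(nagcount - 1) * fs->agblocks;

	/* Step 1: AGs nagcount .. oagcount - 1 must be empty; freeze them. */
	for (agno = nagcount; agno < oagcount; agno++) {
		if (fs->ag_high_used[agno] != 0) {
			toy_reactivate(fs, nagcount, agno);
			return -ENOSPC;
		}
		fs->ag_inactive[agno] = true;
	}

	/* Step 2: the blocks cut from AG (nagcount - 1) must be free. */
	if (fs->ag_high_used[nagcount - 1] > tail_len) {
		toy_reactivate(fs, nagcount, oagcount);
		return -ENOSPC;
	}

	/* Steps 2 and 3 "commit" together: agcount and dblocks change at once. */
	fs->agcount = nagcount;
	fs->dblocks = nb;	/* delta nb - dblocks is negative for a shrink */
	return 0;
}
```

Calling toy_shrink() several times with decreasing sizes gives the
incremental behaviour mentioned above, while each individual call either
fully succeeds or leaves the toy state untouched.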