On Fri, Apr 08, 2022 at 10:06:05AM +1000, Dave Chinner wrote: > On Thu, Apr 07, 2022 at 04:39:41PM -0700, Darrick J. Wong wrote: > > On Fri, Apr 08, 2022 at 07:56:05AM +1000, Dave Chinner wrote: > > > On Thu, Apr 07, 2022 at 01:46:56PM -0700, Darrick J. Wong wrote: > > > > @@ -1284,6 +1285,43 @@ xfs_rtmount_init( > > > > return 0; > > > > } > > > > > > > > +static inline int > > > > +xfs_rtalloc_count_frextent( > > > > + struct xfs_trans *tp, > > > > + const struct xfs_rtalloc_rec *rec, > > > > + void *priv) > > > > +{ > > > > + uint64_t *valp = priv; > > > > + > > > > + *valp += rec->ar_extcount; > > > > + return 0; > > > > +} > > > > + > > > > +/* Reinitialize the number of free realtime extents from the realtime bitmap. */ > > > > +STATIC int > > > > +xfs_rtalloc_reinit_frextents( > > > > + struct xfs_mount *mp) > > > > +{ > > > > + struct xfs_trans *tp; > > > > + uint64_t val = 0; > > > > + int error; > > > > + > > > > + error = xfs_trans_alloc_empty(mp, &tp); > > > > + if (error) > > > > + return error; > > > > > > At this point in the mount, the transaction subsystem is not > > > really available for use. I understand that this is an empty > > > transaction and it doesn't reserve log space, but I don't like > > > creating an implicit requirement that xfs_trans_alloc_empty() never > > > touches log state for this code to work correctly into the future. > > > > <nod> I'd wondered if it was really a good idea to be creating a > > transaction at the "end" of log recovery. Recovery itself does it, but > > at least that's a carefully controlled box... > > > > > That is, the log isn't in a state where we can run normal > > > transactions because we can still have pending intent recovery > > > queued in the AIL. These can't be moved from the tail of the AIL > > > until xfs_log_mount_finish() is called to remove and process them. > > > They are dependent on there being enough log space remaining for the > > > transaction reservation they make to succeed and there may only be > > > just enough space for the first intent to make that reservation. > > > Hence if we consume any log space before we recover intents, we can > > > deadlock on log space when then trying to recover the intents. And > > > because the AIL can't push on intents, the same reservation deadlock > > > could occur with any transaction reservation we try to run before > > > xfs_log_mount_finish() has been run. > > > > > > Yes, I know this is an empty transaction and so it doesn't push on > > > the log or consume any space. But I think it's a bad precedent even > > > to use transactions at all when we know we are in the middle of log > > > recovery and the transaction subsystem is not open for general use > > > yet. Hence I think using transaction handles - even empty ones - > > > before xfs_log_mount_finish() is run is a pattern we want to avoid > > > if at all possible. > > > > ...yes... > > > > > > + > > > > + xfs_ilock(mp->m_rbmip, XFS_ILOCK_EXCL); > > > > + error = xfs_rtalloc_query_all(tp, xfs_rtalloc_count_frextent, &val); > > > > > > Looking at the code here, AFAICT the only thing the tp is absolutely > > > required for here is to hold the mount pointer - all the internal > > > bitmap walk operations grab and release buffers using interfaces > > > that can handle a null tp being passed. > > > > > > Hence it looks to me like a transaction context isn't necessary for > > > this read-only bitmap walk. Is that correct? Converting > > > xfs_rtalloc_query_all() to take a mount and a trans would also make > > > this code becomes a lot simpler.... > > > > ...and I was about to say "Yeah, we could make xfs_rtalloc_query_all > > take the mount and an optional transaction", but it occurred to me: > > > > The rtalloc queries need to read the rtbitmap file's data fork mappings > > into memory to find the blocks. If the rtbitmap is fragmented enough > > that the data fork is in btree format and the bmbt contains an upward > > cycle, loading the bmbt will livelock the mount attempt. > > "upward cycle", as in a corrupted btree? Yes. > > Granted, xfs_bmapi_read doesn't take a transaction, so this means that > > we'd either have to change it to take one, or we'd have to change the > > rtmount code to create an empty transaction and call iread_extents to > > detect a bmbt cycle. > > > > So yeah, I think the answer to your question is "yes", but then there's > > this other issue... :( > > Yup, but keep in mind that: > > xfs_fs_reserve_ag_blocks(tp == NULL) > xfs_ag_resv_init > xfs_finobt_calc_reserves > xfs_inobt_count_blocks > xfs_btree_count_blocks > > Will do a finobt btree walk without a transaction pointer if > we don't have finobt block counts recorded in the AGI. So it > seems to me that we are already at risk of cycles in corrupted > btrees being able to cause mount hangs. > > Hence from that perspective, I think worrying about a corrupted > rtbitmap BMBT btree is largely outside the scope of this patchset; > it's a generic issue with transaction-less btree walks, and one we > are already exposed to in the same phase of the mount/log recovery > process. Hence I think we should consider how to solve this problem > seperately rather that tie this patchset into twisty knots trying to > deal with here... <nod> I've long wanted to do a full audit of all the places we walk ondisk graph metadata without a transaction, but still haven't gotten to it... :( Anyway, I'll leave /that/ for a future series. In the meantime, I'll amend the rtalloc query functions to take mp and tp. --D > Cheers, > > Dave. > -- > Dave Chinner > david@xxxxxxxxxxxxx