On Sat, Mar 24, 2018 at 10:05:42AM -0700, Darrick J. Wong wrote: > On Fri, Mar 09, 2018 at 10:47:04AM -0500, Brian Foster wrote: > > On Fri, Mar 09, 2018 at 09:10:38AM +1100, Dave Chinner wrote: > > > On Thu, Mar 08, 2018 at 02:11:13PM -0500, Brian Foster wrote: > > > > On Thu, Mar 08, 2018 at 03:40:17PM +1100, Dave Chinner wrote: > > > > > On Tue, Mar 06, 2018 at 06:12:45PM -0800, Darrick J. Wong wrote: > > > > > > On Tue, Mar 06, 2018 at 08:16:02PM -0500, Brian Foster wrote: > > > > > > > On Wed, Mar 07, 2018 at 09:14:41AM +1100, Dave Chinner wrote: > > > > > > > > On Tue, Mar 06, 2018 at 07:31:56AM -0500, Brian Foster wrote: > > > > > > > My argument is not that it's not possible. My argument is that the > > > > > > > semantics of XFS_LID_DIRTY suggest the transaction modified the object > > > > > > > in memory. Today, that means we log a range of the object, log an > > > > > > > invalidation or order a buffer. E.g., we modify an actual metadata > > > > > > > object, dirty the item (log or order), dirty the lidp and dirty the > > > > > > > transaction. If we didn't ever modify a particular object, then there's > > > > > > > no reason to dirty it that I can see. Failing to log/order the object > > > > > > > properly doesn't justify assuming it hasn't been modified IMO, we don't > > > > > > > really know either way. > > > > > > > > > > > > > > The way the flag is used seems to bear that out. It's set when an object > > > > > > > has been modified by a transaction. The way the flag alters behavior > > > > > > > suggests the same. Objects that have been dirtied by a transaction > > > > > > > cannot be released from that transaction and a cancel of a transaction > > > > > > > with a dirty lidp causes fs shutdown (because we dirty the transaction > > > > > > > once we dirty a lidp). > > > > > > > > > > I'll point out that I've just stumbled onto a series of bugs where > > > > > log items are multiply joined to a single transaction, in which case > > > > > the lidp state may not reflect the current state of the log item > > > > > because the log item no longer points to the lidp in question. > > > > > > > > > > This also raises questions about what happens when we process a log > > > > > item twice in the cil commit infrastructure - two formatting passes, > > > > > multiple inserts into the CIL list, multiple calls to > > > > > iop_committing/iop_unlock when the transaction is freed, etc. > > > > > There's lots of shit that could go wrong as a result of this type of > > > > > bug... > > > > > > > > > > > > A cancel of that same transaction would shutdown the fs because it > > > > > > > dirtied (i.e., presumably modified) an object. So we can't cancel the > > > > > > > transaction for risk of corruption, we can't release the object from the > > > > > > > transaction, yet this patch proposes behavior where a commit of that > > > > > > > transaction can silently undirty the lidp, commit whatever else might be > > > > > > > in the transaction and carry on as if nothing were wrong. > > > > > > > > > > > > > > At the very least, this is inconsistent with how this flag is used > > > > > > > everywhere else. How do you explain that? > > > > > > > > > > > > So the state "li dirty, lip !ordered, niovecs == 0" is an invalid state, > > > > > > and this patch proposes that if we ever see this invalid state then we > > > > > > decide that no the buffer isn't dirty since there are zero iovecs. This > > > > > > prevents the log from allocating anything for this item since there's no > > > > > > evidence of anything being dirty. > > > > > > > > > > Essentially. The log item dirty state is the thing we trust right > > > > > through the log item life cycle. It's fundamental to the relogging > > > > > algorithm we use to keep dirty metadata moving forwards through the > > > > > log. > > > > > > > > > > The log item descriptor, OTOH, was just an abstraction that allowed > > > > > the transaction commit to couple the log item formatting to the > > > > > xlog_write() vector calls, which is something that went away with > > > > > delayed logging about 8 years ago. The only piece of the log item > > > > > descriptor that remained was the dirty flag, and the issue here > > > > > boils down to one simple question: which dirty state do we trust - > > > > > the log item or the descriptor? > > > > > > > > > > That, as an architectural question, is a no brainer. It's the log > > > > > item state that matters. The log item descriptor is an abstraction > > > > > long past it's use-by date, so I'm going to resolve this problem > > > > > simply by removing it (if you are wondering how I found the > > > > > mulitply-joined log item bugs...). > > > > > > > > > > > > > Ooh, we're in danger of making some progress here... :P > > > > > > > > So this all suggests to me that you see the lidp dirty state as > > > > duplicative with the log item dirty state. > > > > > > Well, sort of. > > > > > > > That is quite different from > > > > the perception I have from reading the code (a perception which I tried > > > > my best to describe so you, or somebody, could set me straight if > > > > necessary). Then again... > > > > > > The log item descriptor used to link the transaction to the log item > > > as it passed though the journal. Transactions used to be freed on > > > log IO completion - their life cycle changed drastically with > > > delayed logging, such that they are now freed by xfs_trans_commit(), > > > rather than being attached to the iclogbuf (as the CIL checkpoint > > > transaction now is) and freed on log IO completion. i.e. their > > > functionality was effective replaced by the xfs_log_vec that we now > > > allocate and format during transaction commit. > > > > > > > Ok, that makes sense. Pre-CIL is before my time, but we essentially > > disconnected the transaction from the full lifecycle up through log I/O > > completion and replaced it with the log vector, so we can reformat items > > that are relogged before the vectors are written out during a > > checkpoint. > > > > > IOWs, if we can't/don't create a xfs_log_vec that will be written to > > > the log during formatting, then it doesn't matter what the lid says > > > - there's nothing to write to the log, so we don't add the object to > > > the CIL.... > > > > > > > Yes, there is no point in adding a physically unlogged log item to the > > CIL. My question is bigger picture than the CIL.. > > > > > > > In doing this, we no longer have the question of which one to trust > > > > > - all the "dirty in transaction" state is carried on the log item > > > > > itself and is valid only during the life of a transaction. At > > > > > which point, there should be no possibility of the log item dirty > > > > > flag getting out of step with it's dirty state, and we can simply > > > > > add asserts in the write place to validate this. > > > > > > > > > > > > > This more describes the lidp object as duplicative, while the dirty > > > > state it supports may still be unique with respect to the lip dirty > > > > state (which is what I thought to begin with). In other words, we still > > > > need to differentiate between a dirty lip that might be in the log > > > > pipeline and a log item that is "dirty in a transaction," because afaict > > > > it's still possible to have a dirty log item in a clean transaction. > > > > > > Yes, that is possible. In which case, it doesn't get formatted again > > > by the current transaction commit because it's already been > > > formatted and tracked by the CIL/AIL correctly. It's essentially > > > just an optimisation to avoid redundant relogging of log items. > > > > > > > Ok, that's pretty much how I understood it. Thanks for describing some > > of this background/viewpoint. At the very least this starts to isolate > > where we are looking at this differently. > > > > > > So which of the above is more accurate? If the latter, doesn't the > > > > "dirty in transaction" state we'd move over to the lip reflect the same > > > > meaning as XFS_LID_DIRTY today? If so, ISTM that we should still not > > > > expect to have a "dirty in transaction" log item unless the log item > > > > itself is dirty. Hm? > > > > > > Yup, but now it's all held in the one object and there's no > > > possibility of getting that out of sync anymore as the state is > > > managed directly when the log item is either dirtied or removed > > > from the transaction. We're not reliant on managing lids correctly > > > for it to be correct. > > > > > > > Ok. FWIW, I think you're completely missing the point I'm trying to make > > with regard to this patch. It really has nothing to do with the CIL or > > the log item, or really the lidp for that matter if we think about in > > terms of the extra lidp background you've documented. Just assume the > > lidp is gone.. all that matters here are the fundamental "log item > > dirty" and "log item dirty in transaction" states and the distinction > > between them. > > > > The point is that if we have a transaction that holds an item "dirty in > > the transaction" but the item itself is not dirty, we should probably > > consider that the in-core metadata had been modified and treat the > > transaction as inconsistent because we only dirty an item in a > > transaction when it's associated metadata has been physically modified. > > Of course, we don't know that for sure in this "shouldn't currently > > happen" case, but we do already consider it an error to cancel a dirty > > transaction clearing the "dirty in transaction" state from a particular > > item is logically equivalent to cancelling that subset of the > > transaction. > > > > I completely agree that we should not insert that log vector in the CIL. > > That part seems obvious to me. I'm simply suggesting that rather than > > being so focused on skipping the particular lidp/lip, we probably > > shouldn't commit the entire transaction. Rather, force it into the error > > path that is analogous to a cancel (i.e., shutdown the fs). > > > > This is technically a trivial change to the current patch so I'm quite > > curious what drives resistance to the point of motivating elimination of > > the lidp. Of course all that work might be independently > > correct/overdue/worthwhile, but it's presented here as if it washes away > > the fundamental discrepancy I'm describing. In reality, so long as we > > maintain distinct "dirty log item" and "dirty in transaction" states, it > > doesn't change the point. > > Sooo... afaict, the upstream kernel doesn't seem to be at a loss for not > having either of these two patches. Dave's patch makes it so that if we > screw up the dirty state between the log item & log item descriptor > we'll trust the log item and not stumble into a crash. Brian's patch > below seems to have the viewpoint that if this happens it's evidence of > either a software bug or memory corruption, so let's go offline, at > least until we decide that we actually want to support that situation > because some upper level xfs code wants it, or someone rips out the log > item descriptors. > A software bug... sure, not so sure about a memory corruption. I suppose that is always possible but the point is more that a dirty in transaction item implies the in-core metadata structure may have been modified. We don't know for sure, but there is non-zero risk of filesystem corruption if we assume that it wasn't, leave it clean in-core and commit the rest of the transaction. > I think the main reason to take Dave's patch is a defensive one: if > anyone else screws up logging in this manner the kernel will stay up. > This seems reasonable to me. > Indeed. > Moving on to the second patch (below), is there any reason /not/ to > clean it up and pull it in at the same time? Nobody has come up with a > use case that requires a dirty log item descriptor and a clean log item, > so is there any reason why we shouldn't consider that state symptomatic > of /something/ being wrong in the kernel or memory and shutdown as a > precaution? > If so (if not? if there is a reason not to..? :P), I'm clearly not aware of it. The only state I can think of that is remotely close is an ordered item. Ordered items have a flag that essentially allows them to pass through to the AIL without being logged. Also note that the hunk below was not intended as an independent patch. It was a suggestion on how to address my feedback in the currently proposed patch. Brian > Ah well, I'll dump both of them in my tree and see if generic/388 has > anything interesting to say. :P > > --D > > > Brian > > > > P.S., IOW, fold something like the appended into the original patch. As > > easy as it is to add, it can just as easily be removed if "dirty in tx" > > + "clean item" turns into an explicitly handled state in the future. In > > the meantime, this turns potential bugs like the symlink bug in the buf > > log range patch into a shutdown that prevents the silent corruption that > > would occur if the transaction had committed. > > > > --- 8< --- > > > > diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c > > index f190d1b84c0d..d5d8b88a8aaa 100644 > > --- a/fs/xfs/xfs_log_cil.c > > +++ b/fs/xfs/xfs_log_cil.c > > @@ -170,6 +170,15 @@ xlog_cil_alloc_shadow_bufs( > > } > > > > /* > > + * If the transaction dirtied an item but didn't tell us how to > > + * log it, something is seriously wrong. We have to assume the > > + * associated in-core metadata was modified. Don't risk > > + * corruption by committing the transaction. > > + */ > > + if (!niovecs && !ordered) > > + xfs_force_shutdown(log->l_mp, SHUTDOWN_CORRUPT_INCORE); > > + > > + /* > > * We 64-bit align the length of each iovec so that the start > > * of the next one is naturally aligned. We'll need to > > * account for that slack space here. Then round nbytes up > > @@ -352,6 +361,7 @@ xlog_cil_insert_format_items( > > * added to the CIL by mistake. > > */ > > if (!shadow->lv_niovecs && !ordered) { > > + ASSERT(XFS_FORCED_SHUTDOWN(log->l_mp)); > > lidp->lid_flags &= ~XFS_LID_DIRTY; > > continue; > > } > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > > the body of a message to majordomo@xxxxxxxxxxxxxxx > > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html