On Tue, Sep 20, 2016 at 10:13:30AM +1000, Dave Chinner wrote: > [ sorry to take so long to get back to this, Brian, I missed your > reply and only yesterday when I was sorting out for-next updates > that I still had this on my "for-review" patch stack. ] > No problem. I've been away anyways.. > On Mon, Aug 29, 2016 at 02:17:22PM -0400, Brian Foster wrote: > > On Mon, Aug 29, 2016 at 11:16:31AM +1000, Dave Chinner wrote: > > > On Thu, Aug 11, 2016 at 01:11:03PM -0400, Brian Foster wrote: > > > i.e. We are very careful to write commit records in the correct > > > order because that is what determines recovery order, but we don't > > > care what order we write the actual contents of the checkpoints or > > > whether they interleave with other checkpoints. As such, ophdrs > > > change transactions and LSNs without having actually completed > > > recovery of a checkpoint. > > > > > > I think writeback should occur when all the transactions with a > > > given lsn have been committed. I'm not sure there's a simple way to > > > track and detect this, but using the ophdrs to detect a change of > > > lsn to trigger buffer writeback does not look correct to me at this > > > point in time. > > > > > > > That is precisely the intent of this patch. What I think could be a > > problem is something like the following, if possible: > > > > CA CB CC CD > > +---------+--------+--+-------+--+--------+-------+--+--+ > > trans A trans B trans C trans C trans D > > Yes, that's possible. > Ok. > > Assume that trans A and trans B are within the same record and trans C > > is in a separate record. In that case, we commit trans A which populates > > buffer_list. We lookup trans C, note a new LSN and drain buffer_list. > > Then we ultimately commit trans B, which has the same metadata LSN as > > trans A and thus is a path to the original problem if trans B happened > > to modify any of the same blocks as trans A. > > Yes, that's right, we still are exposed to the same problem, and > there's much more convoluted versions of it possible. > > > Do note however that this is just an occurrence of the problem with log > > recovery as implemented today (once we update metadata LSNs, and is > > likely rare as I haven't been able to reproduce corruption in many > > tries). > > Yeah, it's damn hard to intentionally cause interleaving of > checkpoint and commit records these days because of the delayed > logging does aggregation in memory rather than in the log buffers > themselves. > Makes sense. > > If that analysis is correct, I think a straightforward solution > > might be to defer submission to the lookup of a transaction with a new > > LSN that _also_ corresponds with processing of a commit record based on > > where we are in the on-disk log. E.g.: > > > > if (log->l_recovery_lsn != trans->r_lsn && > > oh_flags & XLOG_COMMIT_TRANS) { > > error = xfs_buf_delwri_submit(buffer_list); > > ... > > } > > > > So in the above, we'd submit buffers for A and B once we visit the > > commit record for trans C. Thoughts? > > Sounds plausible - let me just check I understood by repeating it > back. Given the above case, we start with log->l_recovery_lsn set to > the lsn before trans A and an empty buffer list. > > 1. We now recover trans A and trans B into their respective structures, > but we don't don't add their dirty buffers to the delwri list yet - > they are kept internal to the trans. > > 2. We then see commit A, and because the buffer list is empty we > simply add them to the buffer list and update log->l_recovery_lsn to > point at the transaction LSN. > Right... > 3. We now see trans C, and start recovering it into an internal buffer > list. > > 4. Then we process commit B, see that there are already queued buffers > and so check the transaction LSN against log->l_recovery_lsn. They > are the same, so we simply add the transactions dirty buffers to > the buffer list. > Maybe just weird wording here, but to be precise (and pedantic), the top-level check is for the current LSN change, not necessarily whether the buffer_list is empty or not. The behavior is the same either way. > 5. We continue processing transaction C, and start on transaction D. > We then see commit C. Buffer list is populated, so we check > transaction lsn against log->l_recovery_lsn. They are different. > At this point we know we have fully processed all the transactions > that are associated with log->l_recovery_lsn, hence we can submit > the buffer_list and mark it empty again. > > 6. At this point we jump back to step 2, this time processing commit > C onwards.... > > 7. At the end of log recovery, we commit the remaining buffer list > from the last transaction we recovered from the log. > > Did I understand it right? If so, I think this will work just fine. > Yep, I think so. I'll send an updated version. Brian > Thanks, Brian! > > -Dave. > -- > Dave Chinner > david@xxxxxxxxxxxxx > > _______________________________________________ > xfs mailing list > xfs@xxxxxxxxxxx > http://oss.sgi.com/mailman/listinfo/xfs _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs