On Sun, Jun 21, 2015 at 04:25:29PM -0400, Brian Foster wrote: > On Sun, Jun 21, 2015 at 02:27:22AM -0700, Christoph Hellwig wrote: > > This looks sensible to me, but I still can't make sense of the old > > code which just conditionally copied it even after taking a brief > > look at the pre-git history of this code. Does anyone understand why > > the code was like this? Fixing code that seems to have had an > > intention I can't make sense of always feel dangerous. > > > > Yeah, I couldn't really make much sense of the original code either. My > reasoning at the time was that the memcpy() seemed superfluous in this > case, as we wouldn't add anything to trans->r_itemq and end up doing the > memcpy() again the next time around. > > Taking another look at other xlog_recover_add_item() callsites, there is > this in the xlog_recover_add_to_cont_trans() case: > > if (list_empty(&trans->r_itemq)) { > /* finish copying rest of trans header */ > xlog_recover_add_item(&trans->r_itemq); > ptr = (xfs_caddr_t) &trans->r_theader + > sizeof(xfs_trans_header_t) - len; > memcpy(ptr, dp, len); > return 0; > } > > ... which according to the code/comment, seems to imply that the > transaction header could be split across op records..? I'm not terribly > familiar with the log on-disk format. Does this sound sane? Yes. Look at xfs_cil_push, where it formats the transaction header. It is just another 16 byte vector that is passed to xlog_write so it gets encapsulated in log opheaders. That means it can be split across multiple iclogs and log opheaders (i.e. can trigger the last_was_partial_copy case in xlog_write_setup_copy()). > If so, perhaps the patch is wrong and we should only bail if the length > in either of these two cases is obviously invalid (e.g., it exceeds the > size of the full in-core structure). The code currently assumes that if the magic number matches, then the length of the opheader will be in range. It's probably a fair assumption, as only a software bug in setting the opheader size in the log write code will cause problems. Other on-disk corruption (e.g. bit flips) will be caught by the log buffer CRCs.... However, we should really trigger a corruption if it does exceed the size of the transaction header. Similarly, we should do the same check on the other side of the partial trans header copy... > Perhaps there's also more > validation that can occur here: can we assert that this should mean > we're at the end of the operation record in the first short copy > instance? Not sure what you mean here. Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs