On 5/25/16 2:31 PM, Brian Foster wrote: > Multi-block buffers are logged based on buffer offset in > xfs_trans_log_buf(). xfs_buf_item_log() ultimately walks each mapping in > the buffer and marks the associated range to be logged in the > xfs_buf_log_format bitmap for that mapping. This code is broken, > however, in that it marks the actual buffer offsets of the associated > range in each bitmap rather than shifting to the byte range for that > particular mapping. > > For example, on a 4k fsb fs, buffer offset 4096 refers to the first byte > of the second mapping in the buffer. This means byte 0 of the second log > format bitmap should be tagged as dirty. Instead, the current code marks > byte offset 4096 of the second log format bitmap, which is invalid and > potentially out of range of the mapping. > > As a result of this, the log item format code invoked at transaction > commit time is not be able to correctly identify what parts of the > buffer to copy into log vectors. This can lead to NULL log vector > pointer dereferences in CIL push context if the item format code was not > able to locate any dirty ranges at all. This crash has been reproduced > on a 4k FSB filesystem using 16k directory blocks where an unlink > operation happened not to log anything in the first block of the > mapping. The logged offsets were all over 4k, marked as such in the > subsequent log format mappings, and thus left the transaction with an > xfs_log_item that is marked DIRTY but without any logged regions. > > Further, even when the logged regions are marked correctly in the buffer > log format bitmaps, the format code doesn't copy the correct ranges of > the buffer into the log. This means that any logged region beyond the > first block of a multi-block buffer is subject to corruption after a > crash and log recovery sequence. This is due to a failure to convert the > mapping bm_len field from basic blocks to bytes in the buffer offset > tracking code in xfs_buf_item_format(). > > Update xfs_buf_item_log() to convert buffer offsets to segment relative > offsets when logging multi-block buffers. This ensures that the modified > regions of a buffer are logged correctly and avoids the aforementioned > crash. Also update xfs_buf_item_format() to correctly track the source > offset into the buffer for the log vector formatting code. This ensures > that the correct data is copied into the log. > > Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx> > --- > > Note that this has only been tested so far in a particular filesystem > environment that reproduces the crash/corruption discussed in the commit > log description. Generic testing still required, posting just to get a > jump on review... > > Brian > > fs/xfs/xfs_buf_item.c | 18 +++++++++++++----- > 1 file changed, 13 insertions(+), 5 deletions(-) > > diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c > index 3425799..2e95ad0 100644 > --- a/fs/xfs/xfs_buf_item.c > +++ b/fs/xfs/xfs_buf_item.c > @@ -359,7 +359,7 @@ xfs_buf_item_format( > for (i = 0; i < bip->bli_format_count; i++) { > xfs_buf_item_format_segment(bip, lv, &vecp, offset, > &bip->bli_formats[i]); > - offset += bp->b_maps[i].bm_len; > + offset += BBTOB(bp->b_maps[i].bm_len); Yep this and the other instance in the other hunk look obviously correct. > } > > /* > @@ -915,20 +915,28 @@ xfs_buf_item_log( > for (i = 0; i < bip->bli_format_count; i++) { > if (start > last) > break; > - end = start + BBTOB(bp->b_maps[i].bm_len); > + end = start + BBTOB(bp->b_maps[i].bm_len) - 1; > + So let's say start is 0, and len is 8, just making up nice numbers. That means a range of 0 through 7, with the next one starting at 8. 7 is the last block, so if "first" is beyond that, we need to skip. Makes sense. > + /* skip to the map that includes the first byte to log */ > if (first > end) { > start += BBTOB(bp->b_maps[i].bm_len); > continue; > } > + > + /* > + * Trim the range to this segment and mark it in the bitmap. > + * Note that we must convert buffer offsets to segment relative > + * offsets (e.g., the first byte of each segment is byte 0 of > + * that segment). > + */ > if (first < start) > first = start; > if (end > last) > end = last; > - > - xfs_buf_item_log_segment(first, end, > + xfs_buf_item_log_segment(first - start, end - start, > &bip->bli_formats[i].blf_data_map[0]); I guess this is the trickiest part. in the called function, we do: /* * Get a pointer to the first word in the bitmap * to set a bit in. */ word_num = first_bit >> BIT_TO_WORD_SHIFT; wordp = &map[word_num]; where first_bit is derived from the passed-in "first" argument (in bytes). So we're marking bits in map[]; this skips us out into the word_num index in map[]. But what's map? The map[] we passed in is &bip->bli_formats[i].blf_data_map[0], where 'i' is the i'th segment. So yes, this needs to be segment-relative bytes, whereas start (and therefore end) are offsets in the whole buffer. So yes, this makes sense to me as well. > > - start += bp->b_maps[i].bm_len; > + start += BBTOB(bp->b_maps[i].bm_len); Again obviously correct. Reviewed-by: Eric Sandeen <sandeen@xxxxxxxxxx> > } > } > > _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs