On Fri, Sep 04, 2020 at 11:01:59PM +0800, Gao Xiang wrote: > On Fri, Sep 04, 2020 at 09:37:04AM -0400, Brian Foster wrote: > > On Fri, Sep 04, 2020 at 08:46:34PM +0800, Gao Xiang wrote: > ... > > > > > @@ -3001,21 +3011,19 @@ xlog_do_recovery_pass( > > > > > */ > > > > > h_size = be32_to_cpu(rhead->h_size); > > > > > h_len = be32_to_cpu(rhead->h_len); > > > > > - if (h_len > h_size) { > > > > > - if (h_len <= log->l_mp->m_logbsize && > > > > > - be32_to_cpu(rhead->h_num_logops) == 1) { > > > > > - xfs_warn(log->l_mp, > > > > > + if (h_len > h_size && h_len <= log->l_mp->m_logbsize && > > > > > + rhead->h_num_logops == cpu_to_be32(1)) { > > > > > + xfs_warn(log->l_mp, > > > > > "invalid iclog size (%d bytes), using lsunit (%d bytes)", > > > > > - h_size, log->l_mp->m_logbsize); > > > > > - h_size = log->l_mp->m_logbsize; > > > > > - } else { > > > > > - XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, > > > > > - log->l_mp); > > > > > - error = -EFSCORRUPTED; > > > > > - goto bread_err1; > > > > > - } > > > > > + h_size, log->l_mp->m_logbsize); > > > > > + h_size = log->l_mp->m_logbsize; > > > > > + rhead->h_size = cpu_to_be32(h_size); > > > > > > > > I don't think we should update rhead like this, particularly in a rare > > > > and exclusive case. This structure should reflect what is on disk. > > > > > > > > All in all, I think this patch should be much more focused: > > > > > > > > 1.) Add the bufsize parameter and associated corruption check to > > > > xlog_valid_rec_header(). > > > > 2.) Pass the related value from the existing calls. > > > > 3.) (Optional) If there's reason to revalidate after executing the mkfs > > > > workaround, add a second call within the branch that implements the > > > > h_size workaround. > > > > > > > > > > I moved workaround code to xlog_valid_rec_header() at first is > > > because in xlog_valid_rec_header() actually it has 2 individual > > > checks now: > > > > > > 1) check rhead->h_len vs rhead->h_size for each individual log record; > > > 2) check rhead->h_size vs the unique allocated buffer size passed in > > > for each record (since each log record has one stored h_size, > > > even though there are not used later according to the current > > > logic of xlog_do_recovery_pass). > > > > > > if any of the conditions above is not satisfied, xlog_valid_rec_header() > > > will make fs corrupted immediately, so I tried 2 ways up to now: > > > > > > - (v1,v2) fold in workaround case into xlog_valid_rec_header() > > > - (v3) rearrange workaround and xlog_valid_rec_header() order in > > > xlog_do_recovery_pass() and modify rhead->h_size to the > > > workaround h_size before xlog_valid_rec_header() validation > > > so xlog_valid_rec_header() will work as expected since it > > > has two individual checks as mentioned above. > > > > > > If there is some better way, kindly let me know :) and I'd like to > > > hear other folks about this in advance as well.... so I can go > > > forward since this part is a bit tricky for now. > > > > > > > My suggestion is to, at minimum, separate the above two logic changes > > into separate patches. Item #2 above is a functional check in that it > > ensures each record fits into the record buffer size we've allocated > > (based on h_size) at the start of recovery. This item is what my > > feedback above was referring to and I think is a fairly straightforward > > change. > > There are 3-value relationships now (rhead->h_len <= rhead->h_size <= > unique allocated buffer size [which is the same as rhead->h_size of > initial tail log record]). > > The only buffer overrun case which could do harm runtimely is > rhead->h_len > bufsize as I mentioned in the previous email > https://lore.kernel.org/r/20200902224726.GB4671@xxxxxxxxxxxxxxxxxx > > so I guess what you meant is > > - get h_size from the tail initial log record as buffer size (so > workaround needs to be settled here as well); > - call in xlog_valid_rec_header() to fail out rhead->h_len > > buffer size. > > as the 1st splited patch. Yet that's not the case of Item #2 > (rhead->h_size <= buffer size), just to confirm if my understanding > is correct. > Yep. > > > > Item #1 is a bit more nebulous. h_size refers to the iclog size used by > > the kernel that wrote to the log. I think it should be uniform across > > all records and AFAICT it doesn't serve a functional purpose (validation > > notwithstanding) for recovery beyond establishing the size of the buffer > > we should allocate to process records. The mkfs workaround implements a > > case where we have to ignore the on-disk h_size and use the in-core > > iclog size, and that is currently isolated to a single place. I'm not > > fundamentally against more h_size verification beyond current usage, but > > if that means we have to consider the workaround for multiple records > > (which is confusing and incorrect) or make subtle runtime changes like > > quietly tweaking the in-core record header structure from what is on > > disk, then I don't think it's worth the complexity. > > > > If we _really_ wanted to include such a change, I think a more > > appropriate validation check might be to similarly track the h_size from > > the initial record and simply verify that the values are uniform across > > all processed records. That way we don't conflict or impose the > > workaround logic on the underlying log format rules. It also catches the > > (unlikely) case that the mkfs workaround is applied on the first record > > incorrectly because h_size is corrupt and there are more records that > > conflict. Also note that v5 filesystems enforce record CRC checks > > anyways, so this still might be of limited value... > > based on the words above, my understanding is that the 2nd spilted > patch is used to do such extra check in xlog_valid_rec_header() to > > check buffer size == rhead->h_size all the time? > > so the relationship would be rhead->h_len <= buffer size = rhead->h_size. > > please correct me if I'm wrong since I'm a bit confusing about some > English expressions. Yeah, if my understanding is correct, I'd like > to hear if this 2nd spilted check patch is needed (it won't > cause any stability impact since rhead->h_size for log records > other than tail record isn't useful due to the current logic...) > Well I'm not totally convinced the second check is really needed (for the reasons you mentioned), so this was just an idea of something I think might be more acceptable than the current approach. The idea was just to make the validation ensure that h_size is the same across all log records and it would be completely indepedendent from the mkfs workaround and the h_len <= bufsize check. Brian > Thanks, > Gao Xiang > > > > > Brian > > > > > > Also, please test the workaround case to make sure it still works as > > > > expected (if you haven't already). > > > > > > ok, will double confirm this, thanks! > > > > > > Thanks, > > > Gao Xiang > > > > > > > > > > > Brian > > > > > > > > > > > > >