On Fri, Dec 22, 2023 at 04:24:16PM -0800, Ira Weiny wrote: > Ira Weiny wrote: > > Ira Weiny wrote: > > > Coly, > > > > > > Yesterday I noticed that a few of our nvdimm tests were failing. I bisected > > > the problem to the following commit. > > > > > > aa511ff8218b ("badblocks: switch to the improved badblock handling code") > > > > > > Reverting this patch fixed our tests. > > > > > [snip] > > I added some prints[1] to try and see what is happening. Perhaps this will > help you. > > ... > [ 99.919237] IKW set_badblock 00000000aa44c55d 8000 1 > [ 99.921448] IKW set_badblock 00000000aa44c55d 8001 1 > [ 99.924051] IKW set_badblock 00000000aa44c55d 8002 1 > [ 99.926135] IKW set_badblock 00000000aa44c55d 8003 1 > [ 99.928516] IKW set_badblock 00000000aa44c55d 8004 1 > [ 99.930491] IKW set_badblock 00000000aa44c55d 8005 1 > [ 99.932894] IKW set_badblock 00000000aa44c55d 8006 1 > [ 99.936638] IKW set_badblock 00000000aa44c55d 8007 1 > [ 100.999297] IKW _badblocks_check() 00000000aa44c55d s 8000 num 1 > [ 101.000027] IKW table count 1 shift 0 > [ 101.000644] IKW 0: off 8000 end 8008 > [ 101.001271] IKW prev 0, cnt 1 > [ 101.002481] IKW start 8000, len 1 > [ 101.003464] IKW front overlap 0 > [ 101.004256] IKW rv 1 > > ... ^^^^^^^^^ > <This is a valid failure as part of the test> > ... > > [ 101.148783] IKW set_badblock 00000000721b4f3d 8000 1 > [ 101.150629] IKW set_badblock 00000000721b4f3d 8001 1 > [ 101.152315] IKW set_badblock 00000000721b4f3d 8002 1 > [ 101.154544] IKW set_badblock 00000000721b4f3d 8003 1 > [ 101.156238] IKW set_badblock 00000000721b4f3d 8004 1 > [ 101.158310] IKW set_badblock 00000000721b4f3d 8005 1 > [ 101.160196] IKW set_badblock 00000000721b4f3d 8006 1 > [ 101.162158] IKW set_badblock 00000000721b4f3d 8007 1 > [ 101.163543] IKW _badblocks_check() 00000000721b4f3d s 0 num 8 > [ 101.164427] IKW table count 1 shift 0 > [ 101.165310] IKW 0: off 8000 end 8008 > [ 101.166398] IKW prev -1, cnt 1 > [ 101.167178] IKW start 0, len 8 > [ 101.168107] IKW rv 0 > [ 101.168858] IKW _badblocks_check() 00000000721b4f3d s 8 num 8 > [ 101.169814] IKW table count 1 shift 0 > [ 101.170547] IKW 0: off 8000 end 8008 > [ 101.171238] IKW prev -1, cnt 1 > [ 101.171985] IKW start 8, len 8 > [ 101.173007] IKW front overlap -1 <== this is prev which is used to index bb->pages > [ 101.174157] IKW prev -1, cnt 1 > [ 101.175268] IKW start 9, len 7 > [ 101.176557] IKW rv -1 > > ... ^^^^^^^^^ > This is where the failure occurs. > ... > > I think overlap_front() is not working correctly in this case. And from > my reading of the code I don't know how it would. But overlap_front() is > used elsewhere and I'm not confident in making the change. > > Hope this helps, Hi Ira, The above information is accurate and very helpful, thank you! >From __badblocks_check(), the problematic code block is, 1303 re_check: 1304 bad.start = s; 1305 bad.len = sectors; 1306 1307 if (badblocks_empty(bb)) { 1308 len = sectors; 1309 goto update_sectors; 1310 } 1311 1312 prev = prev_badblocks(bb, &bad, hint); 1313 1314 /* start after all badblocks */ 1315 if ((prev + 1) >= bb->count && !overlap_front(bb, prev, &bad)) { 1316 len = sectors; 1317 goto update_sectors; 1318 } 1319 1320 if (overlap_front(bb, prev, &bad)) { 1321 if (BB_ACK(p[prev])) 1322 acked_badblocks++; 1323 else 1324 unacked_badblocks++; 1325 1326 if (BB_END(p[prev]) >= (s + sectors)) 1327 len = sectors; 1328 else 1329 len = BB_END(p[prev]) - s; 1330 1331 if (set == 0) { 1332 *first_bad = BB_OFFSET(p[prev]); 1333 *bad_sectors = BB_LEN(p[prev]); 1334 set = 1; 1335 } 1336 goto update_sectors; 1337 } 1338 1339 /* Not front overlap, but behind overlap */ 1340 if ((prev + 1) < bb->count && overlap_behind(bb, &bad, prev + 1)) { 1341 len = BB_OFFSET(p[prev + 1]) - bad.start; 1342 hint = prev + 1; 1343 goto update_sectors; 1344 } 1345 1346 /* not cover any badblocks range in the table */ 1347 len = sectors; 1348 1349 update_sectors: If the checking range is before all badblocks records in the badblocks table, value -1 is returned from prev_badblock(). Code blocks between line 1314 and line 1337 doesn't hanle the implicit '-1' value properly. Then counter unacked_badblocks is increased at line 1324 mistakenly. So the value prev should be checked and make sure '>= 0' before comparing the checking range with a badblock record returned by prev_badblocks(). Other wise it dones't make sense. For badblocks_set() and badblocks_clear(), 'prev < 0' is explicitly checked, value '-1' doesn't go though into following code. Could you please apply and try the attached patch? Hope it may help a bit. And now it is weekend time, you may be out of office and not able to access the testing hardware. I will do more testing from myside and update more info if necessary. Thanks for the report and debug! Coly Li [debug patch snipped]
>From 0f81e197404653a4631f5d5857c63510e0534430 Mon Sep 17 00:00:00 2001 From: Coly Li <colyli@xxxxxxx> Date: Sat, 23 Dec 2023 17:24:51 +0800 Subject: [PATCH] badblocks: avoid checking invalid range in badblocks_check() If prev_badblocks() returns '-1', it means no valid badblocks record before the checking range. It doesn't make sense to check whether the input checking range is overlapped with the non-existed invalid front range. This patch checkes whether 'prev >= 0' is true before calling overlap_front(), to void such invalid operations. Reported-by: Ira Weiny <ira.weiny@xxxxxxxxx> Signed-off-by: Coly Li <colyli@xxxxxxx> --- block/badblocks.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/block/badblocks.c b/block/badblocks.c index fc92d4e18aa3..db4ec8b9b2a8 100644 --- a/block/badblocks.c +++ b/block/badblocks.c @@ -1312,12 +1312,14 @@ static int _badblocks_check(struct badblocks *bb, sector_t s, int sectors, prev = prev_badblocks(bb, &bad, hint); /* start after all badblocks */ - if ((prev + 1) >= bb->count && !overlap_front(bb, prev, &bad)) { + if ((prev >= 0) && + ((prev + 1) >= bb->count) && !overlap_front(bb, prev, &bad)) { len = sectors; goto update_sectors; } - if (overlap_front(bb, prev, &bad)) { + /* Overlapped with front badblocks record */ + if ((prev >= 0) && overlap_front(bb, prev, &bad)) { if (BB_ACK(p[prev])) acked_badblocks++; else -- 2.35.3