Currently, we increase the journal entry seq by 10 after recovery.
However, this is not sufficient in the following case.

After a crash the journal looks like:

| seq+0 | +1 | +2 | +3 | +4 | +5 | +6 | +7 | ... | +11 | +12 |

If +1 is not valid, we drop all entries from +1 to +12 and write a new
entry with seq+10:

| seq+0 | +10 | +2 | +3 | +4 | +5 | +6 | +7 | ... | +11 | +12 |

However, if we then write a big journal entry with seq+11, it can end
right where a stale journal entry begins, so recovery will connect with
the stale entry:

| seq+0 | +10 | +11 | +12 |

To reduce the risk of this issue, we increase seq by 1000 instead.

Signed-off-by: Song Liu <songliubraving@xxxxxx>
---
 drivers/md/raid5-cache.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 875f963..5301081 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -2003,8 +2003,8 @@ static int r5c_recovery_flush_log(struct r5l_log *log,
  * happens again, new recovery will start from meta 1. Since meta 2n is
  * valid now, recovery will think meta 3 is valid, which is wrong.
  * The solution is we create a new meta in meta2 with its seq == meta
- * 1's seq + 10 and let superblock points to meta2. The same recovery will
- * not think meta 3 is a valid meta, because its seq doesn't match
+ * 1's seq + 1000 and let superblock points to meta2. The same recovery
+ * will not think meta 3 is a valid meta, because its seq doesn't match
  */
 
 /*
@@ -2034,7 +2034,7 @@ static int r5c_recovery_flush_log(struct r5l_log *log,
  * ---------------------------------------------
  * ^                            ^
  * |- log->last_checkpoint      |- ctx->pos+1
- * |- log->last_cp_seq          |- ctx->seq+11
+ * |- log->last_cp_seq          |- ctx->seq+1001
  *
  * However, it is not safe to start the state machine yet, because data only
  * parities are not yet secured in RAID. To save these data only parities, we
@@ -2045,7 +2045,7 @@ static int r5c_recovery_flush_log(struct r5l_log *log,
  * -----------------------------------------------------------------
  * ^                                                ^
  * |- log->last_checkpoint                          |- ctx->pos+n
- * |- log->last_cp_seq                              |- ctx->seq+10+n
+ * |- log->last_cp_seq                              |- ctx->seq+1000+n
  *
  * If failure happens again during this process, the recovery can safe start
  * again from log->last_checkpoint.
@@ -2057,7 +2057,7 @@ static int r5c_recovery_flush_log(struct r5l_log *log,
  * -----------------------------------------------------------------
  * ^                                                ^
  * |- log->last_checkpoint                          |- ctx->pos+n
- * |- log->last_cp_seq                              |- ctx->seq+10+n
+ * |- log->last_cp_seq                              |- ctx->seq+1000+n
  *
  * Then we can safely start the state machine. If failure happens from this
  * point on, the recovery will start from new log->last_checkpoint.
@@ -2157,8 +2157,8 @@ static int r5l_recovery_log(struct r5l_log *log)
 	if (ret)
 		return ret;
 
-	pos = ctx.pos;
-	ctx.seq += 10;
+	pos = ctx.pos;
+	ctx.seq += 1000;
 
 	if (ctx.data_only_stripes == 0) {
 		log->next_checkpoint = ctx.pos;
--
2.9.3
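
P.S. For anyone who wants to see the window run, below is a minimal
userspace sketch of the failure mode. None of it is kernel code:
struct jent, scan(), the 16-slot journal and the seq values are all
invented for illustration; the real state lives in struct
r5l_recovery_ctx and the on-disk meta block format.

/*
 * Minimal userspace model of the seq-collision window described in
 * the commit message.  Nothing here is kernel code; the names and
 * the 16-slot geometry are invented purely for illustration.
 */
#include <stdio.h>

#define SLOTS 16

struct jent {
	unsigned long long seq;	/* sequence number of this entry */
	int len;		/* slots the entry spans, 0 = empty */
};

/* Replay from a checkpoint, accepting only strictly consecutive seqs. */
static int scan(const struct jent *j, int pos, unsigned long long seq)
{
	int replayed = 0;

	while (pos < SLOTS && j[pos].len && j[pos].seq == seq) {
		pos += j[pos].len;	/* jump over the whole entry */
		seq++;
		replayed++;
	}
	return replayed;
}

static void simulate(unsigned long long bump)
{
	struct jent j[SLOTS] = { { 0 } };
	unsigned long long base = 100;
	int i, n;

	/* Stale tail left by the first crash: seq base+0 .. base+12. */
	for (i = 0; i <= 12; i++)
		j[i] = (struct jent){ .seq = base + i, .len = 1 };

	/*
	 * First recovery found base+1 invalid and wrote a new meta
	 * block at slot 1 with a bumped seq.
	 */
	j[1] = (struct jent){ .seq = base + bump, .len = 1 };

	/*
	 * A big entry with the next seq fills slots 2..11; then we
	 * crash again before anything else is written.
	 */
	j[2] = (struct jent){ .seq = base + bump + 1, .len = 10 };

	/*
	 * Second recovery starts from the checkpoint at slot 1 and,
	 * after the big entry, lands on slot 12, where the stale
	 * base+12 entry still sits.
	 */
	n = scan(j, 1, base + bump);
	printf("bump %-4llu: replayed %d entries%s\n", bump, n,
	       n == 3 ? " (chained onto the stale +12 entry!)" : "");
}

int main(void)
{
	simulate(10);	/* base+bump+2 == base+12: stale entry accepted */
	simulate(1000);	/* base+1002  != base+12: stale entry rejected  */
	return 0;
}

With a bump of 10, the entry following the big write is expected at
seq base+12, which is exactly what the stale entry at slot 12 still
carries, so the scan replays it as valid; with a bump of 1000 the
expected seq there is base+1002, the stale seq no longer matches, and
the scan stops where it should.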