[PATCH] bcache: fix insane -ESRCH error in bch_journal_replay

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



bch_keylist_empty is checked to make sure key insert is successful

Signed-off-by: Minlan Wang <wangminlan@xxxxxxxxxxxxxxx>
---
We've been using a pretty old version of bcache (from
linux-4.18-rc8) and experiencing occasional data corruption
after a new register session of bcache.  It turns out the
data corruption is caused by this code in our old version of
bcache:

in bch_journal_replay:
		 ...
                 for (k = i->j.start;
                      k < bset_bkey_last(&i->j);
                      k = bkey_next(k)) {
                         trace_bcache_journal_replay_key(k);

                         bch_keylist_init_single(&keylist, k);

                         ret = bch_btree_insert(s, &keylist, i->pin, NULL);
                         if (ret)
                                 goto err;
		 ...
bch_journal_replay returns -ESRCH, then in run_cache_set:
		 ...
                 if (j->version < BCACHE_JSET_VERSION_UUID)
                         __uuid_write(c);

                 bch_journal_replay(c, &journal);
		 ...
There's no message warning about this key insert error in
journal replay, nor does run_cache_set check for return
value of bch_journal_replay either, so later keys in jset
got lost silently and cause data corruption.

We checked the key which caused this failure in
bch_journal_replay, it is a key mapped to 2 btree node, and
the first btree node is full.
The code path causing this problem is:
1. k1 mapped to btree node n1, n2
2. When mapping into n1, bch_btree_insert_node goes into
   split case, top half of k1, say k1.1, is inserted into
   splitted n1: say n1.1, n1.2. Since the bottom half of k1,
   say k1.2, is left in insert_keys, so bch_btree_insert_node
   returns -EINTR.
   This happens in this code:
   in bch_btree_insert_node:
        ...
        } else {
                /* Invalidated all iterators */
                int ret = btree_split(b, op, insert_keys, replace_key);

                if (bch_keylist_empty(insert_keys))
                        return 0;
                else if (!ret)
                        return -EINTR;
                return ret;
        }
        ...
3. For return value -EINTR, another
   bch_btree_map_nodes_recurse in bcache_btree_root is
   triggered, with the wrong "from" key: which is
   START_KEY(&k1), instead of START_KEY(&k1.2)
4. n1.2 is revisted, since it has no overlapping range for
   k1.2, bch_btree_insert_keys sets
   op->insert_collision = true
   in the following code path:
   btree_insert_fn
   --> bch_btree_insert_node
     --> bch_btree_insert_keys
5. n2 is visisted and k1.2 is inserted here
6. bch_btree_insert detects op.op.insert_collision, and
   returns -ESRCH

In this case, the key is successfully inserted, though
bch_btree_insert returns -ESRCH.

We tried the latest version of bcache in linux mainline,
which has commit ce3e4cfb59cb (bcache: add failure check to
run_cache_set() for journal replay), the cache disk register
failed with this message:
bcache: replay journal failed, disabling caching

This is not what is expected, since the data in disk in
intact, journal replay should success.
I'm wondering if the checking of bch_btree_insert's return
value is really neccessary in bch_journal_replay, could we
just check bch_keylist_empty(&keylist) and make the replay
continue if that is empty?

We are using this patch for a work around for this issue now.

 drivers/md/bcache/journal.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index 61bd79babf7a..35b8cbcd73c5 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -380,6 +380,8 @@ int bch_journal_replay(struct cache_set *s, struct list_head *list)
 			bch_keylist_init_single(&keylist, k);
 
 			ret = bch_btree_insert(s, &keylist, i->pin, NULL);
+			if ((ret == -ESRCH) && (bch_keylist_empty(&keylist)))
+				ret = 0;
 			if (ret)
 				goto err;
 
-- 
2.18.4






[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Linux ARM Kernel]     [Linux Filesystem Development]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux