Hi Coly,

Hmm, on second thought, this problem cannot happen for the discard-disabled reason (1), because the seq is a random number:

    get_random_bytes(&i->seq, sizeof(uint64_t));

So it is effectively impossible for the last invalidated bucket and the new bucket to end up with the same random seq. But what about the power-cut case?

Yang

From: "杨东升" <dongsheng.yang@xxxxxxxxxxxx>
Date: 2020-09-16 14:19:46
To: colyli <colyli@xxxxxxx>
Cc: linux-bcache <linux-bcache@xxxxxxxxxxxxxxx>
Subject: Fw: About bcache-check

>Resending with no HTML format ... ...
>
>Hi Coly and all,
>    I found an error message in our testing:
>
>Sep 27 17:43:00 node-1 kernel: bcache: error on c2914b7e-d665-4ec1-80e1-272755de19ef: unsupported bset version at bucket 58290, block 0, 40818810 keys, disabling caching
>
>I checked the code in bch_btree_node_read_done() around this message:
>
>    for (;
>         b->written < btree_blocks(b) && i->seq == b->keys.set[0].data->seq;
>         i = write_block(b)) {
>            err = "unsupported bset version";
>            if (i->version > BCACHE_BSET_VERSION)
>                    goto err;
>
>The problem is that i->seq is what we expected for this btree node, but the version is not BCACHE_BSET_VERSION (1).
>
>I think there are two possible causes for this message:
>
>(1) Cache discard is not enabled.
>    When we allocate a bucket without discard enabled, the bucket can still contain outdated data.
>    It is possible that the bytes at the location of i->seq happen to equal the value we expect,
>    even though the block is not a bset at all. In that case version, magic and the bset csum
>    are all unexpected, and currently we goto err and stop the cache set.
>
>(2) Power cut.
>    If a power cut happens while we are doing btree_node_write, we could write a partial btree node.
>
>When we hit this kind of problem, we can no longer use the cache device, and there is no tool to recover from it.
>
>I think I can cook up a bcache-check in bcache-tools, something like fsck, to check for this kind of problem
>and allow the user to repair it, with a warning that a forced repair is risky.
>
>Please point out anything I am missing.
>
>Thanx
>Dongsheng
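
As a rough illustration of the per-block checks a bcache-check pass might run, here is a minimal C sketch. It is not taken from the kernel or from bcache-tools. Everything prefixed with demo_ or DEMO_ is hypothetical: the struct is a simplified stand-in for the on-disk bset header, the checksum is a toy FNV-style hash rather than the checksum bcache actually uses, and the expected magic and the key limit are supplied by the caller. Unlike the quoted loop, which tests the version first, the sketch tests the magic first, on the assumption that this gives a more useful diagnosis when the block is stale data or a partial write.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_BSET_VERSION 1u    /* assumed newest supported version */

/* Hypothetical, simplified stand-in for an on-disk bset header. */
struct demo_bset {
    uint64_t csum;
    uint64_t magic;
    uint64_t seq;
    uint32_t version;
    uint32_t keys;              /* number of 64-bit words of key data */
};

enum demo_verdict {
    DEMO_OK,
    DEMO_END_OF_NODE,           /* seq differs: block is not part of this node */
    DEMO_BAD_MAGIC,
    DEMO_BAD_VERSION,
    DEMO_BAD_KEYS,
    DEMO_BAD_CSUM
};

static uint64_t demo_mix(uint64_t h, uint64_t w)
{
    h ^= w;
    h *= 0x100000001b3ULL;      /* 64-bit FNV prime */
    return h;
}

/* Toy checksum over the header (minus csum) and key words; NOT bcache's. */
static uint64_t demo_csum(const struct demo_bset *i, const uint64_t *data)
{
    uint64_t h = 0xcbf29ce484222325ULL;     /* 64-bit FNV offset basis */
    uint32_t k;

    h = demo_mix(h, i->magic);
    h = demo_mix(h, i->seq);
    h = demo_mix(h, ((uint64_t)i->version << 32) | i->keys);
    for (k = 0; k < i->keys; k++)
        h = demo_mix(h, data[k]);
    return h;
}

/*
 * Classify one block. The keys bound is checked before the checksum so
 * that a garbage key count never makes demo_csum() read past the buffer;
 * data must hold at least i->keys words whenever i->keys <= max_keys.
 */
static enum demo_verdict demo_check_bset(const struct demo_bset *i,
                                         const uint64_t *data,
                                         uint64_t node_seq,
                                         uint64_t expect_magic,
                                         uint32_t max_keys)
{
    assert(i != NULL);

    if (i->seq != node_seq)
        return DEMO_END_OF_NODE;
    if (i->magic != expect_magic)
        return DEMO_BAD_MAGIC;
    if (i->version > DEMO_BSET_VERSION)
        return DEMO_BAD_VERSION;
    if (i->keys > max_keys)
        return DEMO_BAD_KEYS;
    if (i->csum != demo_csum(i, data))
        return DEMO_BAD_CSUM;
    return DEMO_OK;
}
```

A checker built along these lines could report which test failed for each block instead of stopping at the first error, and leave the decision about a forced repair to the user. With stale or partially written data the magic test would most likely be the first to fail.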