> 2022年6月13日 16:20,Zou Mingzhe <mingzhe.zou@xxxxxxxxxxxx> 写道: > > > 在 2022/6/6 18:34, Coly Li 写道: >> >>> 2022年6月6日 17:29,Zou Mingzhe <mingzhe.zou@xxxxxxxxxxxx> 写道: >>> >>> >>> >>> 在 2022/6/6 16:57, Coly Li 写道: >>>>> 2022年6月6日 16:45,mingzhe.zou@xxxxxxxxxxxx >>>>> 写道: >>>>> >>>>> From: mingzhe >>>>> <mingzhe.zou@xxxxxxxxxxxx> >>>>> >>>>> >>>>> >>>>> >>>> [snipped] >>>> >>>> >>>>> We want to use those invalid_uuid slots carefully. Because, the bkey of the inode >>>>> may still exist in the btree. So, we need to check the btree before reuse it. >>>>> >>>>> Signed-off-by: mingzhe >>>>> <mingzhe.zou@xxxxxxxxxxxx> >>>>> >>>>> --- >>>>> drivers/md/bcache/btree.c | 35 +++++++++++++++++++++++++++++++++++ >>>>> drivers/md/bcache/btree.h | 1 + >>>>> drivers/md/bcache/super.c | 15 ++++++++++++++- >>>>> 3 files changed, 50 insertions(+), 1 deletion(-) >>>>> >>>>> diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c >>>>> index e136d6edc1ed..a5d54af73111 100644 >>>>> --- a/drivers/md/bcache/btree.c >>>>> +++ b/drivers/md/bcache/btree.c >>>>> @@ -2755,6 +2755,41 @@ struct keybuf_key *bch_keybuf_next_rescan(struct cache_set *c, >>>>> return ret; >>>>> } >>>>> >>>>> +static bool check_pred(struct keybuf *buf, struct bkey *k) >>>>> +{ >>>>> + return true; >>>>> +} >>>>> + >>>>> +bool bch_btree_can_inode_reuse(struct cache_set *c, size_t inode) >>>>> +{ >>>>> + bool ret = true; >>>>> + struct keybuf_key *k; >>>>> + struct bkey end_key = KEY(inode, MAX_KEY_OFFSET, 0); >>>>> + struct keybuf *keys = kzalloc(sizeof(struct keybuf), GFP_KERNEL); >>>>> + >>>>> + if (!keys) { >>>>> + ret = false; >>>>> + goto out; >>>>> + } >>>>> + >>>>> + bch_keybuf_init(keys); >>>>> + keys->last_scanned = KEY(inode, 0, 0); >>>>> + >>>>> + while (ret) { >>>>> + k = bch_keybuf_next_rescan(c, keys, &end_key, check_pred); >>>>> + if (!k) >>>>> >>>> This is a single thread iteration, for a large filled cache device it can be very slow. I observed 40+ minutes during my testing. >>>> >>>> >>>> Coly Li >>>> >>>> > Hi, Coly > > I use a 200G cache device to test this patch. For faster testing, the bucket_size is set to 16k. > > ``` > > [root@node-3 ~]# lsblk -s /dev/bcache4 > NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT > bcache4 251:512 0 500G 0 disk > ├─rbd4 253:64 0 500G 0 disk > └─nvme1n1p6 259:12 0 200G 0 part > └─nvme1n1 259:6 0 1.8T 0 disk > > [root@node-3 ~]# bcache-super-show /dev/nvme1n1p6 > sb.magic ok > sb.first_sector 8 [match] > sb.csum 53512B9BA99771C5 [match] > sb.version 3 [cache device] > > dev.label (empty) > dev.uuid 007cf801-98bf-4d00-87e5-6e9127c83622 > dev.sectors_per_block 8 > dev.sectors_per_bucket 32 > dev.cache.first_sector 32 > dev.cache.cache_sectors 419430368 > dev.cache.total_sectors 419430400 > dev.cache.ordered yes > dev.cache.discard no > dev.cache.pos 0 > dev.cache.replacement 1 [fifo] > > cset.uuid 024c6ef2-ec7d-4f31-aadc-171e9be748e2 > ``` > > The test steps are as follows: > > 1. attach the backing device to the cache device > > 2. set sysfs (cache_mode set to writeback, etc) > > 3. fio randwrite 10G data > > 4.the backing device from the cache device > > The for loop executes above steps > > > I added some dmesg print information in code: > > ``` > > diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c > index fba0e538e46e..fd15b1dc8346 100644 > --- a/drivers/md/bcache/btree.c > +++ b/drivers/md/bcache/btree.c > @@ -2783,6 +2783,7 @@ static bool check_pred(struct keybuf *buf, struct bkey *k) > > bool bch_btree_can_inode_reuse(struct cache_set *c, size_t inode) > { > + size_t rescan = 0; > bool ret = true; > struct keybuf_key *k; > struct bkey end_key = KEY(inode, MAX_KEY_OFFSET, 0); > @@ -2793,10 +2794,12 @@ bool bch_btree_can_inode_reuse(struct cache_set *c, size_t inode) > goto out; > } > > + pr_info("try to reuse inode %zu", inode); > bch_keybuf_init(keys); > keys->last_scanned = KEY(inode, 0, 0); > > while (ret) { > + rescan++; > k = bch_keybuf_next_rescan(c, keys, &end_key, check_pred); > if (!k) > break; > @@ -2806,6 +2809,7 @@ bool bch_btree_can_inode_reuse(struct cache_set *c, size_t inode) > bch_keybuf_del(keys, k); > } > > + pr_info("inode %zu rescan %zu", inode, rescan); > kfree(keys); > out: > return ret; > diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c > index 8335aedaffa9..7427fdacf61b 100644 > --- a/drivers/md/bcache/super.c > +++ b/drivers/md/bcache/super.c > @@ -472,10 +472,13 @@ static struct uuid_entry *uuid_find_reuse(struct cache_set *c) > { > struct uuid_entry *u; > > + pr_info("reuse invalid_uuid"); > for (u = c->uuids; u < c->uuids + c->nr_uuids; u++) > if (!memcmp(u->uuid, invalid_uuid, 16) && > - bch_btree_can_inode_reuse(c, u - c->uuids)) > + bch_btree_can_inode_reuse(c, u - c->uuids)) { > + pr_info("reuse inode %zu", u - c->uuids); > return u; > + } > > return NULL; > } > ``` > > > These are dmesg output: > > ``` > > [148602.383377] bcache: uuid_find_reuse() reuse invalid_uuid > [148602.383883] bcache: bch_btree_can_inode_reuse() try to reuse inode 0 > [148602.384402] bcache: bch_btree_can_inode_reuse() inode 0 rescan 1 > [148602.384897] bcache: bch_btree_can_inode_reuse() try to reuse inode 1 > [148602.385399] bcache: bch_btree_can_inode_reuse() inode 1 rescan 1 > [148602.385893] bcache: bch_btree_can_inode_reuse() try to reuse inode 2 > [148602.386406] bcache: bch_btree_can_inode_reuse() inode 2 rescan 1 > [148602.386901] bcache: bch_btree_can_inode_reuse() try to reuse inode 3 > [148602.387408] bcache: bch_btree_can_inode_reuse() inode 3 rescan 1 > [148602.387903] bcache: bch_btree_can_inode_reuse() try to reuse inode 4 > [148602.388405] bcache: bch_btree_can_inode_reuse() inode 4 rescan 1 > [148602.388898] bcache: bch_btree_can_inode_reuse() try to reuse inode 5 > [148602.389400] bcache: bch_btree_can_inode_reuse() inode 5 rescan 1 > [148602.389891] bcache: bch_btree_can_inode_reuse() try to reuse inode 6 > [148602.390391] bcache: bch_btree_can_inode_reuse() inode 6 rescan 1 > [148602.390967] bcache: bch_btree_can_inode_reuse() try to reuse inode 7 > [148602.391464] bcache: bch_btree_can_inode_reuse() inode 7 rescan 1 > [148602.391953] bcache: bch_btree_can_inode_reuse() try to reuse inode 8 > [148602.392455] bcache: bch_btree_can_inode_reuse() inode 8 rescan 1 > [148602.392949] bcache: bch_btree_can_inode_reuse() try to reuse inode 9 > [148602.403515] bcache: bch_btree_can_inode_reuse() inode 9 rescan 1 > [148602.404004] bcache: bch_btree_can_inode_reuse() try to reuse inode 10 > [148602.404504] bcache: bch_btree_can_inode_reuse() inode 10 rescan 1 > [148602.405082] bcache: bch_btree_can_inode_reuse() try to reuse inode 11 > [148602.405581] bcache: bch_btree_can_inode_reuse() inode 11 rescan 1 > [148602.406077] bcache: bch_btree_can_inode_reuse() try to reuse inode 12 > [148602.406580] bcache: bch_btree_can_inode_reuse() inode 12 rescan 1 > [148602.407074] bcache: bch_btree_can_inode_reuse() try to reuse inode 13 > [148602.407573] bcache: bch_btree_can_inode_reuse() inode 13 rescan 1 > [148602.408064] bcache: bch_btree_can_inode_reuse() try to reuse inode 14 > [148602.408563] bcache: bch_btree_can_inode_reuse() inode 14 rescan 1 > [148602.409052] bcache: bch_btree_can_inode_reuse() try to reuse inode 15 > [148602.409549] bcache: bch_btree_can_inode_reuse() inode 15 rescan 1 > [148602.410039] bcache: bch_btree_can_inode_reuse() try to reuse inode 16 > [148602.410537] bcache: bch_btree_can_inode_reuse() inode 16 rescan 1 > [148602.411027] bcache: bch_btree_can_inode_reuse() try to reuse inode 17 > [148602.411527] bcache: bch_btree_can_inode_reuse() inode 17 rescan 1 > [148602.412018] bcache: bch_btree_can_inode_reuse() try to reuse inode 18 > [148602.412519] bcache: bch_btree_can_inode_reuse() inode 18 rescan 1 > [148602.413009] bcache: bch_btree_can_inode_reuse() try to reuse inode 19 > [148602.413516] bcache: bch_btree_can_inode_reuse() inode 19 rescan 1 > [148602.414008] bcache: bch_btree_can_inode_reuse() try to reuse inode 20 > [148602.414552] bcache: bch_btree_can_inode_reuse() inode 20 rescan 1 > [148602.415041] bcache: bch_btree_can_inode_reuse() try to reuse inode 21 > [148602.415594] bcache: bch_btree_can_inode_reuse() inode 21 rescan 1 > [148602.416082] bcache: bch_btree_can_inode_reuse() try to reuse inode 22 > [148602.416730] bcache: bch_btree_can_inode_reuse() inode 22 rescan 1 > [148602.417219] bcache: bch_btree_can_inode_reuse() try to reuse inode 23 > [148602.417928] bcache: bch_btree_can_inode_reuse() inode 23 rescan 1 > [148602.418422] bcache: bch_btree_can_inode_reuse() try to reuse inode 24 > [148602.419142] bcache: bch_btree_can_inode_reuse() inode 24 rescan 1 > [148602.419654] bcache: bch_btree_can_inode_reuse() try to reuse inode 25 > [148602.420364] bcache: bch_btree_can_inode_reuse() inode 25 rescan 1 > [148602.420851] bcache: bch_btree_can_inode_reuse() try to reuse inode 26 > [148602.421578] bcache: bch_btree_can_inode_reuse() inode 26 rescan 1 > [148602.422072] bcache: bch_btree_can_inode_reuse() try to reuse inode 27 > [148602.422778] bcache: bch_btree_can_inode_reuse() inode 27 rescan 1 > [148602.423267] bcache: bch_btree_can_inode_reuse() try to reuse inode 28 > [148602.423972] bcache: bch_btree_can_inode_reuse() inode 28 rescan 1 > [148602.424466] bcache: bch_btree_can_inode_reuse() try to reuse inode 29 > [148602.425174] bcache: bch_btree_can_inode_reuse() inode 29 rescan 1 > [148602.425668] bcache: bch_btree_can_inode_reuse() try to reuse inode 30 > [148602.426371] bcache: bch_btree_can_inode_reuse() inode 30 rescan 1 > [148602.426859] bcache: bch_btree_can_inode_reuse() try to reuse inode 31 > [148602.427570] bcache: bch_btree_can_inode_reuse() inode 31 rescan 1 > [148602.428060] bcache: bch_btree_can_inode_reuse() try to reuse inode 32 > [148602.428764] bcache: bch_btree_can_inode_reuse() inode 32 rescan 1 > [148602.429251] bcache: bch_btree_can_inode_reuse() try to reuse inode 33 > [148602.429942] bcache: bch_btree_can_inode_reuse() inode 33 rescan 1 > [148602.430437] bcache: bch_btree_can_inode_reuse() try to reuse inode 34 > [148602.431127] bcache: bch_btree_can_inode_reuse() inode 34 rescan 1 > [148602.431622] bcache: bch_btree_can_inode_reuse() try to reuse inode 35 > [148602.432323] bcache: bch_btree_can_inode_reuse() inode 35 rescan 1 > [148602.432820] bcache: bch_btree_can_inode_reuse() try to reuse inode 36 > [148602.433516] bcache: bch_btree_can_inode_reuse() inode 36 rescan 1 > [148602.434004] bcache: bch_btree_can_inode_reuse() try to reuse inode 37 > [148602.434681] bcache: bch_btree_can_inode_reuse() inode 37 rescan 1 > [148602.435169] bcache: bch_btree_can_inode_reuse() try to reuse inode 38 > [148602.435673] bcache: bch_btree_can_inode_reuse() inode 38 rescan 1 > [148602.436159] bcache: uuid_find_reuse() reuse inode 38 > > ``` > > According to my tests, even if the cache device is very large, the reuse time cost should be acceptable, should not be 40+ minutes. Yes from your testing, of course it is not 40+ minutes. What I said was 90G+ btree nodes, it should be around 1.5-2T cached data with 512byte block size. It has been a long time when the registration process was not multi-threaded. Roughly the btree nodes checking spent around 15-20 minutes, and the dirty sectors counting for backing device spent 20-25 minutes. To generate such large meta data, it took around a whole day by fio with 10 jobs and 512 bytes block size, on a 4T NVMe SSD as cache. Such configuration is quite easy to see nowadays in enterprise environment. Indeed event reduce the registration time to 1/10 is still not ideal, it may still exceed the udev 120 seconds timeout and causes problem during boot time. This is why I added the asynchronized registration Kconfg item to avoid blocking udev task too much time. Coly Li