On Thu, 2015-08-06 at 16:11 -0700, Kent Overstreet wrote:
> On Wed, Aug 05, 2015 at 11:40:06PM -0700, Ming Lin wrote:
> > On Tue, 2015-07-28 at 11:45 -0700, Ming Lin wrote:
> > > On Tue, Jul 28, 2015 at 11:41 AM, Ming Lin <mlin@xxxxxxxxxx> wrote:
> > > > On Fri, Jul 24, 2015 at 1:47 PM, Ming Lin <mlin@xxxxxxxxxx> wrote:
> > > >>
> > > >> And I want to learn how btree node inserts/deletes/updates happen on
> > > >> disk. These may be too detailed. I'm going to write a small tool to dump
> > > >> the file system, so that I can better understand the on-disk btree
> > > >> format.
> > > >
> > > > Here is my simple tool to dump parts of the on-disk format.
> > > > http://www.minggr.net/cgit/cgit.cgi/bcache-tools/commit/?id=deb258e2
> > >
> > > Actually: http://www.minggr.net/cgit/cgit.cgi/bcache-tools/commit/?id=3121eec
> > >
> > > >
> > > > It's not in good shape, but it's simple enough for learning the on-disk format.
> >
> > Hi Kent,
> >
> > I'm trying to understand how the root inode is stored in the inode
> > btree.
> >
> > dd if=/dev/zero of=fs.img bs=10M count=1
> > bcacheadm format -C fs.img
> > mount -t bcache -o loop fs.img /mnt
> > umount /mnt
> > hexdump -C fs.img > fs.hex
> >
> > From my simple tool, I know that the inode btree starts at offset
> > 0xec000
>
> The root node of the inode btree? Are you handling trees with multiple nodes
> yet?

Yes and no.
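A dump tool like the one linked above mostly needs to fetch a range of fs.img at a known byte offset, such as the inode btree node at 0xec000. Below is a minimal POSIX C sketch of that step. read_exact() is a hypothetical helper, not something from bcache-tools. It loops because pread() may legitimately return fewer bytes than requested.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read exactly len bytes at byte offset off of fd.
 * Returns 0 on success, -1 on an I/O error or if the file ends early.
 */
int read_exact(int fd, off_t off, void *buf, size_t len)
{
	unsigned char *p = buf;

	while (len > 0) {
		ssize_t n = pread(fd, p, len, off);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted by a signal: retry */
			return -1;		/* real I/O error */
		}
		if (n == 0)
			return -1;		/* end of file before len bytes */
		p += n;
		off += n;
		len -= (size_t)n;
	}
	return 0;
}
```

For example, with fd from open("fs.img", O_RDONLY), read_exact(fd, 0xec000, buf, 4096) fetches the first 4K block of that node. How many blocks a whole node spans isn't stated in this thread, so the length is left to the caller.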

>
> >
> > 000ec000 43 ef f3 df ff ff ff ff 86 c1 47 1e 99 25 51 35 |C.........G..%Q5|
> > 000ec010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
> > 000ec020 00 00 00 00 00 00 00 00 ff ff ff ff ff ff ff ff |................|
> > 000ec030 ff ff ff ff ff ff ff ff 01 05 00 00 00 00 00 00 |................|
> > 000ec040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
> > *
> > 000ec070 88 b5 38 e2 45 36 eb f6 00 00 00 00 00 00 00 00 |..8.E6..........|
> > 000ec080 01 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00 |................|
> > 000ec090 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
> > *
> > 000ed000 31 66 fd 31 ff ff ff ff 88 b5 38 e2 45 36 eb f6 |1f.1......8.E6..|
> > 000ed010 02 00 00 00 00 00 00 00 01 00 00 00 03 00 0b 00 |................|
> > 000ed020 0b 01 80 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
> > 000ed030 00 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00 |................|
> > 000ed040 ed 41 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |.A..............|
> > 000ed050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
> > *
> > 000ed070 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
> > 000ed080 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
> > *
> >
> > btree_node (0xec000)
> > bset (0xed008) ---> bset->u64s = 0x0b = 11
> > bkey_packed (0xed020)
> > bkey (0xed020)
> > bch_inode (0xed040 to 0xed077) ---> root inode
> >
> > Is the decode above correct?
>
> I think so. The code that deals with reading in a btree node from disk and
> interpreting the contents is mainly in bch_btree_node_read_done(), btree_io.c -
> it looks like you found that?

I haven't dug into the code yet. First I want to understand the on-disk structure from the hexdump.

>
> > I found the root inode manually. But how is it actually found by code?
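As a cross-check of the decode above, here is a minimal C sketch that reads a few fields straight out of the 0xed000 block as copied from the hexdump. The offsets are inferred from this single dump and Ming's annotations, not taken from the bcache headers:

- bset->seq at +0x08
- bset->u64s as a 16-bit little-endian value at +0x1e
- the key at +0x20, with u64s/format/type in its first three bytes
- the key's inode number at +0x38
- the start of bch_inode at +0x40

The names (decode_first_key, struct decoded_key, the OFF_* macros) are hypothetical. The sketch assumes the key is not packed; a packed key would need the bkey unpacking code first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes 0xed000..0xed07f copied from the hexdump; rows 0xed050 and
 * 0xed060 are all zero (the "*" line), so they are left implicit. */
const uint8_t blk_ed000[0x80] = {
	/* 0x00: 8 bytes of btree_node_entry, then bset->seq */
	0x31, 0x66, 0xfd, 0x31, 0xff, 0xff, 0xff, 0xff,
	0x88, 0xb5, 0x38, 0xe2, 0x45, 0x36, 0xeb, 0xf6,
	/* 0x10: rest of the bset header; u64s = 0x0b at +0x1e */
	0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x01, 0x00, 0x00, 0x00, 0x03, 0x00, 0x0b, 0x00,
	/* 0x20: first key */
	0x0b, 0x01, 0x80, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	/* 0x30 */
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x10, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	/* 0x40: start of the value (bch_inode, per Ming's decode) */
	0xed, 0x41, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	[0x70] = 0x02,
};

/* Offsets relative to 0xed000, inferred from this one dump (assumptions). */
#define OFF_BSET_SEQ   0x08
#define OFF_BSET_U64S  0x1e
#define OFF_KEY        0x20
#define OFF_KEY_INODE  0x38
#define OFF_VALUE      0x40

static uint64_t le64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];	/* little-endian: byte 0 is least significant */
	return v;
}

static uint16_t le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

struct decoded_key {
	uint64_t bset_seq;
	unsigned bset_u64s;
	unsigned key_u64s, key_format, key_type;
	uint64_t inode;
	unsigned value_u16;	/* first 16 bits of the value */
};

struct decoded_key decode_first_key(const uint8_t *blk)
{
	struct decoded_key d;

	d.bset_seq   = le64(blk + OFF_BSET_SEQ);
	d.bset_u64s  = le16(blk + OFF_BSET_U64S);
	d.key_u64s   = blk[OFF_KEY];
	d.key_format = blk[OFF_KEY + 1];
	d.key_type   = blk[OFF_KEY + 2];
	d.inode      = le64(blk + OFF_KEY_INODE);
	d.value_u16  = le16(blk + OFF_VALUE);
	return d;
}
```

On these bytes it yields key u64s = 11, so key plus value run from 0xed020 to 0xed077 (matching the bch_inode range in the decode), an inode number of 4096, and 0x41ed at the start of the value. Read as a file mode, 0x41ed is 040755, a directory with 0755 permissions, which would fit a root directory. Treating those bytes as the mode is itself a guess.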
>
> The root inode is the inode with inode number BCACHE_ROOT_INO (4096) -
> http://evilpiepirate.org/git/linux-bcache.git/tree/drivers/md/bcache/fs.c?h=bcache-dev&id=5cf7fb11d124839eea2191fd7e8eddecb296d67d#n2285
>
> So to do it correctly, you'll need the bkey packing code in order to unpack the
> key (if it was packed) so that you can get the actual inode number of the key.
>
> You'll also need to do something like the mergesort algorithm (or something
> equivalent; you don't need to do the actual mergesort if you're just doing a
> linear search for one key). That is - if there are multiple bsets, they will
> likely contain duplicates, and keys in newer bsets overwrite keys in older bsets.

I don't understand this part yet. I'll learn it.

>
> > Could you help explain what is stored from 0xec070 to 0xed007?
> > Are they also bsets?
>
> Without knowing your block size and spending a fair amount of time staring at
> the hexdump, I don't know what starts there - but quite possibly yes; bsets that
> aren't at the start of the btree node are embedded in a struct
> btree_node_entry, not a struct btree_node.
>
> To tell if it's a valid bset, you compare bset->seq against the seq in the first
> bset - it's a random number generated for each new btree node; if they match
> then the bset there goes with that btree node.

The block size is 4K. OK, now I can interpret the hexdump.
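Kent's point about duplicates can be illustrated without a full mergesort. The C sketch below assumes keys have already been unpacked into a hypothetical struct ukey (inode, offset, plus a stand-in value) and that a node's bsets are kept oldest first. struct ubset and node_lookup() are illustrative names, not bcache code.

It does the linear search for one key that Kent mentions. Scanning from the newest bset backwards, the first match wins, so a key in a newer bset shadows the same key in an older one. Following Kent's description, a bset only counts if its seq equals the first bset's seq. Deleted keys are not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, already-unpacked key (real bkeys may need unpacking first). */
struct ukey {
	uint64_t inode;
	uint64_t offset;
	uint64_t value;		/* stand-in for the key's value */
};

/* Hypothetical in-memory view of one bset of a node. */
struct ubset {
	uint64_t seq;		/* must match bsets[0].seq to belong to the node */
	size_t nr;
	const struct ukey *keys;
};

/*
 * Find (inode, offset) in a node whose bsets are stored oldest first.
 * Walking from the newest bset backwards, the first hit is the live
 * version; older duplicates are shadowed. Bsets whose seq differs from
 * the first bset's are treated as not belonging to this node.
 */
const struct ukey *node_lookup(const struct ubset *bsets, size_t nr_bsets,
			       uint64_t inode, uint64_t offset)
{
	if (nr_bsets == 0)
		return NULL;

	uint64_t node_seq = bsets[0].seq;

	for (size_t i = nr_bsets; i-- > 0;) {
		const struct ubset *b = &bsets[i];

		if (b->seq != node_seq)
			continue;	/* leftover data from another node */
		for (size_t j = 0; j < b->nr; j++)
			if (b->keys[j].inode == inode &&
			    b->keys[j].offset == offset)
				return &b->keys[j];
	}
	return NULL;
}
```

Looking up the root inode would then be node_lookup(bsets, n, 4096, 0). Using offset 0 for inode keys is an assumption based on this dump, where the bytes next to the inode number in the key at 0xed020 are zero.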

000ec000 43 ef f3 df ff ff ff ff 86 c1 47 1e 99 25 51 35 |C.........G..%Q5|
000ec010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000ec020 00 00 00 00 00 00 00 00 ff ff ff ff ff ff ff ff |................|
000ec030 ff ff ff ff ff ff ff ff 01 05 00 00 00 00 00 00 |................|
000ec040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000ec070 88 b5 38 e2 45 36 eb f6 00 00 00 00 00 00 00 00 |..8.E6..........|
000ec080 01 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00 |................|
000ec090 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000ed000 31 66 fd 31 ff ff ff ff 88 b5 38 e2 45 36 eb f6 |1f.1......8.E6..|
000ed010 02 00 00 00 00 00 00 00 01 00 00 00 03 00 0b 00 |................|
000ed020 0b 01 80 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000ed030 00 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00 |................|
000ed040 ed 41 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |.A..............|
000ed050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000ed070 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000ed080 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000ee000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|

There are 2 bsets, both with bset->seq "88 b5 38 e2 45 36 eb f6":

btree_node (0xec000)
bset_1 (0xec070) ---> bset->u64s = 0 (an empty bset?)
btree_node_entry (0xed000)
bset_2 (0xed008) ---> bset->u64s = 0x0b = 11
bkey_packed (0xed020)
bkey (0xed020)
bch_inode (0xed040 to 0xed077) ---> root inode

Why is there an empty bset at the start of the btree node?