On 23/11/2020 18:49, David Sterba wrote:
> On Tue, Nov 10, 2020 at 08:26:14PM +0900, Naohiro Aota wrote:
>> Superblock (and its copies) is the only data structure in btrfs which has a
>> fixed location on a device. Since we cannot overwrite in a sequential write
>> required zone, we cannot place superblock in the zone. One easy solution is
>> limiting superblock and copies to be placed only in conventional zones.
>> However, this method has two downsides: one is reduced number of superblock
>> copies. The location of the second copy of superblock is 256GB, which is in
>> a sequential write required zone on typical devices in the market today.
>> So, the number of superblock and copies is limited to be two. Second
>> downside is that we cannot support devices which have no conventional zones
>> at all.
>>
>> To solve these two problems, we employ superblock log writing. It uses two
>> zones as a circular buffer to write updated superblocks. Once the first
>> zone is filled up, start writing into the second buffer. Then, when the
>> both zones are filled up and before start writing to the first zone again,
>> it reset the first zone.
>>
>> We can determine the position of the latest superblock by reading write
>> pointer information from a device. One corner case is when the both zones
>> are full. For this situation, we read out the last superblock of each
>> zone, and compare them to determine which zone is older.
>>
>> The following zones are reserved as the circular buffer on ZONED btrfs.
>>
>> - The primary superblock: zones 0 and 1
>> - The first copy: zones 16 and 17
>> - The second copy: zones 1024 or zone at 256GB which is minimum, and next
>>   to it
>
> I was thinking about that, again. We need a specification. The above is
> too vague.
>
> - supported zone sizes
>   eg. if device has 256M, how does it work?
>   I think we can support zones from some range (256M-1G), where filling
>   the zone will start filling the other zone, leaving the remaining space
>   empty if needed, effectively reserving the logical range [0..2G] for
>   superblock
>
> - related to the above, is it necessary to fill the whole zone?
>   if both zones are filled, assuming 1G zone size, do we really expect
>   the user to wait until 2G of data are read?
>   with average reading speed 150MB/s, reading 2G will take about 13
>   seconds, just to find the latest copy of the superblock(!)
>
> - what are exact offsets of the superblocks
>   primary (64K), ie. not from the beginning
>   as partitioning is not supported, nor bootloaders, we don't need to
>   worry about overwriting them
>
> - what is an application supposed to do when there's a garbage after a
>   sequence of valid superblocks (all zeros can be considered a valid
>   termination block)
>
> The idea is to provide enough information for a 3rd party tool to read
> the superblock (blkid, progs) and decouple the format from current
> hardware capabilities. If the zones are going to be large in the future
> we might consider allowing further flexibility, or fix the current zone
> maximum to 1G and in the future add a separate incompat bit that would
> extend the maximum to say 10G.
>

We don't need to do that. All we need to do to find the latest valid
superblock is issue a report zones call, get the write pointer, and then
read from write pointer - sizeof(struct btrfs_super_block). There is no
need to scan a whole zone: the last thing that was written sits right
before the write pointer, so the lookup cost does not depend on the zone
size.
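
To make the lookup concrete, here is a minimal sketch in plain C. It is not
the kernel code, only an illustration under stated assumptions: struct
zone_info and both function names are hypothetical, all values are in bytes,
one superblock entry is assumed to take 4096 bytes, and a full zone is
assumed to report its write pointer at the end of the zone. The two-zone
selection rule is my reading of the circular buffer described in the patch:
a partially filled zone holds the newest entry, and only the both-zones-full
case needs the two last superblocks to be read and compared.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: one superblock entry occupies 4096 bytes on disk. */
#define SB_SIZE 4096ULL

/* Hypothetical, simplified result of a report zones call, in bytes. */
struct zone_info {
	uint64_t start;	/* first byte of the zone */
	uint64_t len;	/* zone length */
	uint64_t wp;	/* write pointer: next byte that will be written */
};

/* Number of bytes already written to the zone. */
static uint64_t zone_used(const struct zone_info *z)
{
	assert(z->wp >= z->start && z->wp - z->start <= z->len);
	return z->wp - z->start;
}

/*
 * Store the offset of the last superblock written to one zone in *off.
 * Returns 0 on success, -1 if the zone holds no complete superblock.
 */
static int latest_sb_in_zone(const struct zone_info *z, uint64_t *off)
{
	if (zone_used(z) < SB_SIZE)
		return -1;
	*off = z->wp - SB_SIZE;	/* entry right before the write pointer */
	return 0;
}

/*
 * Pick the newest superblock in a two-zone log.
 * Returns 0 on success, -1 if both zones are empty, and -2 if the write
 * pointers alone cannot decide (both zones full): the caller then has to
 * read the last entry of each zone and compare them.
 */
static int latest_sb_in_log(const struct zone_info *a,
			    const struct zone_info *b, uint64_t *off)
{
	uint64_t ua = zone_used(a), ub = zone_used(b);
	int a_full = (ua == a->len), b_full = (ub == b->len);

	if (ua < SB_SIZE && ub < SB_SIZE)
		return -1;
	if (ub < SB_SIZE)
		return latest_sb_in_zone(a, off);
	if (ua < SB_SIZE)
		return latest_sb_in_zone(b, off);
	if (a_full == b_full)
		return -2;
	/* Exactly one zone is full: the partially filled one is newer. */
	return latest_sb_in_zone(a_full ? b : a, off);
}
```

Either way this is one report zones call plus one superblock-sized read per
zone in the worst case, not a scan of up to 2G.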