Re: [PATCH v9 11/41] btrfs: implement log-structured superblock for ZONED mode

On Tue, Nov 03, 2020 at 03:10:35PM +0100, David Sterba wrote:
On Fri, Oct 30, 2020 at 10:51:18PM +0900, Naohiro Aota wrote:
The superblock (and its copies) is the only data structure in btrfs that has
a fixed location on a device. Since we cannot overwrite data in a sequential
write required zone, we cannot place the superblock in such a zone. One easy
solution is to restrict the superblock and its copies to conventional zones.
However, this method has two downsides. The first is a reduced number of
superblock copies: the second copy of the superblock is located at 256GB,
which is in a sequential write required zone on typical devices on the
market today, so the number of superblocks (primary plus copies) is limited
to two. The second downside is that we cannot support devices that have no
conventional zones at all.

To solve these two problems, we employ superblock log writing. It uses two
zones as a circular buffer to write updated superblocks. Once the first
zone is filled up, we start writing into the second zone. Then, when both
zones are filled up and before we start writing to the first zone again,
we reset the first zone.

We can determine the position of the latest superblock by reading the write
pointer information from the device. One corner case is when both zones
are full. In this situation, we read the last superblock of each zone and
compare them to determine which zone is older.
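
As a rough illustration of the lookup described above (not taken from the
patch), the following Python sketch models each zone as a fixed number of
superblock slots with a write pointer counted in slots. The function name,
the slot model, and the use of the superblock generation to decide which
full zone is newer are assumptions made for the example.

```python
from typing import Callable, Optional, Tuple


def latest_superblock(
    wp0: int,
    wp1: int,
    slots: int,
    read_generation: Callable[[int, int], int],
) -> Optional[Tuple[int, int]]:
    """Return (zone, slot) of the newest superblock, or None if none exists.

    wp0/wp1 are the write pointers of the two zones, counted in superblock
    slots (0 = empty zone, slots = full zone). read_generation(zone, slot)
    is a hypothetical helper that reads a superblock and returns its
    generation; it is only called when both zones are full.
    """
    empty0, empty1 = wp0 == 0, wp1 == 0
    full0, full1 = wp0 == slots, wp1 == slots

    if empty0 and empty1:
        return None  # nothing has been written yet

    if full0 and full1:
        # Corner case: compare the last superblock of each zone.
        gen0 = read_generation(0, slots - 1)
        gen1 = read_generation(1, slots - 1)
        return (0, slots - 1) if gen0 > gen1 else (1, slots - 1)

    if full0:
        # Zone 1 is being filled, or has not been started yet.
        return (1, wp1 - 1) if wp1 > 0 else (0, slots - 1)

    if empty1 or full1:
        # Zone 0 is being filled. If it was just reset and is still empty,
        # the newest superblock is the last one in the full zone 1.
        return (0, wp0 - 1) if wp0 > 0 else (1, slots - 1)

    # Both zones partially written: not reachable by the scheme above.
    raise ValueError("inconsistent write pointers")
```

For example, with 4 slots per zone, write pointers (4, 3) place the newest
superblock at slot 2 of the second zone, while (1, 4) place it at slot 0 of
the first zone, which has been reset and rewritten once.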

The following zones are reserved as the circular buffer on ZONED btrfs.

- The primary superblock: zones 0 and 1
- The first copy: zones 16 and 17
- The second copy: zone 1024 or the zone at 256GB, whichever is smaller,
  and the zone next to it
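
To make the list concrete, here is a small Python sketch that computes the
three reserved zone pairs for a given zone size. It only restates the rule
above; the function and constant names are made up for the example and do
not come from the patch.

```python
GiB = 1024 ** 3
MiB = 1024 ** 2

SB_COPY1_ZONE = 16             # first zone of the first copy's pair
SB_COPY2_MAX_ZONE = 1024       # upper bound for the second copy's zone
SB_COPY2_OFFSET = 256 * GiB    # fixed location of the second copy


def reserved_zone_pairs(zone_size: int):
    """Return the (first, second) zone numbers reserved for each sb-log."""
    # Second copy: zone 1024 or the zone containing 256GB, whichever is smaller.
    copy2 = min(SB_COPY2_MAX_ZONE, SB_COPY2_OFFSET // zone_size)
    return [(0, 1), (SB_COPY1_ZONE, SB_COPY1_ZONE + 1), (copy2, copy2 + 1)]


# 256M zones: the zone at 256GB is zone 1024.
print(reserved_zone_pairs(256 * MiB))
# 1G zones: the zone at 256GB is zone 256, which is below 1024.
print(reserved_zone_pairs(1 * GiB))
```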

If these reserved zones are conventional, the superblock is written at a
fixed location at the start of the zone, without logging.

I don't have a clear picture here.

In case there's a conventional zone covering the primary superblock and the
1st copy (64K and 64M), they'll be overwritten in place. What happens to the
2nd copy, which is at 256G? sb-log?

For an all-sequential drive, the primary and the 1st copy are in the first
zone. You say 0 and 1, but how come if the minimum zone size we ever expect
is 256M?

On a zoned device, we always reserve the above zones (0, 1, 16, 17, 1024,
1025 (or the zones at 256G)) regardless of whether they are sequential or
conventional. And if a reserved zone is conventional, we always write the
superblock at the beginning of that zone. So, if a drive has 32
conventional zones, superblocks are placed at the beginning of zone 0 and
zone 16, and zones 1024 and 1025 are written with sb-log.


The circular buffer comprises zones covering all superblock copies? I
mean one buffer for 2 or more sb copies? The problem is that we'll have
just one copy of the current superblock. Or I misunderstood.

A circular buffer consists of a pair of zones, so we'll have three sb-logs,
one on each of the zone pairs 0 & 1, 16 & 17, and 1024 & 1025.


My idea is that we have a primary zone, unfortunately covering 2
superblocks but let it be. The second zone contains the 2nd superblock copy
(256G); we can assume that devices will be bigger than that.

Then the circular buffers happen in each zone, so first one will go from
offset 64K up to the zone size (256M or 1G).  Second zone rotates from
offset 0 to end of the zone.

The positive outcome of that is that both zones contain the latest
superblock after a successful write and their write pointers are slightly
out of sync, so they never have to be reset at the same time.

In numbers:
- first zone 64K .. 256M, 65520 superblocks
- second zone 256G .. 256G+256M, 65536 superblocks

The difference is 16 superblock updates, which should be enough to let
the zone resets happen far apart.
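
The numbers above can be checked with a few lines of Python, assuming a 4K
superblock (the btrfs superblock size, which is also what the counts above
imply) and a 256M zone size. The helper name is made up for the example.

```python
KiB = 1024
MiB = 1024 ** 2
GiB = 1024 ** 3

SUPER_INFO_SIZE = 4 * KiB   # size of one btrfs superblock
ZONE_SIZE = 256 * MiB       # assumed zone size


def superblock_slots(start: int, end: int) -> int:
    """Number of superblocks that fit in the byte range [start, end)."""
    return (end - start) // SUPER_INFO_SIZE


# First zone: usable range starts at the primary superblock offset (64K).
first = superblock_slots(64 * KiB, ZONE_SIZE)
# Second zone: the whole zone starting at 256G.
second = superblock_slots(256 * GiB, 256 * GiB + ZONE_SIZE)

print(first, second, second - first)
```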

Hmm, this raises the minimal FS size requirement to 256 GB in order to
survive a crash after resetting the first zone... So, that's why we have two
zones as a circular buffer.


