On Tue, Dec 8, 2020 at 5:08 PM Kevin Kofler via devel
<devel@xxxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Sergio Belkin wrote:
> > So, let's say we have 3 small disks: 4 GB, 3 GB, and 2 GB.
> >
> > If I create one 3 GB file, I think that:
> > 3 GB is written on the 4 GB disk, leaving 1 GB free;
> > 3 GB of the copy is written on the 3 GB disk, leaving 0 GB free.
> >
> > Then I create one 1 GB file that is written on the 4 GB disk, leaving
> > 0 GB free;
> > 1 GB of the copy is written on the 2 GB disk, leaving 1 GB free.
> >
> > So I've used 4 GB. OK, that leaves 1 GB free, but only on one disk, so
> > it cannot be mirrored.
> >
> > However, as [1] suggests, I should be able to use 4.5 GB
> > ((4 GB + 3 GB + 2 GB) / 2) instead of 4 GB. Surely I'm missing or
> > mistaking something.
> >
> > Please, could you help me?
>
> The optimum size can theoretically be achieved by using the following
> physical partitioning:
> * x GB on the 4 GB disk and the 3 GB disk,
> * y GB on the 4 GB disk and the 2 GB disk, and
> * z GB on the 3 GB disk and the 2 GB disk,
> for a total of x+y+z GB, where x, y, and z solve the following system of
> equations:
> * x+y=4
> * x+z=3
> * y+z=2
> i.e., in standard form:
> * 1x+1y+0z=4
> * 1x+0y+1z=3
> * 0x+1y+1z=2
> The determinant of this system is -2, which is not 0, so the system
> admits a unique solution. It can be computed using any method for
> solving linear systems of equations, such as direct substitution
> (solving an equation for one variable and substituting it), Gaussian
> elimination with back substitution, Gauss-Jordan (bidirectional)
> elimination, or Cramer's rule. The result is:
> * x=2.5
> * y=1.5
> * z=0.5
> for a total of x+y+z=2.5+1.5+0.5=4.5 GB.
>
> Now how btrfs actually handles this in practice is a different story.
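(Kevin's solution can be checked mechanically. A minimal Python sketch, added here for illustration only, solving the same 3x3 system with Cramer's rule; the helper names `det3` and `replace_col` are made up for this sketch:)

```python
# Solve the 3x3 system from Kevin's mail with Cramer's rule:
#   x + y     = 4
#   x     + z = 3
#       y + z = 2

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    a, b, c = m[0]
    d, e, f = m[1]
    g, h, i = m[2]
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def replace_col(m, col, vec):
    """Copy of m with column `col` replaced by vec (for Cramer's rule)."""
    return [[vec[r] if c == col else m[r][c] for c in range(3)]
            for r in range(3)]

A = [[1, 1, 0],
     [1, 0, 1],
     [0, 1, 1]]
b = [4, 3, 2]

D = det3(A)  # -2: non-zero, so a unique solution exists
x, y, z = (det3(replace_col(A, j, b)) / D for j in range(3))
print(x, y, z, x + y + z)  # 2.5 1.5 0.5 4.5
```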
> Judging from Chris Murphy's reply, it does not precompute the above
> repartition, but tries to dynamically select 2 disks for each newly
> allocated 1 GB block to approximate the optimal solution for large
> enough drives. (It will not achieve the optimum for the sizes in your
> example, because the optimal allocation is not an integer number of
> gigabytes, and will in fact be pretty far from the optimum due to the
> small sizes; the larger the disks, the less noticeable the loss.)

It's a bit more complicated still. The block group size is typically
1 GiB, but in reality it's variable, depending on the file system size
and the unallocated space remaining. I don't know the minimum size,
although I have seen 128 MiB data block groups.

The reason block groups are not set in advance is that there are
different types of block groups: data and metadata. File system blocks go
in metadata block groups, and blocks for file data go in data block
groups. The ratio of data to metadata usage is workload dependent: some
workloads produce heavy metadata, others less so.

Why separate block groups? They can have different block sizes and
redundancy profiles, e.g. by default a 16 KiB block size for metadata and
4 KiB for data. And by default, single hard drives get dup metadata and
single data, while file systems on 2+ devices get raid1 metadata and
single data. It's done this way for efficiency and features. I'll stop
here before I fall into a balance, resize, multiple-device rabbit hole.

(dup = two copies on a single device; it can also apply to data)

--
Chris Murphy
_______________________________________________
devel mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@xxxxxxxxxxxxxxxxxxxxxxx
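(As a postscript to the thread: the "dynamically select 2 disks per block group" behaviour described above can be simulated with a toy greedy allocator. This is an illustrative sketch only, not btrfs code; `raid1_usable` is a hypothetical helper, and it ignores metadata block groups and btrfs's variable block group sizes:)

```python
# Toy simulation (NOT the real btrfs allocator): greedily allocate raid1
# block groups by always picking the two devices with the most
# unallocated space, at a fixed block group granularity `bg` (in GiB).

def raid1_usable(free, bg=1.0):
    """Approximate usable raid1 capacity for the given per-device free space."""
    free = list(free)  # don't mutate the caller's list
    usable = 0.0
    while True:
        # Pick the two devices with the most unallocated space.
        a, b = sorted(range(len(free)), key=lambda i: free[i],
                      reverse=True)[:2]
        if free[a] < bg or free[b] < bg:
            break  # no two devices can hold another full block group
        free[a] -= bg
        free[b] -= bg
        usable += bg  # one mirrored block group stores bg of data
    return usable

print(raid1_usable([4.0, 3.0, 2.0]))          # 4.0 with 1 GiB block groups
print(raid1_usable([4.0, 3.0, 2.0], bg=0.5))  # 4.5 with 0.5 GiB granularity
```

With 1 GiB granularity the toy allocator reproduces Sergio's observed 4 GB; with finer granularity it reaches the 4.5 GB theoretical optimum Kevin derived, matching the point that the loss comes from the block group size relative to the disk sizes.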