>>>>> <Cedric.dewijs@xxxxxxxxxx> writes:

> I have 4 slow, loud, big, power hungry and old hard drives, and 2
> SSD's. I'm trying to come up with a way to combine them into a system
> that has the following characteristics:

> A) The hard drives stop spinning 5 minutes after they have been used.
> B) The SSD's are used for read and write caching. Writes to the
>    system are absorbed by the SSD's. Only when the ssd's are full of
>    dirty data, then the hard drives are woken up. (This means the
>    SSD's contain dirty data for potentially a long time.)
> C) When data is requested that's not present on the SSD's (a read
>    cache miss), then the hard drive which has that data is woken up.
> D) When a hard drive is woken up as a result of a read cache miss,
>    then the SSD's write out the dirty data to that drive.
> E) If one drive fails, or starts to produce random data, the system
>    must return the correct data to the user.

> First idea is to use this stack of bcache and btrfs:

> +-----------------------------------------------------------+
> |               btrfs raid 1 (2 copies) /mnt                |
> +--------------+--------------+--------------+--------------+
> | /dev/bcache0 | /dev/bcache1 | /dev/bcache2 | /dev/bcache3 |
> +--------------+--------------+--------------+--------------+
> |                        Cache (SSD)                        |
> |                         /dev/sda4                         |
> +--------------+--------------+--------------+--------------+
> | Data HDD     | Data HDD     | Data HDD     | Data HDD     |
> | /dev/sda8    | /dev/sda9    | /dev/sda10   | /dev/sda11   |
> +--------------+--------------+--------------+--------------+

> The good:
> Btrfs in raid 1 is able to handle a failing hard drive, both when it
> failed completely, and when it corrupts data.
> Bcache is capable of using an ssd to cache the read and the write
> requests from btrfs.

> The not-so-good:
> Bcache can only use one SSD, so using bcache is only possible as read
> cache in order to achieve characteristic E, but this prevents
> characteristic B to be achieved.
> I can't get bcache to read-ahead the data that is adjacent to the
> data that has just been accessed.

> Second idea is to use a SSD in front of each hard drive:

> +-----------------------------------------------------------+
> |               btrfs raid 1 (2 copies) /mnt                |
> +--------------+--------------+--------------+--------------+
> | /dev/bcache0 | /dev/bcache1 | /dev/bcache2 | /dev/bcache3 |
> +--------------+--------------+--------------+--------------+
> | Cache SSD    | Cache SSD    | Cache SSD    | Cache SSD    |
> | /dev/sda5    | /dev/sda6    | /dev/sda7    | /dev/sda8    |
> +--------------+--------------+--------------+--------------+
> | Data         | Data         | Data         | Data         |
> | /dev/sda9    | /dev/sda10   | /dev/sda11   | /dev/sda12   |
> +--------------+--------------+--------------+--------------+

> The good:
> This setup achieves all characteristics I'm after

> The not-so-good:
> This requires more SSD's and more (SATA) ports than I have. I can't
> get bcache to read-ahead the data that is adjacent to the data that
> has just been accessed.

Why don't you just partition your SSD(s) into 4 partitions and use
each partition as a cache for a separate HDD? The SSDs have more than
enough IOPS to handle the load.

But! I would mirror a pair of SSDs and mirror pairs of disks for even
more redundancy and reliability here.

> Third idea is to use mdadm to create a raid 0 array out of the 2
> SSD's to create a fault tolerant write cache:

No, use RAID 1 (a mirror) here, not a RAID 0 stripe across the two
SSDs. A stripe has no redundancy, so it is not fault tolerant: losing
either SSD loses the whole cache.
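
To make the RAID 0 versus RAID 1 point concrete, here is a small
sketch of the trade-off for a two-device array. The function name
raid_summary and the 250 GB device size are made up for illustration,
and the model ignores metadata overhead.

```python
def raid_summary(level: int, n_devices: int, device_size_gb: float) -> dict:
    """Usable capacity and tolerated device failures for RAID 0 or RAID 1.

    Simplified model: equal-sized devices, no metadata overhead.
    """
    if n_devices < 2:
        raise ValueError("need at least two devices")
    if level == 0:
        # Stripe: capacity adds up, but any single failure loses the array.
        return {"usable_gb": n_devices * device_size_gb, "failures_tolerated": 0}
    if level == 1:
        # Mirror: every device holds a full copy of the data.
        return {"usable_gb": device_size_gb, "failures_tolerated": n_devices - 1}
    raise ValueError("this sketch only models RAID 0 and RAID 1")


if __name__ == "__main__":
    # Hypothetical pair of 250 GB SSDs.
    print("raid0:", raid_summary(0, 2, 250))
    print("raid1:", raid_summary(1, 2, 250))
```

With two SSDs the mirror gives up half the raw capacity in exchange
for surviving the loss of one device, which is what a write cache
holding dirty data needs.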
> +-----------------------------------------------------------+
> |               btrfs raid 1 (2 copies) /mnt                |
> +--------------+--------------+--------------+--------------+
> | /dev/bcache0 | /dev/bcache1 | /dev/bcache2 | /dev/bcache3 |
> +--------------+--------------+--------------+--------------+
> |                       bcache Cache                        |
> |                         /dev/md0                          |
> +-----------------------------------------------------------+
> |                mdadm raid 0 array /dev/md0                |
> |              SSD /dev/sda4 and SSD /dev/sda5              |
> +--------------+--------------+--------------+--------------+
> | Data         | Data         | Data         | Data         |
> | /dev/sda9    | /dev/sda10   | /dev/sda11   | /dev/sda12   |
> +--------------+--------------+--------------+--------------+

> The good:
> This setup is capable of achieving all characteristics I'm after. It
> can handle abrupt failure of a single drive.

> The not-so-good:
> When one of the SSD's start to produce random data, mdadm is not able
> to know what SSD produces correct data, and data is lost. (both
> copies of the data btrfs is trying to write to underlying storage are
> on the 2 SSD's.)

> Fourth idea is to use dm-cache. Dm-cache can only cache one backing
> device, and it has no way to use 2 cache devices.

> +-----------------------------------------------------------+
> |               btrfs raid 1 (2 copies) /mnt                |
> +--------------+--------------+--------------+--------------+
> | /dev/bcache0 | /dev/bcache1 | /dev/bcache2 | /dev/bcache3 |
> +--------------+--------------+--------------+--------------+
> | Cache SSD    | Cache SSD    | Cache SSD    | Cache SSD    |
> | /dev/sda5    | /dev/sda6    | /dev/sda7    | /dev/sda8    |
> +--------------+--------------+--------------+--------------+
> | Data         | Data         | Data         | Data         |
> | /dev/sda9    | /dev/sda10   | /dev/sda11   | /dev/sda12   |
> +--------------+--------------+--------------+--------------+

> The good:
> This setup is capable of achieving all characteristics I'm after.

> The not-so-good:
> This requires more SSD's and more (SATA) ports than I have.

Remember, a single SSD can handle far more IOPS than all four of your
hard drives combined, so partitioning your SSDs might be the answer to
your problem.
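
As a rough sketch of that layout, the script below only builds and
prints the commands; it does not run anything. It assumes both SSDs
already carry four equal partitions, mirrors matching partitions with
mdadm RAID 1, and uses each mirror as the bcache cache for one HDD.
The device names (/dev/sda and /dev/sdb for the SSDs, /dev/sdc to
/dev/sdf for the HDDs) and the helper name plan_commands are
placeholders I made up, so adjust them to your system and check every
command against the man pages before running it.

```python
from typing import List


def plan_commands(ssds: List[str], hdds: List[str]) -> List[str]:
    """Return shell command strings (not executed) for the mirrored-cache layout.

    Assumption: each SSD is already partitioned into len(hdds) partitions,
    named <ssd>1, <ssd>2, ... (true for sdX-style names, not for nvme).
    """
    if len(ssds) != 2:
        raise ValueError("this sketch assumes exactly two SSDs")
    cmds = []
    for i, hdd in enumerate(hdds):
        part = i + 1
        md = f"/dev/md{i}"
        # Mirror partition N of both SSDs (RAID 1, not a RAID 0 stripe).
        cmds.append(
            f"mdadm --create {md} --level=1 --raid-devices=2 "
            f"{ssds[0]}{part} {ssds[1]}{part}"
        )
        # One mirrored cache device (-C) per backing HDD (-B).
        cmds.append(f"make-bcache -C {md} -B {hdd}")
    # Assumption: the kernel names the results bcache0..bcacheN-1;
    # the numbering is not guaranteed, so verify it before mkfs.
    bcaches = " ".join(f"/dev/bcache{i}" for i in range(len(hdds)))
    cmds.append(f"mkfs.btrfs -d raid1 -m raid1 {bcaches}")
    return cmds


if __name__ == "__main__":
    for cmd in plan_commands(
        ["/dev/sda", "/dev/sdb"],
        ["/dev/sdc", "/dev/sdd", "/dev/sde", "/dev/sdf"],
    ):
        print(cmd)
```

Printing the plan first lets you review the device mapping on paper
before anything touches a disk.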