On Mon, Apr 6, 2015 at 8:39 PM, Kai Krakow <kai@xxxxxxxxxxx> wrote:
> Bcache itself needs at least one partition or device for the caching layer.
> That is, you make one empty partition on your SSD and format it with
> make-bcache -C. Take care to use a bucket size that fits your SSD's erase
> block size. Usually 2MB is a safe value. You also want to enable discard,
> since most modern SSDs work better with it than relying on the hidden
> reservation area for wear-levelling. If you want to gain maximum
> performance, you can also choose write-back mode. This is usually safe.
> Ensure that your SSD supports power-loss protection, otherwise you may lose
> data when the power is lost. Usually, all major manufacturers like Intel,
> Samsung, Crucial, and SanDisk support it - at least for the more expensive
> drives. The product specs will tell you.
>
> Next, you create the partitions for the to-be-cached filesystems. You
> cannot simply put the filesystem raw onto the partition. You have to
> prepare the partition with make-bcache -B first ("B" for backing device).
> This creates a new virtual device "bcacheX" (with X being a number) which
> you use to operate your filesystem. Attach this virtual device to your
> caching device/partition, then use your normal mkfs tools to format the
> virtual device. The underlying raw device/partition is not used by you; it
> is managed by bcache. This configuration is stored by bcache and
> automatically restored on the next reboot by the kernel. You can attach
> multiple backing devices to the same caching device - it is designed so
> that multiple filesystems can share the same bcache dynamically.
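
For readers following along, the steps described above might look roughly
like this on the command line (the device names and the UUID are only
placeholders taken from the examples in this thread, and the make-bcache
options should be checked against your bcache-tools version):

  # create the caching device on an empty SSD partition,
  # with a bucket size matching the erase block size and discard enabled
  make-bcache -C --bucket 2M --discard /dev/sdb1

  # prepare the backing partition; this creates /dev/bcache0
  make-bcache -B /dev/sdc2

  # look up the cache set UUID and attach the backing device to it
  bcache-super-show /dev/sdb1 | grep cset.uuid
  echo <cset-uuid> > /sys/block/bcache0/bcache/attach

  # optionally switch this backing device to write-back mode
  echo writeback > /sys/block/bcache0/bcache/cache_mode

  # format and mount the virtual device, never the raw partition
  mkfs.btrfs /dev/bcache0
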
>
> You can also put your rootfs on bcache (I did it), but this involves
> recreating the rootfs (because you need to format your rootfs partition
> with make-bcache -B first, attach it, then recreate the filesystem on the
> new virtual bcache device), and I am not sure if you need an initramfs
> then to boot. Most distributions boot from an initramfs anyway, but you
> should make sure it supports bcache. I think this is because the kernel
> has to wait for the bcache devices to appear first; they are not
> immediately available, and thus a "root=/dev/bcache/bcache0" (or similar)
> would fail as it is not immediately found. I think udev rules need to run
> first so the symlinks and device nodes are created, and detected bcache
> devices become registered again and imported into the kernel's knowledge.
>
> bcache caching device = intermediate storage for cached data
> bcache backing device = persistent storage for data
>
> bcache migrates data from the caching device to the backing device to
> persist data and make room for new data in the cache. In write-back mode
> it will persist data with a delay and with idle priority in the background
> when you write to the backing device (in reality it is a bit more advanced
> and complicated for performance reasons and does a very good and reliable
> job). In write-through mode it will write to the caching device and the
> backing device at the same time, ensuring that data is persisted to the
> backing device by the time the kernel acknowledges to the process that the
> data was flushed and written. Upon re-read it is then already present in
> the cache. In write-around mode, data is never written to the cache, only
> to the backing device. Only reads will be written to the cache and can be
> re-read from the cache on successive read requests.
>
> Write-back mode is usually safe because the caching device is journalled.
> Bcache will write all dirty data back to the persistent backing device
> after (unexpected) reboots; in fact, bcache doesn't even finish writing
> dirty data at shutdown as part of its design. It will always boot dirty,
> continue writing back dirty data, and reliably finish filesystem
> transactions. Only when the backing device has acknowledged all written
> data is the cached data marked clean. Here's another caveat in case of
> power loss: if your hard disk acknowledges data as written but internally
> it is still in its cache and not yet written, and you experience a power
> loss, bcache's knowledge about what has been written is inconsistent with
> what the hard disk has actually written. To be safe, you may want to
> disable write caching on your hard disks (with hdparm) and instead enable
> write-back in bcache to compensate for that. You may also want to lower
> SCT ERC for your hard disk (with smartctl) from the default 120s to 7s, so
> that sector errors are signalled to bcache and the kernel before the SCSI
> layer resets, and thus bcache and your filesystem can reliably handle the
> problem. This is often a feature only of enterprise-grade and/or
> RAID-ready drives. If your drive doesn't support it, you may instead want
> to increase your kernel SCSI timeout from the default 30s to something
> slightly above 120s. This way, you ensure that bcache is safe even in case
> of hardware failures.
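
As a rough illustration of those tuning knobs (sdX stands for the cached
hard disk; treat this as a sketch and double-check the values and option
syntax for your own drives):

  # disable the disk's own write cache and let bcache handle write-back
  hdparm -W 0 /dev/sdX

  # if the drive supports SCT ERC, cap its error recovery at 7 seconds
  smartctl -l scterc,70,70 /dev/sdX

  # otherwise, raise the kernel's SCSI timeout above the drive's internal
  # recovery time (this setting does not persist across reboots)
  echo 130 > /sys/block/sdX/device/timeout
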
>
> In case of recovery (when you have to access the data without the caching
> device, e.g. when the SSD has died), it is only safe to access your data
> if you did not use write-back mode - because of the aforementioned design,
> the cache is always dirty, even after a clean shutdown. Though, in normal
> operation bcache doesn't keep dirty data around for too long. But it is
> filesystem-agnostic and thus doesn't know what makes up a transaction on
> the filesystem, so your filesystem probably has broken metadata if it is
> not accessed through the cache. But I think it supports write barriers, so
> if your filesystem does too, it should be transactionally safe and you may
> just lose the last minutes of data, but at least the metadata is
> consistent.
>
> HTH
> Kai

Very good article about Bcache in general. Thank you so much for all
this basic information.

>
> arnaud gaboury <arnaud.gaboury@xxxxxxxxx> wrote on Mon, 6 Apr 2015 at
> 12:49:
>>
>> On Sun, Apr 5, 2015 at 11:05 PM, Kai Krakow <kai@xxxxxxxxxxx> wrote:
>> > Apparently you didn't CC to the list...
>> >
>> > Subvolumes in a btrfs pool are not like volumes in an LVM pool. Bcache
>> > only acts on a complete filesystem, thus you cannot bcache only a
>> > single subvolume. It also doesn't make sense.
>> >
>> > You usually make one single bcache caching device on the SSD
>> > (make-bcache -C), then create backing devices on the to-be-cached
>> > partitions (make-bcache -B), and attach them to each other. Multiple
>> > backing partitions can be attached to the same single caching device.
>> > Those backing devices neither need to belong to the same filesystem,
>> > nor need to have the same filesystem format. They can be completely
>> > unrelated. Currently, there's not much sense in using multiple caching
>> > devices (except if you want to implement different caching strategies).
>> > In the future, bcache will support RAID-like schemes by combining
>> > multiple caching devices.
>> >
>> > So to keep it easy, I suggest using only one caching device (except if
>> > you have more than one SSD, in which case you should probably create an
>> > LVM mirror of them and create bcache on top for error resilience).
>>
>> Very good. Now I am still worried about one point I may have missed.
>> My setup is this: my root filesystem on an SSD, and an HD for storing
>> extra stuff, with an encrypted partition for a DB. The idea is to use
>> the SSD as caching device.
>> Bcache, I guess, is like any cache: it will write data to the cache.
>> What happens if my caching device already has a root filesystem? Where
>> will bcache store the data?
>> I am afraid I have in fact misunderstood the whole thing and need an
>> empty device? Hence my first idea to dedicate a Btrfs partition to
>> bcache.
>>
>> Thank you for any hint on how bcache will manage writing to the cached
>> device and whether I can use my root filesystem or need an empty SSD (or
>> partition).
>>
>>
>> >
>> > Here's my setup, 3x btrfs mraid-1 draid-0 (sdc,sdd,sde) with one
>> > caching device on SSD (sdb):
>> >
>> > $ lsblk
>> > NAME          MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
>> > sda             8:0    0   1,8T  0 disk
>> > └─sda1          8:1    0   1,8T  0 part
>> > sdb             8:16   0 119,2G  0 disk
>> > ├─sdb1          8:17   0   512M  0 part
>> > ├─sdb2          8:18   0    20G  0 part [SWAP]
>> > ├─sdb3          8:19   0  79,5G  0 part
>> > │ ├─bcache0   252:0    0 925,5G  0 disk
>> > │ ├─bcache1   252:1    0 925,5G  0 disk
>> > │ └─bcache2   252:2    0 925,5G  0 disk /usr/portage
>> > └─sdb4          8:20   0  19,2G  0 part
>> > sdc             8:32   0 931,5G  0 disk
>> > ├─sdc1          8:33   0     6G  0 part [SWAP]
>> > └─sdc2          8:34   0 925,5G  0 part
>> >   └─bcache2   252:2    0 925,5G  0 disk /usr/portage
>> > sdd             8:48   0 931,5G  0 disk
>> > ├─sdd1          8:49   0     6G  0 part [SWAP]
>> > └─sdd2          8:50   0 925,5G  0 part
>> >   └─bcache0   252:0    0 925,5G  0 disk
>> > sde             8:64   0 931,5G  0 disk
>> > ├─sde1          8:65   0     6G  0 part [SWAP]
>> > └─sde2          8:66   0 925,5G  0 part
>> >   └─bcache1   252:1    0 925,5G  0 disk
>> >
>> > Ignore the mount points, they are more or less bogus with btrfs since
>> > lsblk is not able to differentiate multiple-device setups and multiple
>> > subvolumes correctly. I'm running my rootfs from this setup.
>> > Bcache{0,1,2} belong to the same filesystem.
>> >
>> > arnaud gaboury <arnaud.gaboury@xxxxxxxxx> wrote on Sun, 5 Apr 2015 at
>> > 21:40:
>> >>
>> >> On Sun, Apr 5, 2015 at 5:23 PM, Kai Krakow <kai@xxxxxxxxxxx> wrote:
>> >> > arnaud gaboury <arnaud.gaboury@xxxxxxxxx> wrote:
>> >> >
>> >> >> Here is what I did:
>> >> >> # make-bcache -C /dev/sdb1
>> >> >> # echo "UUID" > /sys/block/bcache0/bcache/attach
>> >> >> --------------------------------------
>> >> >> $ lsblk -o
>> >> >> .......................
>> >> >> sdb         sdb
>> >> >> └─sdb1      sdb1    bcache
>> >> >>   └─bcache0 bcache0
>> >> >> --------------------------------
>> >> >> # mkfs.btrfs -L poppy-root /dev/sdb1
>> >> >> /dev/sdb1 appears to contain an existing filesystem (bcache).
>> >> >> Error: Use the -f option to force overwrite.
>> >> >>
>> >> >>
>> >> >> Please may you tell me what is wrong and how I can make btrfs on a
>> >> >> bcached partition?
>> >> >
>> >> > You need to mkfs on /dev/bcache0. Bcache itself reserves the
>> >> > partition with its own superblock and creates a subdevice so you
>> >> > cannot accidentally access the data without passing through the
>> >> > cache layer. This means you will also mount /dev/bcache0 as your
>> >> > btrfs in fstab (or simply use LABEL=poppy-root or
>> >> > /dev/disk/by-label/...).
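
For reference, the corrected sequence Kai describes would look roughly
like this (a sketch using the device and label names from this thread;
the fstab mount point is a placeholder):

  # format the virtual bcache device, not the raw partition
  mkfs.btrfs -L poppy-root /dev/bcache0

  # then mount through the bcache layer, e.g. via /etc/fstab:
  # LABEL=poppy-root  <mountpoint>  btrfs  defaults  0 0
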
>> >>
>> >> Thank you for the hint. I finally managed to do it with /dev/bcache0:
>> >> └─sdd4           sdd4     8:52   bcache
>> >>   └─bcache1      bcache1  254:1  crypto_LUKS
>> >>     └─sdd4_crypt dm-8     253:8  btrfs  poppy-encrypt
>> >>
>> >> Curious to see if the above setup will survive.
>> >>
>> >> Now I am trying to set up my caching device, on an SSD. I can of
>> >> course bcache the whole device:
>> >> └─sdb2      sdb2     8:18   bcache
>> >>   └─bcache0 bcache0  254:0  btrfs  poppy-root
>> >> But this SSD will in fact be my root filesystem. So I decided to
>> >> create some btrfs subvolumes, with one for caching.
>> >> gabx@hortensia ➤➤ ~ % sudo btrfs subvolume list /mnt/btrfs
>> >> ID 257 gen 7 top level 5 path var
>> >> ID 258 gen 8 top level 5 path home
>> >> ID 259 gen 9 top level 5 path root
>> >> ID 260 gen 10 top level 5 path cache
>> >>
>> >> Now I want (if possible) to run blocks to-bcache only on the cache
>> >> subvolume. Obviously, # blocks to-bcache /mnt/backup/cache does not
>> >> work.
>> >>
>> >> Is it possible to achieve caching only a btrfs subvolume? If yes,
>> >> against what shall I run the to-bcache command?
>> >>
>> >> Thank you very much for your help. It took me very long to achieve the
>> >> whole setup!
>> >>
>> >>
>> >> >
>> >> > If you already overwrote something, I suggest running wipefs first
>> >> > on the partitions, otherwise the kernel may accidentally misdetect
>> >> > filesystems that are no longer there.
>> >> >
>> >> > --
>> >> > Replies to list only preferred.
>> >>
>> >>
>> >>
>> >> --
>> >>
>> >> google.com/+arnaudgabourygabx
>> --
To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html