Re: Issue when mkfs.btrfs on a bcached partition

On Mon, Apr 6, 2015 at 8:39 PM, Kai Krakow <kai@xxxxxxxxxxx> wrote:
> Bcache itself needs at least one partition or device for the caching layer.
> That is, you make one empty partition on your SSD and format it with
> make-bcache -C. Take care to use a bucket size that fits your SSD's erase
> block size. Usually 2MB is a safe value. You also want to enable discard
> since most modern SSDs work better with it than relying on the hidden
> reservation area for wear-levelling. If you want to gain maximum
> performance, you can also choose write-back mode. This is usually safe,
> but ensure that your SSD supports power-loss protection. Otherwise you may
> lose data when the power is lost. Usually, all major manufacturers like
> Intel, Samsung, Crucial, and SanDisk support it - at least for the more
> expensive drives. The product specs will tell you.
>
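A minimal sketch of the cache-device step described above. The partition name /dev/sdX3 is a placeholder, and the run helper with its DRY_RUN switch is purely illustrative: it prints the command by default so nothing is formatted by accident. --bucket and --discard are the make-bcache options for the bucket size and for discards.

```shell
#!/bin/sh
# Placeholder: point CACHE_DEV at an EMPTY partition on the SSD.
CACHE_DEV="${CACHE_DEV:-/dev/sdX3}"

# Illustrative helper: print the command unless DRY_RUN=0 is set explicitly.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Format a caching device with a 2M bucket size and discards enabled.
format_cache() {
    run make-bcache -C --bucket 2M --discard "$1"
}

format_cache "$CACHE_DEV"
```

Run as root with DRY_RUN=0 only once the printed command names the partition you really intend to format.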
> Next, you create the partitions for the to-be-cached filesystems. You cannot
> simply put the filesystem raw onto the partition. You have to prepare the
> partition with make-bcache -B first ("B" for backing device). This then
> creates a new virtual device "bcacheX" (with X being a number) which you
> use to operate your filesystem. Attach this virtual device to your caching
> device/partition. Then use your normal mkfs tools to format this virtual
> device. You do not use the underlying raw device/partition directly; it is
> managed by bcache. This configuration is stored by bcache and automatically
> restored on next reboot by the kernel. You can attach multiple backing
> devices to the same caching device - it is designed so that multiple
> filesystems can share the same cache dynamically.
>
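The backing-device steps could be sketched as follows. All device names and the label are placeholders, and the SYSFS variable is an assumption added here so that the attach function can be tried against a scratch directory instead of the real /sys.

```shell
#!/bin/sh
# Overridable sysfs root (assumption for experimenting; normally /sys).
SYSFS="${SYSFS:-/sys}"

# Attach a bcache device (e.g. bcache0) to the cache set with the given UUID.
attach_backing() {
    bdev="$1"
    cset_uuid="$2"
    echo "$cset_uuid" > "$SYSFS/block/$bdev/bcache/attach"
}

# Typical sequence, as root (destroys data on the backing partition):
#   make-bcache -B /dev/sdY2                       # creates /dev/bcacheN
#   bcache-super-show /dev/sdX3 | grep cset.uuid   # UUID of the cache set
#   attach_backing bcache0 <cset-uuid>
#   mkfs.btrfs -L mylabel /dev/bcache0             # mkfs on the virtual device
```

Note that mkfs always targets /dev/bcacheN, never the raw partition underneath it.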
> You can also put your rootfs on bcache (I did it) but this involves
> recreating the rootfs (because you need to format your rootfs partition with
> make-bcache -B first, attach it, then recreate it on the new virtual bcache
> device), and I am not sure if you need an initramfs then to boot. Most
> distributions boot from an initramfs anyway, but you should make sure theirs
> supports bcache. I think this is because the kernel has to wait
> for the bcache devices to appear first: they are not immediately
> available and thus a "root=/dev/bcache/bcache0" (or similar) would fail as
> it is not immediately found. I think udev rules need to run first so the
> symlinks and device nodes are created, and detected bcache devices become
> registered again and known to the kernel.
>
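For illustration, registering a device by hand looks roughly like the sketch below; as far as I know the udev rules shipped with bcache-tools normally do the equivalent at boot. The device name is a placeholder and the SYSFS variable is an assumption added so the function can be tried on a scratch directory.

```shell
#!/bin/sh
# Overridable sysfs root (assumption for experimenting; normally /sys).
SYSFS="${SYSFS:-/sys}"

# Tell the kernel about a bcache cache or backing device so that the
# matching /dev/bcacheN node can appear.
register_bcache() {
    echo "$1" > "$SYSFS/fs/bcache/register"
}

# Example (as root, e.g. from an early-boot script):
#   register_bcache /dev/sdY2
```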
> bcache caching device = intermediate storage for cached data
> bcache backing device = persistent storage for data
>
> bcache migrates data from the caching device to the backing device to
> persist data and make room for new data in the cache. In write-back mode it
> will persist data with a delay and with idle priority in the background when
> you write to the backing device (in reality it is a bit more advanced and
> complicated for performance reasons and does a very good and reliable job).
> In write-through mode it will write to the caching device and the backing
> device at the same time, ensuring that data is persisted to the backing
> device when the kernel acknowledges to the process that data was flushed and
> written. Upon re-read it is then already present in the cache. In
> write-around mode, written data never goes to the cache, only to the backing
> device. Only data that is read gets stored in the cache and can be re-read
> from the cache on successive read requests.
>
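The cache mode is exposed through the cache_mode file in sysfs, which lists all modes and puts the active one in brackets. A small sketch for reading and switching it, assuming a device named bcache0; the SYSFS variable is an assumption added so the functions can be tried on a scratch directory.

```shell
#!/bin/sh
# Overridable sysfs root (assumption for experimenting; normally /sys).
SYSFS="${SYSFS:-/sys}"

# Print the active mode. The file content looks like:
#   writethrough [writeback] writearound none
get_cache_mode() {
    sed -n 's/.*\[\(.*\)].*/\1/p' "$SYSFS/block/$1/bcache/cache_mode"
}

# Switch the mode: writethrough, writeback, writearound or none.
set_cache_mode() {
    echo "$2" > "$SYSFS/block/$1/bcache/cache_mode"
}

# Example (as root):
#   set_cache_mode bcache0 writeback
#   get_cache_mode bcache0
```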
> Write-back mode is usually safe because the caching device is journalled.
> Bcache will rewrite all dirty data after (unexpected) reboots to the
> persistent backing device. In fact, by design bcache doesn't even finish
> writing dirty data at shutdown: it will always boot dirty and
> continue writing back dirty data and reliably finish filesystem
> transactions. Only when the backing device has acked all data as written is
> the cached data marked clean. Here's another caveat in case of power loss:
> If your hard disk acks the data as written while it is still in the disk's
> internal cache and not yet physically written, and you experience a power
> loss, what bcache believes was written is inconsistent with what the hard
> disk has actually written. To be safe, you may want to disable write-caching
> of your hard disks (with hdparm) and instead enable write-back in bcache to
> compensate for that. You also may want to lower SCTERC for your hard disk
> (with smartctl) from default 120s to 7s, so that sector errors are signalled
> to bcache and the kernel before the SCSI layer resets, and thus bcache and
> your filesystem can reliably handle the problem. This is often a feature
> only of enterprise-grade and/or RAID-ready drives. If your drive doesn't
> support it, you may instead want to increase your kernel SCSI timeout from
> default 30s to something slightly above 120s. This way, you ensure that
> bcache is safe even in case of hardware failures.
>
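The disk-side settings mentioned above could be applied roughly as sketched here. The disk name sdY and the 130 s value are placeholders, and the SYSFS variable is an assumption added so the timeout function can be tried on a scratch directory. smartctl takes the SCT ERC limits in tenths of a second, hence the small conversion helper.

```shell
#!/bin/sh
# Overridable sysfs root (assumption for experimenting; normally /sys).
SYSFS="${SYSFS:-/sys}"

# Build the smartctl argument for a read/write error recovery limit given
# in seconds; smartctl expects tenths of a second (7 s becomes 70,70).
scterc_arg() {
    echo "scterc,$(( $1 * 10 )),$(( $1 * 10 ))"
}

# Set the kernel's command timeout (in seconds) for one disk.
set_scsi_timeout() {
    echo "$2" > "$SYSFS/block/$1/device/timeout"
}

# As root, for a backing disk sdY:
#   hdparm -W0 /dev/sdY                      # disable the drive's write cache
#   smartctl -l "$(scterc_arg 7)" /dev/sdY   # limit error recovery to 7 s
#   set_scsi_timeout sdY 130                 # only if the drive lacks SCT ERC
```

On many drives these settings do not survive a power cycle, so they usually have to be reapplied at boot.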
> In case of recovery (when you have to access data without the caching
> device, e.g. when the SSD has died), it is only safe to access your data
> if you didn't use write-back mode - because of the aforementioned design
> that the cache is always dirty, even after clean shutdown. In normal
> operation, though, bcache doesn't keep dirty data around for too long. But it
> is filesystem-agnostic and thus doesn't know what makes up a transaction on
> the filesystem, so your filesystem probably has broken metadata when
> accessed without the cache. But I think it supports write barriers, so if
> your filesystem does too, it should be transactionally safe and you may just
> lose the last minutes of data but at least metadata is consistent.
>
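Before taking the SSD out of the picture on purpose, it may help to check whether the cache still holds dirty data. A sketch, assuming a device named bcache0 and the state values listed in the kernel's bcache documentation (no cache, clean, dirty, inconsistent); the SYSFS variable is an assumption added so the functions can be tried on a scratch directory.

```shell
#!/bin/sh
# Overridable sysfs root (assumption for experimenting; normally /sys).
SYSFS="${SYSFS:-/sys}"

# Print the backing device's state as reported by the kernel.
bcache_state() {
    cat "$SYSFS/block/$1/bcache/state"
}

# Succeed only if the backing device holds everything by itself.
safe_without_cache() {
    case "$(bcache_state "$1")" in
        clean|"no cache") return 0 ;;
        *) return 1 ;;
    esac
}

# To flush before removing the cache (as root), either switch the mode:
#   echo writethrough > /sys/block/bcache0/bcache/cache_mode
# or detach, which writes back dirty data first:
#   echo 1 > /sys/block/bcache0/bcache/detach
# then poll until it succeeds:
#   safe_without_cache bcache0 && echo "bcache0 can run without its cache"
```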
> HTH
> Kai

Very good article about Bcache in general. Thank you so much for all
this basic information.
>
> arnaud gaboury <arnaud.gaboury@xxxxxxxxx> wrote on Mon, 6 Apr 2015 at
> 12:49:
>>
>> On Sun, Apr 5, 2015 at 11:05 PM, Kai Krakow <kai@xxxxxxxxxxx> wrote:
>> > Apparently you didn't CC to the list...
>> >
>> > Subvolumes in a btrfs pool are not like volumes in an LVM pool. Bcache
>> > only acts on a complete filesystem, thus you cannot bcache just a single
>> > subvolume. It also doesn't make sense.
>> >
>> > You usually make one single bcache caching device on the SSD (make-bcache
>> > -C), then create backing devices on the to-be-cached partitions
>> > (make-bcache -B), and attach them to each other. Multiple backing
>> > partitions can be attached to the same single caching device. Those
>> > backing devices need neither belong to the same filesystem nor have the
>> > same filesystem format. They can be completely unrelated. Currently,
>> > there's not much sense in using multiple caching devices (unless you want
>> > to implement different caching strategies). In the future, bcache will
>> > support RAID-like schemes by combining multiple caching devices.
>> >
>> > So to keep it easy, I suggest using only one caching device (unless you
>> > have more than one SSD, in which case you probably should create an LVM
>> > mirror of them and create bcache on top for error resilience).
>>
>> Very good. Now I am still worried about one point I may be missing.
>> My setup is this one: my root filesystem on an SSD, and an HD for
>> storing extra stuff, with an encrypted partition for DB. The idea is
>> to use the SSD as caching device.
>> Bcache, I guess, works like any cache: it will write data to the cache.
>> What if my caching device already has a root filesystem? Where will
>> bcache store the data?
>> I am afraid in fact I misunderstood the whole thing and need an empty
>> device? Thus my first idea to dedicate a Btrfs partition to bcache.
>>
>> Thank you for any hint on how bcache will manage writing to the caching
>> device and whether I can use my root filesystem or need an empty SSD (or
>> partition).
>>
>>
>> >
>> > Here's my setup, 3x btrfs mraid-1 draid-0 (sdc,sdd,sde) with one caching
>> > device on SSD (sdb):
>> >
>> > $ lsblk
>> > NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
>> > sda           8:0    0   1,8T  0 disk
>> > └─sda1        8:1    0   1,8T  0 part
>> > sdb           8:16   0 119,2G  0 disk
>> > ├─sdb1        8:17   0   512M  0 part
>> > ├─sdb2        8:18   0    20G  0 part [SWAP]
>> > ├─sdb3        8:19   0  79,5G  0 part
>> > │ ├─bcache0 252:0    0 925,5G  0 disk
>> > │ ├─bcache1 252:1    0 925,5G  0 disk
>> > │ └─bcache2 252:2    0 925,5G  0 disk /usr/portage
>> > └─sdb4        8:20   0  19,2G  0 part
>> > sdc           8:32   0 931,5G  0 disk
>> > ├─sdc1        8:33   0     6G  0 part [SWAP]
>> > └─sdc2        8:34   0 925,5G  0 part
>> >   └─bcache2 252:2    0 925,5G  0 disk /usr/portage
>> > sdd           8:48   0 931,5G  0 disk
>> > ├─sdd1        8:49   0     6G  0 part [SWAP]
>> > └─sdd2        8:50   0 925,5G  0 part
>> >   └─bcache0 252:0    0 925,5G  0 disk
>> > sde           8:64   0 931,5G  0 disk
>> > ├─sde1        8:65   0     6G  0 part [SWAP]
>> > └─sde2        8:66   0 925,5G  0 part
>> >   └─bcache1 252:1    0 925,5G  0 disk
>> >
>> > Ignore the mount points, they are more or less bogus with btrfs since
>> > lsblk is not able to differentiate multiple device setups and multiple
>> > subvolumes correctly. I'm running my rootfs from this setup.
>> > Bcache{0,1,2} belong to the same filesystem.
>> >
>> > arnaud gaboury <arnaud.gaboury@xxxxxxxxx> wrote on Sun, 5 Apr 2015 at
>> > 21:40:
>> >>
>> >> On Sun, Apr 5, 2015 at 5:23 PM, Kai Krakow <kai@xxxxxxxxxxx> wrote:
>> >> > arnaud gaboury <arnaud.gaboury@xxxxxxxxx> wrote:
>> >> >
>> >> >> Here is what I did:
>> >> >>  # make-bcache -C /dev/sdb1
>> >> >>  # echo "UUID" > /sys/block/bcache0/bcache/attach
>> >> >> --------------------------------------
>> >> >> $ lsblk -o
>> >> >> .......................
>> >> >> sdb                      sdb
>> >> >> └─sdb1                   sdb1    bcache
>> >> >>   └─bcache0              bcache0
>> >> >> --------------------------------
>> >> >> # mkfs.btrfs -L poppy-root /dev/sdb1
>> >> >> /dev/sdb1 appears to contain an existing filesystem (bcache).
>> >> >> Error: Use the -f option to force overwrite.
>> >> >>
>> >> >>
>> >> >> Could you please tell me what is wrong and how I can make btrfs on
>> >> >> a bcached partition?
>> >> >
>> >> > You need to mkfs on /dev/bcache0. Bcache itself reserves the
>> >> > partition with its own superblock and creates a subdevice so you
>> >> > cannot accidentally access the data without passing through the cache
>> >> > layer. This means you will also mount /dev/bcache0 as your btrfs in
>> >> > fstab (or simply use LABEL=poppy-root or /dev/disk/by-label/...).
>> >>
>> >> Thank you for the hint. I finally managed to do it with /dev/bcache0
>> >> └─sdd4                   sdd4      8:52  bcache
>> >>   └─bcache1              bcache1 254:1   crypto_LUKS
>> >>     └─sdd4_crypt         dm-8    253:8   btrfs       poppy-encrypt
>> >>
>> >> Curious to see if the above setup will survive.
>> >>
>> >> Now I am trying to set up my caching device on an SSD. I can of course
>> >> bcache the whole device:
>> >> └─sdb2                   sdb2      8:18  bcache
>> >>   └─bcache0              bcache0 254:0   btrfs       poppy-root
>> >> But this SSD will in fact hold my root file system. So I decided to
>> >> create some btrfs subvolumes, with one for caching.
>> >> gabx@hortensia ➤➤ ~ % sudo btrfs subvolume list /mnt/btrfs
>> >> ID 257 gen 7 top level 5 path var
>> >> ID 258 gen 8 top level 5 path home
>> >> ID 259 gen 9 top level 5 path root
>> >> ID 260 gen 10 top level 5 path cache
>> >>
>> >> Now I want (if possible) to run blocks to-bcache on only the cache
>> >> subvolume. Obviously, # blocks to-bcache /mnt/backup/cache  does not work.
>> >>
>> >> Is it possible to cache only a btrfs subvolume? If yes,
>> >> against what shall I run the to-bcache command?
>> >>
>> >> Thank you very much for your help. It took me very long to achieve the
>> >> whole setup!
>> >>
>> >>
>> >> >
>> >> > If you already overwrote something, I suggest running wipefs on the
>> >> > partitions first, otherwise the kernel may accidentally misdetect
>> >> > filesystems that are no longer there.
>> >> >
>> >> > --
>> >> > Replies to list only preferred.
>> >>
>> >>
>> >>
>> >> --
>> >>
>> >> google.com/+arnaudgabourygabx
>>