Re: Reserve space for specific thin logical volumes

On 11-09-2017 12:35, Zdenek Kabelac wrote:
> The first question here is - why do you want to use thin-provisioning?

Because classic LVM snapshot behavior (slow write speed and a linear performance decrease as the snapshot count increases) makes them useful for nightly backups only.

On the other hand, thinp's very fast CoW behavior means snapshots are cheap enough to take frequently (which is very useful to recover from user errors).
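
For reference, taking a thin snapshot is a one-liner and completes almost instantly; a sketch with a placeholder name vg0/thinvol:

# Thin snapshot of a thin volume: no size argument means a thin snapshot
# sharing all blocks with its origin (vg0/thinvol is a placeholder name).
# By default the snapshot is not activated; use "lvchange -ay -K" for that.
[root@blackhole ~]# lvcreate -s vg0/thinvol -n thinvol-snap1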

> As thin-provisioning is about 'promising the space you can deliver
> later when needed' - it's not about hidden magic to make the space
> out-of-nowhere.

I fully agree. In fact, I was asking how to reserve space in order to *protect* critical thin volumes from "liberal" resource use by less important volumes. Fully-allocated thin volumes sound very interesting - even though I see them as a performance optimization rather than a "safety measure".
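
If I understand the suggestion correctly, a fully-allocated thin volume can be approximated by writing every block once, so the pool allocates the volume's full nominal size up front. Just a sketch, with placeholder names:

# Create a thin LV, then touch every block so the pool allocates its full
# nominal size immediately (vg0, pool0 and "critical" are placeholder names)
[root@blackhole ~]# lvcreate -V 10G --thinpool pool0 -n critical vg0
[root@blackhole ~]# dd if=/dev/zero of=/dev/vg0/critical bs=1M oflag=direct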

> The idea of planning to operate thin-pool on 100% fullness boundary is
> simply not going to work well - it's not been designed for that
> use-case - so if that's been your plan - you will need to seek for
> other solution.
> (Unless you seek for those 100% provisioned devices)

I do *not* want to run at 100% data usage. Actually, I want to avoid it entirely by setting aside reserved space which cannot be used for things such as snapshots. In other words, I would much rather see a snapshot fail than its origin volume become unavailable *and* corrupted.
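
As far as I know, the closest existing knob is pool auto-extension in lvm.conf, which buys headroom but is not a per-volume reservation:

# /etc/lvm/lvm.conf, activation section: auto-extend the pool when it
# crosses a fill threshold. This adds headroom, but is *not* a reservation.
thin_pool_autoextend_threshold = 80
thin_pool_autoextend_percent = 20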

Let me take a detour and use ZFS as an example (don't bash me for doing that!).

In ZFS terms, there are objects called ZVOLs - ZFS volumes/block devices - which can be either "fully preallocated" or "sparse".

By default, they are "fully preallocated": their entire nominal space is reserved and subtracted from the ZPOOL's total capacity. Please note that this does *not* mean the space is really allocated on the ZPOOL; rather, the nominal space is accounted against other ZFS datasets/volumes when creating new objects. A filesystem sitting on top of such a ZVOL will never run out of space; rather, if the remaining capacity is not enough to guarantee this constraint, creation of new volumes/snapshots is forbidden.

Example:
# 1 GB ZPOOL
[root@blackhole ~]# zpool list
NAME   SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
tank  1008M   456K  1008M         -     0%     0%  1.00x  ONLINE  -

# Creating a 600 MB ZVOL (note the different USED vs REFER values)
[root@blackhole ~]# zfs create -V 600M tank/vol1
[root@blackhole ~]# zfs list
NAME        USED  AVAIL  REFER  MOUNTPOINT
tank        621M   259M    96K  /tank
tank/vol1   621M   880M    56K  -

# Snapshot creation - note that, as REFER is very low (I wrote nothing to the volume), creating a snapshot is allowed
[root@blackhole ~]# zfs snapshot tank/vol1@snap1
[root@blackhole ~]# zfs list -t all
NAME              USED  AVAIL  REFER  MOUNTPOINT
tank              621M   259M    96K  /tank
tank/vol1         621M   880M    56K  -
tank/vol1@snap1     0B      -    56K  -

# Let's write something to the volume (note how REFER becomes higher than the free, unreserved space)
[root@blackhole ~]# zfs destroy tank/vol1@snap1
[root@blackhole ~]# dd if=/dev/zero of=/dev/zvol/tank/vol1 bs=1M count=500 oflag=direct
500+0 records in
500+0 records out
524288000 bytes (524 MB) copied, 12.7038 s, 41.3 MB/s
[root@blackhole ~]# zfs list -t all
NAME        USED  AVAIL  REFER  MOUNTPOINT
tank        622M   258M    96K  /tank
tank/vol1   621M   378M   501M  -

# Snapshot creation now FAILS!
[root@blackhole ~]# zfs snapshot tank/vol1@snap1
cannot create snapshot 'tank/vol1@snap1': out of space
[root@blackhole ~]# zfs list -t all
NAME        USED  AVAIL  REFER  MOUNTPOINT
tank        622M   258M    96K  /tank
tank/vol1   621M   378M   501M  -

The above surely is safe behavior: when the free, unused space is too low to guarantee the reserved space, snapshot creation is disallowed.

On the other side, using the "-s" option you can create a "sparse" ZVOL - a volume whose nominal space is *not* accounted/subtracted from the total ZPOOL capacity. Such a volume has caveats similar to those of thin volumes. From the man page:

'Though not recommended, a "sparse volume" (also known as "thin provisioning") can be created by specifying the -s option to the zfs create -V command, or by changing the reservation after the volume has been created. A "sparse volume" is a volume where the reservation is less than the volume size. Consequently, writes to a sparse volume can fail with ENOSPC when the pool is low on space. For a sparse volume, changes to volsize are not reflected in the reservation.'

The only real difference vs a fully preallocated volume is the property carrying the reserved-space expectation. I can even switch at run-time between fully preallocated and sparse by simply changing that property. Indeed, a very important thing to understand is that this property can be set to *any value* between 0 ("none") and the volume's maximum (nominal) size.
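
For example, switching the volume from the examples above at run-time (621M being its full nominal reservation):

[root@blackhole ~]# zfs set refreservation=none tank/vol1   # now sparse
[root@blackhole ~]# zfs set refreservation=621M tank/vol1   # fully reserved again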

On a 600M fully preallocated volume:
[root@blackhole ~]# zfs get refreservation tank/vol1
NAME       PROPERTY        VALUE      SOURCE
tank/vol1  refreservation  621M       local

On a 600M sparse volume:
[root@blackhole ~]# zfs get refreservation tank/vol1
NAME       PROPERTY        VALUE      SOURCE
tank/vol1  refreservation  none       local

Now, a sparse (refreservation=none) volume *can* be snapshotted even if very little free space is available in the ZPOOL:

# The very same command that previously failed now completes successfully
[root@blackhole ~]# zfs snapshot tank/vol1@snap1
[root@blackhole ~]# zfs list -t all
NAME              USED  AVAIL  REFER  MOUNTPOINT
tank              502M   378M    96K  /tank
tank/vol1         501M   378M   501M  -
tank/vol1@snap1     0B      -   501M  -

# Using a non-zero, but lower-than-nominal threshold (refreservation=100M) allows the snapshot to be taken:
[root@blackhole ~]# zfs set refreservation=100M tank/vol1
[root@blackhole ~]# zfs snapshot tank/vol1@snap1
[root@blackhole ~]# zfs list -t all
NAME              USED  AVAIL  REFER  MOUNTPOINT
tank              602M   278M    96K  /tank
tank/vol1         601M   378M   501M  -
tank/vol1@snap1     0B      -   501M  -

# If free space drops under the lower-but-not-zero reservation (refreservation=100M), snapshot creation again fails:
[root@blackhole ~]# dd if=/dev/zero of=/dev/zvol/tank/vol1 bs=1M count=300 oflag=direct
300+0 records in
300+0 records out
314572800 bytes (315 MB) copied, 4.85282 s, 64.8 MB/s
[root@blackhole ~]# zfs list -t all
NAME              USED  AVAIL  REFER  MOUNTPOINT
tank              804M  76.3M    96K  /tank
tank/vol1         802M  76.3M   501M  -
tank/vol1@snap1   301M      -   501M  -
[root@blackhole ~]# zfs snapshot tank/vol1@snap2
cannot create snapshot 'tank/vol1@snap2': out of space

OK - now back to the original question: why can reserved space be useful? Consider the following two scenarios:

A) You want to efficiently use snapshots and *never* encounter an unexpectedly full ZPOOL. Your main constraint is to use at most 50% of the available space for your "critical" ZVOL. With such a setup, any "excessive" snapshot/volume creation will surely fail, but the main ZVOL will be unaffected;

B) You want to somewhat overprovision (taking into account worst-case snapshot behavior), but with a *large* operating margin. In this case, you can create a sparse volume with a lower (but non-zero) reservation, as sketched below. Any snapshot/volume creation attempted after this margin is crossed will fail. You surely need to clean up some space (e.g. delete older snapshots), but you avoid the runaway effect of new snapshots being continuously created, consuming additional space.
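
For instance, scenario B on the 1 GB test pool would look like this (re-using the names from above, assuming a fresh pool):

# Sparse volume, overprovision-friendly, but with a 100M safety margin
# that makes snapshot creation fail before the pool is really exhausted
[root@blackhole ~]# zfs create -s -V 600M tank/vol1
[root@blackhole ~]# zfs set refreservation=100M tank/vol1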

Now let's leave the ZFS world and get back to thinp: it would be *really* cool to provide the same sort of functionality. Sure, you would have to track space usage both at the pool and at the volume level - but the safety increase would be massive. There is a big difference between a corrupted main volume and a failed snapshot: while the latter can be resolved without too much concern, the former (volume corruption) really is a scary thing.
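
To be clear, nothing like this exists in LVM today; the best one can do is approximate it from userspace. Something along these lines - only a rough sketch, where the names and the threshold are placeholders I invented:

#!/bin/sh
# Rough userspace sketch (NOT an existing LVM feature): refuse to create a
# new thin snapshot when pool data usage is already above a watermark,
# keeping the remaining space "reserved" for the critical volumes.
VG=vg0
POOL=pool0
MAX_USED_PCT=80    # i.e. keep a 20% reserve
USED=$(lvs --noheadings -o data_percent "$VG/$POOL" | tr -d ' ')
if [ "${USED%.*}" -ge "$MAX_USED_PCT" ]; then
    echo "refusing snapshot: $VG/$POOL data is ${USED}% full" >&2
    exit 1
fi
# $1 = name of the thin LV to snapshot
lvcreate -s "$VG/$1" -n "$1-snap-$(date +%Y%m%d%H%M)"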

Don't misunderstand me, Zdenek: I *REALLY* appreciate you core developers for the outstanding work on LVM. This is especially true in light of BTRFS's problems, and with Stratis (which is heavily based on thinp) becoming the next big thing. I appreciate even more that you are on the mailing list, replying to your users.

Thin volumes are really cool (and fast!), but they can fail in a deadly manner. A fail-safe approach (i.e. no new snapshots allowed) is much more desirable.

Thanks.



--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8

_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


