Re: Urgent Help Needed (regarding rbd cache)

Thanks, Paul and Janne.

Your emails have cleared many things up for me. Let me repeat my understanding. Writes of critical data (like Oracle or any other DB) are done with sync/fsync flags, meaning they are confirmed to the DB/application only after the data has actually been written to the hard drives/OSDs. Any other application can do the same.
All other writes, like OS logs etc., are confirmed to the app/user immediately and written later, passing through the kernel, the RBD cache, the physical drive cache (if any), and then to the disks. These are susceptible to loss on power failure, but overall things stay recoverable/non-critical.
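For example, a minimal sketch of the two cases (file names are made up; error handling omitted):

    import os

    # Critical write: confirmed only after it reaches stable storage.
    fd = os.open("db.journal", os.O_WRONLY | os.O_CREAT)
    os.write(fd, b"commit record")
    os.fsync(fd)   # blocks until the data is on the OSDs/disks
    os.close(fd)

    # Ordinary write: returns as soon as the page cache (and, in a VM,
    # the RBD cache) has accepted it; it may be lost on power failure.
    with open("app.log", "a") as f:
        f.write("routine log line\n")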

Please confirm whether my understanding is correct. Best regards.

Muhammad Junaid 



On Wed, Jul 31, 2019 at 7:12 PM Paul Emmerich <paul.emmerich@xxxxxxxx> wrote:
Yes, this is power-failure safe. It behaves exactly the same as a real disk's write cache.

It's really a question about semantics: what does it mean to write data? Should the data still be guaranteed to be there after a power failure? The answer for most writes is: no, such a guarantee is neither necessary nor helpful.

If you really need the data to be persisted, then you have to specify that during the write (sync or similar commands). It doesn't matter whether you are using a networked file system with a cache or a real disk with a cache underneath; the behavior is the same. Data that is not flushed out explicitly or written with a sync flag will be lost after a power outage. That's fine, because all reasonable file systems, applications, and operating systems are designed to handle exactly this case (disk caches aren't a new thing).
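For instance, a sketch of the "sync flag" variant on Linux (the file name is made up):

    import os

    # O_SYNC makes every write() return only after the data has
    # reached stable storage, with no separate fsync() needed.
    fd = os.open("wal.log", os.O_WRONLY | os.O_CREAT | os.O_SYNC)
    os.write(fd, b"must survive a power cut")
    os.close(fd)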


Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Wed, Jul 31, 2019 at 1:08 PM Janne Johansson <icepic.dz@xxxxxxxxx> wrote:
>
> On Wed, 31 Jul 2019 at 06:55, Muhammad Junaid <junaid.fsd.pk@xxxxxxxxx> wrote:
>>
>> The question is about the RBD cache in write-back mode with KVM/libvirt. If we enable it, the local KVM host's RAM is used as a cache for the VM's write requests, and the KVM host immediately tells the VM's OS that the data has been written to disk (while it is actually not on the OSDs yet). How can that be power-failure safe?
>>
> It is not. Nothing is power-failure safe. However you design things, there is always the possibility that some long write is almost done when the power goes away, and that write (and perhaps others still in caches) will be lost. Different filesystems handle such losses well or badly, and databases running on those filesystems will have their I/O interrupted and not acknowledged, which may be not-so-bad or very bad.
>
> The write caches you have in the KVM guest and this KVM RBD cache make guest I/O faster, at the expense of a higher risk of losing data in a power outage, but there will be some kind of rollback, some kind of fsck/trash somewhere, to clean up if a KVM host dies with guests running.
> In 99% of cases this is OK; the only things lost are "the last lines of this log file" or "those 3 temp files of the program that was running". In the remaining 1% you need to pull out your backups, just as when the physical disks die.
>
> If someone promises "power-failure safe", then I think they are misguided. The chance of something bad happening may be super small, but it will just never be 0%. Also, the guest VMs can ask kvm-rbd to flush data out, and so take the "pain" of waiting for real completion only when it is actually needed; other operations can go fast (and be slightly less safe), while the I/O that needs harder guarantees can request a flush and know when the data actually is on disk for real.
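>
> For illustration, a sketch of the libvirt disk definition that turns this write-back cache on (the pool, image, and monitor names are made up):
>
>     <disk type='network' device='disk'>
>       <driver name='qemu' type='raw' cache='writeback'/>
>       <source protocol='rbd' name='rbd/vm-disk'>
>         <host name='mon1.example.com' port='6789'/>
>       </source>
>       <target dev='vda' bus='virtio'/>
>     </disk>
>
> With cache='writeback', guest flush requests (fsync and friends) are passed down to the RBD layer, so the sync/fsync guarantees discussed above still hold for the guest.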
>
> --
> May the most significant bit of your life be positive.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
