On Tue, Aug 4, 2020 at 2:12 AM Georg Schönberger <g.schoenberger@xxxxxxxxxx> wrote:
>
> On 03.08.20 14:56, Jason Dillaman wrote:
> > On Mon, Aug 3, 2020 at 4:11 AM Georg Schönberger
> > <g.schoenberger@xxxxxxxxxx> wrote:
> >> Hey Ceph users,
> >>
> >> we are currently facing some serious problems on our Ceph cluster with
> >> libvirt (KVM), RBD devices and fstrim running inside VMs.
> >>
> >> The problem is that right after running the fstrim command inside the
> >> VM, the ext4 filesystem is corrupted and read-only, with the following
> >> error message:
> >>
> >> EXT4-fs error (device sda1): ext4_mb_generate_buddy:756: group 136,
> >> block bitmap and bg descriptor inconsistent: 32200 vs 32768 free clusters
> >> Aborting journal on device sda1-8
> >> EXT4-fs (sda1): Remounting filesystem read-only
> >> EXT4-fs error (device sda1): ext4_journal_check_start:56: Detected
> >> aborted journal
> >> EXT4-fs (sda1): Remounting filesystem read-only
> >>
> >> This behavior is reproducible across several VMs with different OS
> >> versions (Ubuntu 14.04, 16.04 and 18.04), so we guess it is a bug or a
> >> configuration problem regarding RBD devices.
> >>
> >> Our setup on the hosts running the VMs looks like:
> >> # lsb_release -d
> >> Description:    Ubuntu 20.04 LTS
> >> # uname -a
> >> Linux XXX 5.4.0-37-generic #41-Ubuntu SMP Wed Jun 3 18:57:02 UTC 2020
> >> x86_64 x86_64 x86_64 GNU/Linux
> >> # ceph --version
> >> ceph version 15.2.3 (d289bbdec69ed7c1f516e0a093594580a76b78d0) octopus
> >> (stable)
> >>
> >> -> I know there's the update to Ceph 15.2.4, but I haven't seen any
> >> fstrim/discard-related changes in the changelog. If we could fix the
> >> problem with 15.2.4 I would be happy...
> >>
> >> The libvirt config for the RBD device with fstrim (discard) support
> >> is the following:
> >>
> >> <disk type='network' device='disk'>
> >>   <driver name='qemu' type='raw' cache='directsync' io='native'
> >>           discard='unmap'/>
> >>   <auth username='libvirt'>
> >>     <secret type='ceph' usage='client.libvirt'/>
> >>   </auth>
> >>   <source protocol='rbd' name='cephstorage/testtrim_system'>
> >>     <host name='XXX' port='6789'/>
> >>     <host name='XXX' port='6789'/>
> >>     <host name='XXX' port='6789'/>
> >>     <host name='XXX' port='6789'/>
> >>     <host name='XXX' port='6789'/>
> >>   </source>
> >>   <target dev='sda' bus='scsi'/>
> >>   <boot order='2'/>
> >>   <address type='drive' controller='0' bus='0' target='0' unit='0'/>
> >> </disk>
> >>
> >> The Ceph docs (https://docs.ceph.com/docs/octopus/rbd/qemu-rbd/) gave me
> >> some hints about enabling trim/discard, and I tested using 4M as discard
> >> granularity, but I got the same error resulting in a corrupted ext4
> >> filesystem.
> >> Changes made to the libvirt config:
> >> <qemu:commandline>
> >>   <qemu:arg value='-set'/>
> >>   <qemu:arg value='device.scsi0-0-0-0.discard_granularity=4194304'/>
> >> </qemu:commandline>
> >>
> >> As the RBD devices are thin-provisioned, we really need to call fstrim
> >> inside the VM regularly to free up unused blocks, otherwise our Ceph
> >> pool will run out of space.
> >>
> >> Any ideas what could be wrong with our RBD setup, or can somebody else
> >> reproduce the problem?
> >> Any hints on how to debug this problem?
> >> Any related/open Ceph issues? (I could not find one)
> > I haven't heard of any similar issue. I would recommend trying an
> > older release of librbd1 (i.e. Nautilus), an older release of QEMU, or
> > an older guest OS in different combinations to see what the common
> > factor is.
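A quick way to sanity-check whether the guest's discards actually reach the
RBD image -- a minimal sketch, assuming the disk shows up as /dev/sda in the
guest (per the <target dev='sda'/> above) and using the image name from the
config -- is to compare fstrim's output with the image's allocated size
before and after trimming:

# lsblk --discard /dev/sda                (in the guest: discard granularity/limits the disk exposes)
# fstrim -v /                             (in the guest: trim and report how many bytes were discarded)
# rbd du cephstorage/testtrim_system      (on a Ceph client host: allocated vs. provisioned size)

If the used size reported by rbd du shrinks after fstrim, the unmap requests
are reaching the RBD image; if it does not, the discards are likely being
dropped in the QEMU/SCSI layer rather than in librbd.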
> >
> >> Thanks a lot for your help, Georg
> >> _______________________________________________
> >> ceph-users mailing list -- ceph-users@xxxxxxx
> >> To unsubscribe send an email to ceph-users-leave@xxxxxxx
> >>
> Digging deeper into the problem, I can tell that fstrim is not the cause
> but the trigger that leads to a read-only ext4 filesystem.
>
> Our problem only occurs with VMs that were recently migrated from one
> Ceph cluster to another one with rbd export and import. This is how it
> was done:
> 1. rbd first snap of running VM
> 2. rbd export-diff of first snap -> rbd import-diff in new cluster
> 3. Stop VM
> 4. rbd second snap of stopped VM
> 5. rbd export-diff of second snap with --from-snap first snap -> rbd
>    import-diff in new cluster

Ah, well that might be this issue [1]. Can you re-test using the latest
available Octopus dev release of librbd1 / ceph-common (for the 'rbd'
CLI) [2]?

> In some cases we now got a corrupted ext4 filesystem with this type of
> migration, but it is not clear to us why, because the VM was stopped in
> step 3!
> Anything wrong with our sequence of commands?
>
> One thing we think could be a possible cause is the enabled rbd cache in
> our ceph.conf:
> [client]
> rbd_cache = true
> rbd_cache_writethrough_until_flush = false
> rbd_cache_size = 536870912
> rbd_cache_max_dirty = 134217728
> rbd_cache_target_dirty = 33554432
> rbd_cache_max_dirty_age = 5
>
> So if step 4 is done directly after step 3 without waiting for the rbd
> cache to be flushed, could this be the cause of data corruption?
> Any rbd commands to tell Ceph to flush the rbd cache?
>
> THX, Georg
>

[1] https://tracker.ceph.com/issues/46674
[2] https://shaman.ceph.com/repos/ceph/octopus/

--
Jason
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
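For anyone trying to reproduce the migration sequence described above, a
minimal sketch of steps 1-5 -- the snapshot names migrate1/migrate2 and the
destination cluster name "newcluster" are placeholders, and the destination
image must already exist (e.g. from an initial rbd create or a full
rbd export | rbd import) before import-diff can apply a diff to it:

1. rbd snap create cephstorage/testtrim_system@migrate1
2. rbd export-diff cephstorage/testtrim_system@migrate1 - | \
       rbd --cluster newcluster import-diff - cephstorage/testtrim_system
3. (stop the VM)
4. rbd snap create cephstorage/testtrim_system@migrate2
5. rbd export-diff --from-snap migrate1 cephstorage/testtrim_system@migrate2 - | \
       rbd --cluster newcluster import-diff - cephstorage/testtrim_system

Step 5 transfers only the delta between the two snapshots, which is why the
second snapshot is taken after the VM has been stopped.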