On Mon, Aug 3, 2020 at 4:11 AM Georg Schönberger <g.schoenberger@xxxxxxxxxx> wrote:
>
> Hey Ceph users,
>
> we are currently facing some serious problems on our Ceph cluster with
> libvirt (KVM), RBD devices and FSTRIM running inside VMs.
>
> The problem is that right after running the fstrim command inside the VM,
> the ext4 filesystem is corrupted and remounted read-only with the
> following error message:
>
> EXT4-fs error (device sda1): ext4_mb_generate_buddy:756: group 136,
> block bitmap and bg descriptor inconsistent: 32200 vs 32768 free clusters
> Aborting journal on device sda1-8
> EXT4-fs (sda1): Remounting filesystem read-only
> EXT4-fs error (device sda1): ext4_journal_check_start:56: Detected
> aborted journal
> EXT4-fs (sda1): Remounting filesystem read-only
>
> This behavior is reproducible across several VMs with different OS
> versions (Ubuntu 14.04, 16.04 and 18.04), so we suspect a bug or a
> configuration problem regarding RBD devices.
>
> Our setup on the hosts running the VMs looks like this:
> # lsb_release -d
> Description:    Ubuntu 20.04 LTS
> # uname -a
> Linux XXX 5.4.0-37-generic #41-Ubuntu SMP Wed Jun 3 18:57:02 UTC 2020
> x86_64 x86_64 x86_64 GNU/Linux
> # ceph --version
> ceph version 15.2.3 (d289bbdec69ed7c1f516e0a093594580a76b78d0) octopus
> (stable)
>
> -> I know there's an update to Ceph 15.2.4, but I haven't seen any
> fstrim/discard related changes in the changelog. If we could fix the
> problem with 15.2.4 I would be happy...
>
> The libvirt config for the RBD device with fstrim (discard) support is
> the following:
>
>     <disk type='network' device='disk'>
>       <driver name='qemu' type='raw' cache='directsync' io='native'
>               discard='unmap'/>
>       <auth username='libvirt'>
>         <secret type='ceph' usage='client.libvirt'/>
>       </auth>
>       <source protocol='rbd' name='cephstorage/testtrim_system'>
>         <host name='XXX' port='6789'/>
>         <host name='XXX' port='6789'/>
>         <host name='XXX' port='6789'/>
>         <host name='XXX' port='6789'/>
>         <host name='XXX' port='6789'/>
>       </source>
>       <target dev='sda' bus='scsi'/>
>       <boot order='2'/>
>       <address type='drive' controller='0' bus='0' target='0' unit='0'/>
>     </disk>
>
> The Ceph docs (https://docs.ceph.com/docs/octopus/rbd/qemu-rbd/) gave me
> some hints about enabling trim/discard, and I tested using 4M as the
> discard granularity, but I got the same error resulting in a corrupted
> ext4 filesystem.
> Changes made to the libvirt config:
>     <qemu:commandline>
>       <qemu:arg value='-set'/>
>       <qemu:arg value='device.scsi0-0-0-0.discard_granularity=4194304'/>
>     </qemu:commandline>
>
> As the RBD devices are thin-provisioned, we really need to call fstrim
> inside the VMs regularly to free up unused blocks, otherwise our Ceph
> pool will run out of space.
>
> Any ideas what could be wrong with our RBD setup, or can somebody else
> reproduce the problem?
> Any hints on how to debug this problem?
> Any related/open Ceph issues? (I could not find one.)

I haven't heard of any similar issue. I would recommend trying an older
release of librbd1 (e.g. Nautilus), an older release of QEMU, or an older
guest OS in different combinations to see what the common factor is.

>
> Thanks a lot for your help, Georg
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx

--
Jason
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
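
A small verification sketch (not part of the thread above; it assumes the
guest disk is sda and the image is cephstorage/testtrim_system, as in the
example config): before swapping versions, it can help to confirm that
discard is actually advertised inside the guest and that trims reach the
RBD image.

Inside the guest:
# lsblk --discard /dev/sda
#   (DISC-GRAN and DISC-MAX should be non-zero if discard is passed through)
# cat /sys/block/sda/queue/discard_granularity
# fstrim -v /
#   (reports how many bytes were trimmed)

On a host with client access to the cluster:
# rbd du cephstorage/testtrim_system
#   (the USED column should shrink after a successful trim)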