Running fstrim (discard) inside a KVM guest with an RBD disk device corrupts the ext4 filesystem

Hey Ceph users,

we are currently facing some serious problems on our Ceph cluster with libvirt (KVM), RBD devices and fstrim running inside the VMs.

The problem: right after running the fstrim command inside a VM, the ext4 filesystem is corrupted and remounted read-only with the following error messages:

EXT4-fs error (device sda1): ext4_mb_generate_buddy:756: group 136, block bitmap and bg descriptor inconsistent: 32200 vs 32768 free clusters
Aborting journal on device sda1-8
EXT4-fs (sda1): Remounting filesystem read-only
EXT4-fs error (device sda1): ext4_journal_check_start:56: Detected aborted journal
EXT4-fs (sda1): Remounting filesystem read-only
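
For completeness, the trim itself is nothing special; something like the following is enough to trigger it (the mount point is just an example, -v only reports how much was discarded):

# fstrim -v /
# fstrim -av     <- alternatively, trim all mounted filesystems that support discard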

This behavior is reproducible across several VMs with different guest OSes (Ubuntu 14.04, 16.04 and 18.04), so we suspect a bug or a configuration problem related to the RBD devices.

Our setup on the hosts running the VMs looks like:
# lsb_release -d
Description:    Ubuntu 20.04 LTS
# uname -a
Linux XXX 5.4.0-37-generic #41-Ubuntu SMP Wed Jun 3 18:57:02 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
# ceph --version
ceph version 15.2.3 (d289bbdec69ed7c1f516e0a093594580a76b78d0) octopus (stable)

-> I know there is an update to Ceph 15.2.4, but I haven't seen any fstrim/discard-related changes in its changelog. If 15.2.4 fixed the problem I would of course be happy to upgrade...

The libvirt config for the RBD device with fstrim (discard) support is the following:

    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='directsync' io='native' discard='unmap'/>
      <auth username='libvirt'>
        <secret type='ceph' usage='client.libvirt'/>
      </auth>
      <source protocol='rbd' name='cephstorage/testtrim_system'>
        <host name='XXX' port='6789'/>
        <host name='XXX' port='6789'/>
        <host name='XXX' port='6789'/>
        <host name='XXX' port='6789'/>
        <host name='XXX' port='6789'/>
      </source>
      <target dev='sda' bus='scsi'/>
      <boot order='2'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
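
In case it helps with reproducing: inside the guest, the discard parameters exposed by the virtual SCSI disk can be inspected with standard tools, e.g. (device name sda as in the config above):

# lsblk --discard /dev/sda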

The Ceph docs (https://docs.ceph.com/docs/octopus/rbd/qemu-rbd/) gave me some hints about enabling trim/discard, and I also tested 4M as the discard granularity, but I got the same error and a corrupted ext4 filesystem.
Changes made to the libvirt config:
  <qemu:commandline>
    <qemu:arg value='-set'/>
    <qemu:arg value='device.scsi0-0-0-0.discard_granularity=4194304'/>
  </qemu:commandline>
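
(To check whether the override actually reaches the guest, the value can be read back from sysfs; if the setting is picked up, the following should report 4194304:)

# cat /sys/block/sda/queue/discard_granularity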

As the RBD devices are thin-provisioned, we really need to call fstrim inside the VMs regularly to free up unused blocks, otherwise our Ceph pool will run out of space.
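
For context, the space actually consumed by an image (vs. its provisioned size) can be checked on the Ceph side with rbd du, which is where the effect of a successful trim would become visible:

# rbd du cephstorage/testtrim_system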

Any ideas what could be wrong with our RBD setup or can somebody else reproduce the problem?
Any hints on how to debug this problem?
Any related/open Ceph issues? (I could not find one.)

Thanks a lot for your help, Georg
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



