RBD corruption when removing tier cache

Hi all,
Today I tested adding an SSD cache tier to a pool.
Everything worked, but when I tried to remove it and ran

rados -p hot-pool cache-flush-evict-all

I got:

        rbd_data.9c000238e1f29.0000000000000000
failed to flush /rbd_data.9c000238e1f29.0000000000000000: (2) No such file or directory
        rbd_data.9c000238e1f29.0000000000000621
failed to flush /rbd_data.9c000238e1f29.0000000000000621: (2) No such file or directory
        rbd_data.9c000238e1f29.0000000000000001
failed to flush /rbd_data.9c000238e1f29.0000000000000001: (2) No such file or directory
        rbd_data.9c000238e1f29.0000000000000a2c
failed to flush /rbd_data.9c000238e1f29.0000000000000a2c: (2) No such file or directory
        rbd_data.9c000238e1f29.0000000000000200
failed to flush /rbd_data.9c000238e1f29.0000000000000200: (2) No such file or directory
        rbd_data.9c000238e1f29.0000000000000622
failed to flush /rbd_data.9c000238e1f29.0000000000000622: (2) No such file or directory
        rbd_data.9c000238e1f29.0000000000000009
failed to flush /rbd_data.9c000238e1f29.0000000000000009: (2) No such file or directory
        rbd_data.9c000238e1f29.0000000000000208
failed to flush /rbd_data.9c000238e1f29.0000000000000208: (2) No such file or directory
        rbd_data.9c000238e1f29.00000000000000c1
failed to flush /rbd_data.9c000238e1f29.00000000000000c1: (2) No such file or directory
        rbd_data.9c000238e1f29.0000000000000625
failed to flush /rbd_data.9c000238e1f29.0000000000000625: (2) No such file or directory
        rbd_data.9c000238e1f29.00000000000000d8
failed to flush /rbd_data.9c000238e1f29.00000000000000d8: (2) No such file or directory
        rbd_data.9c000238e1f29.0000000000000623
failed to flush /rbd_data.9c000238e1f29.0000000000000623: (2) No such file or directory
        rbd_data.9c000238e1f29.0000000000000624
failed to flush /rbd_data.9c000238e1f29.0000000000000624: (2) No such file or directory
error from cache-flush-evict-all: (1) Operation not permitted
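The object names in the output above can be pulled out of the "failed to flush" lines with sed, for example to inspect or back up just those objects. This is a minimal sketch, assuming a POSIX shell and sed. The function name failed_objects is hypothetical. The 2>&1 in the example use is there because it is not certain whether rados writes these lines to stdout or stderr.

```shell
# Hypothetical helper: read `rados ... cache-flush-evict-all` output on stdin
# and print the object name from each "failed to flush /<name>: ..." line.
# All other lines (indented object names, the final error) are ignored.
failed_objects() {
  sed -n 's|^failed to flush /\([^:]*\): .*$|\1|p'
}

# Example use (needs a live cluster):
#   rados -p hot-pool cache-flush-evict-all 2>&1 | failed_objects
```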

I also noticed that switching the cache tier to "forward" mode may not be safe:

Error EPERM: 'forward' is not a well-supported cache mode and may corrupt your data. pass --yes-i-really-mean-it to force.

At the moment of flushing (or switching to forward mode), the RBD image got corrupted, and even fsck was unable to repair it ("unable to set superblock flags"). I don't know whether this is because the cache is still active and corrupted, or because ext4 got so messed up that it cannot work anymore.

Even with the VM that was using that pool stopped, I cannot flush it.

So what did I do wrong? Can I get my data back? Is it safe to remove the cache tier, and if so, how?

Using rados get I can dump the objects to disk, so why can't I flush (evict) them?
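The rados get approach mentioned above can be scripted to copy every object in a pool to a local directory before attempting anything destructive. This is a minimal sketch, assuming a POSIX shell, a working rados client, and object names that contain no '/' (true for the rbd_data.* names above). The function name dump_pool and the target directory in the example are hypothetical.

```shell
# Hypothetical helper: copy every object in pool $1 into directory $2
# using `rados ls` and `rados get`; each object becomes one file.
dump_pool() {
  dump_pool_name=$1
  dump_dir=$2
  mkdir -p "$dump_dir" || return 1
  rados -p "$dump_pool_name" ls | while IFS= read -r obj; do
    # Report, but do not abort on, objects that cannot be read.
    rados -p "$dump_pool_name" get "$obj" "$dump_dir/$obj" \
      || echo "get failed: $obj" >&2
  done
}

# Example use:
#   dump_pool hot-pool /root/hot-pool-backup
```

This only copies raw objects out of the pool; it does not by itself repair the RBD image or the ext4 filesystem on it.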

It looks like the same issue as
http://tracker.ceph.com/issues/12659
which is still unresolved.

I also have some snapshots of the RBD image in the cold pool, but that should not cause problems in production.

I'm using version 12.2.1 on all 4 nodes.

With regards
Jan Pekar
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


