Hi all,
today I continued my investigation, and maybe somebody will be
interested in the results, so I'm sending them here.
I compared the objects in the hot pool with the objects in the cold
pool; they were identical, so I removed the cache tier from the cold pool.
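For reference, the comparison can be done roughly along these lines
("cold" is a placeholder for the backing pool name; the object name is
one from the listing further below):
rados -p hot get rbd_data.9c000238e1f29.0000000000000000 /tmp/obj.hot
rados -p cold get rbd_data.9c000238e1f29.0000000000000000 /tmp/obj.cold
cmp /tmp/obj.hot /tmp/obj.cold && echo identical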
Then I tried to fsck my RBD image from a libvirt VM booted from a
rescue CD. The only thing that succeeded was mounting it read-only
without replaying the journal (mount -o ro,noload).
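A sketch of what I tried inside the rescue system (assuming the
filesystem is on /dev/sda1, as in the kernel log below):
mount -o ro /dev/sda1 /mnt         # failed: ext4 still replays the journal on a plain ro mount
mount -o ro,noload /dev/sda1 /mnt  # the only variant that worked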
I noticed that I was getting I/O errors on the disk:
sd 2:0:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
sd 2:0:0:0: [sda] tag#0 Sense Key : Aborted Command [current]
sd 2:0:0:0: [sda] tag#0 Add. Sense: I/O process terminated
sd 2:0:0:0: [sda] tag#0 CDB: Write(10) 2a 00 00 00 08 08 00 00 10 00
blk_update_request: 4 callbacks suppressed
blk_update_request: I/O error, dev sda, sector 2056
buffer_io_error: 61 callbacks suppressed
Buffer I/O error on dev sda1, logical block 1, lost async page write
Buffer I/O error on dev sda1, logical block 2, lost async page write
VFS: Dirty inode writeback failed for block device sda1 (err=-5).
I wanted to write to that block manually. To be safe I first created an
RBD snapshot of that filesystem, and after I created it the problems
disappeared. After creating the snapshot I was able to fsck the
filesystem and replay the ext4 journal.
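The sequence was roughly the following (pool/image/snapshot names are
examples):
rbd snap create cold/vm-image@rescue   # right after this, writes started to succeed
fsck.ext4 -f /dev/sda1                 # run again from the rescue system; journal replay now worked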
It looks as if the objects in the cold pool were somehow locked so they
could not be modified? And after the snapshot they changed names and
modification became possible? Can I debug this somehow?
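In case it helps, these are the kinds of read-only inspection commands
I can run against such an object ("cold" again as a placeholder for the
backing pool name):
rados -p cold stat rbd_data.9c000238e1f29.0000000000000000          # size and mtime as rados sees it
rados -p cold listwatchers rbd_data.9c000238e1f29.0000000000000000  # any clients holding a watch
ceph osd map cold rbd_data.9c000238e1f29.0000000000000000           # which PG / OSDs store it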
I continued with cleaning up the hot pool and tried to delete the
objects. The delete operation with rados rm succeeded, but some objects
stayed there and I could no longer delete or get them.
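The cleanup loop was roughly:
rados -p hot ls | while read obj; do
    rados -p hot rm "$obj"
done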
rados -p hot ls
rbd_data.9c000238e1f29.0000000000000000
rbd_data.9c000238e1f29.0000000000000621
rbd_data.9c000238e1f29.0000000000000001
rbd_data.9c000238e1f29.0000000000000a2c
rbd_data.9c000238e1f29.0000000000000200
rbd_data.9c000238e1f29.0000000000000622
rbd_data.9c000238e1f29.0000000000000009
rbd_data.9c000238e1f29.0000000000000208
rbd_data.9c000238e1f29.00000000000000c1
rbd_data.9c000238e1f29.0000000000000625
rbd_data.9c000238e1f29.00000000000000d8
rbd_data.9c000238e1f29.0000000000000623
rbd_data.9c000238e1f29.0000000000000624
rados -p hot rm rbd_data.9c000238e1f29.0000000000000000
error removing hot>rbd_data.9c000238e1f29.0000000000000000: (2) No such
file or directory
How can I clean up that pool? What could have happened to it?
After some additional tests I think that my initial problem was caused
by switching the cache mode to forward. So I recommend not only warning
about that mode, as is done now, but also updating the official page
http://docs.ceph.com/docs/master/rados/operations/cache-tiering/
to describe other ways of flushing all the objects (like turning off
the VMs, setting a short evict age or a small target size) and removing
the overlay after that.
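To make that concrete, the sequence I have in mind is roughly the
following (pool names are examples, and I have not verified yet that it
avoids the corruption):
# stop the VMs / clients using the pool first, then make the agent flush everything
ceph osd pool set hot cache_min_flush_age 0
ceph osd pool set hot cache_min_evict_age 0
ceph osd pool set hot target_max_objects 1
rados -p hot cache-flush-evict-all
# only once the hot pool is empty:
ceph osd tier remove-overlay cold
ceph osd tier remove cold hot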
With regards
Jan Pekar
On 1.12.2017 03:43, Jan Pekař - Imatic wrote:
Hi all,
today I tested adding an SSD cache tier to a pool.
Everything worked, but when I tried to remove it and ran
rados -p hot-pool cache-flush-evict-all
I got:
rbd_data.9c000238e1f29.0000000000000000
failed to flush /rbd_data.9c000238e1f29.0000000000000000: (2) No such
file or directory
rbd_data.9c000238e1f29.0000000000000621
failed to flush /rbd_data.9c000238e1f29.0000000000000621: (2) No such
file or directory
rbd_data.9c000238e1f29.0000000000000001
failed to flush /rbd_data.9c000238e1f29.0000000000000001: (2) No such
file or directory
rbd_data.9c000238e1f29.0000000000000a2c
failed to flush /rbd_data.9c000238e1f29.0000000000000a2c: (2) No such
file or directory
rbd_data.9c000238e1f29.0000000000000200
failed to flush /rbd_data.9c000238e1f29.0000000000000200: (2) No such
file or directory
rbd_data.9c000238e1f29.0000000000000622
failed to flush /rbd_data.9c000238e1f29.0000000000000622: (2) No such
file or directory
rbd_data.9c000238e1f29.0000000000000009
failed to flush /rbd_data.9c000238e1f29.0000000000000009: (2) No such
file or directory
rbd_data.9c000238e1f29.0000000000000208
failed to flush /rbd_data.9c000238e1f29.0000000000000208: (2) No such
file or directory
rbd_data.9c000238e1f29.00000000000000c1
failed to flush /rbd_data.9c000238e1f29.00000000000000c1: (2) No such
file or directory
rbd_data.9c000238e1f29.0000000000000625
failed to flush /rbd_data.9c000238e1f29.0000000000000625: (2) No such
file or directory
rbd_data.9c000238e1f29.00000000000000d8
failed to flush /rbd_data.9c000238e1f29.00000000000000d8: (2) No such
file or directory
rbd_data.9c000238e1f29.0000000000000623
failed to flush /rbd_data.9c000238e1f29.0000000000000623: (2) No such
file or directory
rbd_data.9c000238e1f29.0000000000000624
failed to flush /rbd_data.9c000238e1f29.0000000000000624: (2) No such
file or directory
error from cache-flush-evict-all: (1) Operation not permitted
I also noticed that switching the cache tier to "forward" is considered unsafe:
Error EPERM: 'forward' is not a well-supported cache mode and may
corrupt your data. pass --yes-i-really-mean-it to force.
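That warning is printed by the mode switch itself, i.e. roughly:
ceph osd tier cache-mode hot-pool forward
which only proceeds when --yes-i-really-mean-it is appended.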
At the moment of flushing (or switching to forward mode) the RBD image
got corrupted, and even fsck was unable to repair it (unable to set
superblock flags). I don't know whether the cause is the cache still
being active and corrupted, or ext4 being so messed up that it cannot
work anymore.
Even when the VM that was using that pool is stopped, I cannot flush it.
So what did I do wrong? Can I get my data back? Is it safe to remove
the cache tier, and how?
Using rados get I can dump the objects to disk, so why can't I flush
(evict) them?
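Concretely, this succeeds (object name from the listing above):
rados -p hot-pool get rbd_data.9c000238e1f29.0000000000000000 /tmp/obj.dump
while a per-object flush of the same object, e.g.
rados -p hot-pool cache-flush rbd_data.9c000238e1f29.0000000000000000
presumably fails with the same ENOENT as cache-flush-evict-all.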
It looks like the same issue as
http://tracker.ceph.com/issues/12659
but that one is unresolved.
I also have some snapshots of the RBD image in the cold pool, but that
should not cause problems in production.
I'm using version 12.2.1 on all 4 nodes.
With regards
Jan Pekar
--
============
Ing. Jan Pekař
jan.pekar@xxxxxxxxx | +420603811737
----
Imatic | Jagellonská 14 | Praha 3 | 130 00
http://www.imatic.cz
============
--
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com