Hi, last night I had the same issue on Hammer LTS. I think this is a Ceph bug.
Here is my story:
Ceph version: 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43), Distro: Debian 7 (Proxmox 3.4), Kernel: 2.6.32-39-pve
We have 9x 6TB SAS drives in the main pool and 6x 128GB PCIe SSDs in the cache pool, on 3 nodes in the same box.
For a long time the cache pool worked in writeback mode, but we were getting poor response from the SSD drives and 100% utilisation, so we decided to try switching the cache pool to readonly mode.
First, I removed the writeback cache as shown here: http://docs.ceph.com/docs/master/rados/operations/cache-tiering/#removing-a-writeback-cache
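For reference, the writeback removal procedure on that page boils down to roughly the following on Hammer (cache_pool is my cache tier; <storage_pool> is just a placeholder here for the backing pool):
ceph osd tier cache-mode cache_pool forward        # forward requests to the backing pool instead of caching
rados -p cache_pool cache-flush-evict-all          # flush/evict everything to the backing pool
ceph osd tier remove-overlay <storage_pool>        # stop redirecting client traffic to the cache
ceph osd tier remove <storage_pool> cache_pool     # detach the tier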
Second, I completely removed the cache pool and created a new one with the same parameters and the same crush_ruleset, but with cache-mode readonly. Everything was OK and the VMs booted, but I/O was still bad; the SSDs remained the bottleneck. Then I tried to disable the cache with this command:
ceph osd tier cache-mode cache_pool none
And that was it: I/O stopped. Completely!
Restarting the virtual machines revealed file system damage, and fsck/chkdsk ran at boot. Many VMs started fine, although with partial data loss. Then, like you, I thought that some of the data was simply still sitting only in the cache pool, so I switched the cache back to readonly mode. When I realized that this did not help, I disabled it again and removed it completely as described here: http://docs.ceph.com/docs/master/rados/operations/cache-tiering/#removing-a-read-only-cache But that only made things worse. The file systems of the virtual machines were damaged to such an extent that they no longer boot properly and report all kinds of data corruption errors. Recovering from a snapshot did not help either.
Now I can say for sure that removing a readonly cache in my configuration causes data corruption :(
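For anyone hitting the same thing: before removing a cache tier it is probably worth checking whether any objects exist only in the cache pool. A rough check (pool names are placeholders for my setup) would be something like:
rados -p cache_pool ls | sort > cache.txt
rados -p <storage_pool> ls | sort > base.txt
comm -23 cache.txt base.txt    # objects present only in the cache pool
and, if anything shows up, forcing a flush with rados -p cache_pool cache-flush-evict-all before removing the tier.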
Xiangyu (Raijin, BP&IT Dept) wrote on 2015-09-25 12:11:
Hi,
I have a Ceph cluster as the Nova backend storage, and I enabled a cache tier with readonly cache-mode for the nova_pool. Now the Nova instance cannot boot after removing the nova_pool cache tier.
The instance shows the error “boot failed: not a bootable disk”.
I used the commands below to remove the cache tier, following the Ceph documentation:
ceph osd tier cache-mode cache_pool none
ceph osd tier remove nova_pool cache_pool
While troubleshooting, I found that some images existed in nova_cache but were not found in nova_pool, so it seems that the cache pool (nova_cache) was working in writeback mode before. Why?
I confirmed that I had set it to readonly mode before, so what is wrong?
And is there any way to fix the instance boot issue?
--
Квапил Андрей +7 966 05 666 50