Re: nova instance cannot boot after remove cache tier--help


 



Good news: while I was writing the previous letter I found the solution to recover my VMs:

    ceph osd tier remove-overlay cold-storage
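
For anyone who ends up in the same state: a quick way to check whether the base pool still has an overlay pointing at a cache tier is to look at the pool's entry in the OSD map. A minimal sketch, assuming the base pool is called cold-storage as in my command above:

    # a configured overlay shows up as read_tier/write_tier fields
    # on the base pool's line in the OSD map
    ceph osd dump | grep cold-storage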

I have been thinking about how this could have caused what happened, but I still do not understand why the overlay option behaves so strangely.
I know that the overlay option sets an overlay on the storage pool, so that all I/O is routed to the cache pool.

I am guessing that it works the same way for a readonly cache as for a writeback cache:
it forwards all read and write requests to the cache pool, even though in readonly mode write requests should go to the main pool, not the cache.

Now that the overlay is turned off, the cache pool is no longer used by Ceph...
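
For reference, the overlay is what routes client I/O at the cache tier in the first place. A rough sketch of how it gets set up and torn down (cold-storage and hot-cache are just placeholder names for a base pool and a cache pool):

    # attach a cache pool to a base pool and route client I/O through it
    ceph osd tier add cold-storage hot-cache
    ceph osd tier cache-mode hot-cache writeback
    ceph osd tier set-overlay cold-storage hot-cache

    # stop routing client I/O through the cache pool again
    ceph osd tier remove-overlay cold-storage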

Квапил wrote 2016-02-12 15:02:

Hi, last night I had the same issue on Hammer LTS.
I think this is a Ceph bug.

My history:

Ceph version: 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
Distro: Debian 7 (Proxmox 3.4)
Kernel: 2.6.32-39-pve

We have 9x 6TB SAS drives in the main pool and 6x 128GB PCIe SSDs in the cache pool, across 3 nodes in the same box.

The cache pool worked in writeback mode for a long time, but we saw poor response from the SSD drives and 100% utilisation, so we decided to try switching the cache pool to readonly mode.

First, I removed the writeback cache as described here:
http://docs.ceph.com/docs/master/rados/operations/cache-tiering/#removing-a-writeback-cache
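
For completeness, the procedure on that page boils down to roughly the following (main_pool and cache_pool are placeholders for my actual pool names):

    # stop new writes from landing in the cache, then flush/evict everything
    ceph osd tier cache-mode cache_pool forward
    rados -p cache_pool cache-flush-evict-all

    # detach the cache tier once it is empty
    ceph osd tier remove-overlay main_pool
    ceph osd tier remove main_pool cache_pool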

Second, I completely removed the cache pool and created a new one with the same parameters and the same crush_ruleset, but with cache-mode set to readonly. Everything was OK and the VMs booted, but I/O was still bad; the SSDs remained the bottleneck.
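
Recreating the tier in readonly mode is only a few commands; roughly what I ran, following the read-only section of the cache-tiering docs (pool names, pg count and the SSD crush_ruleset id below are placeholders):

    # recreate the cache pool on the SSD rule
    ceph osd pool create cache_pool 512 512
    ceph osd pool set cache_pool crush_ruleset 1

    # attach it as a read-only tier of the main pool
    ceph osd tier add main_pool cache_pool
    ceph osd tier cache-mode cache_pool readonly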
Then I tried to disable the cache with this command:

    ceph osd tier cache-mode cache_pool none

And that was it: I/O stopped. Completely!

Restarting the virtual machines revealed file system damage, and fsck and chkdsk ran at boot. Many VMs eventually started fine, although with partial loss of data.
Then, like you, I thought that some of the data had simply ended up only in the cache pool, so I switched the cache back to readonly mode. When I realized that did not help, I disabled it again and removed it completely as described here:
http://docs.ceph.com/docs/master/rados/operations/cache-tiering/#removing-a-read-only-cache
But that only made things worse. The virtual machines' file systems were damaged to such an extent that they would not boot properly and reported all kinds of data corruption errors. Recovering from a snapshot did not help either.

Now I can say for sure that removing the readonly cache in my configuration causes data corruption :(

 


Xiangyu (Raijin, BP&IT Dept) wrote 2015-09-25 12:11:

Hi,

 

I have a Ceph cluster as the Nova backend storage, and I enabled a cache tier with readonly cache-mode for the nova_pool. Now the Nova instances cannot boot after removing the nova_pool cache tier.

 

The instances show the error “boot failed:not a bootable disk”.

 

I used the commands below to remove the cache tier, following the Ceph documentation:

    ceph osd tier cache-mode cache_pool none

    ceph osd tier remove nova_pool cache_pool

 

While troubleshooting, I found that some images existed in nova_cache but were not found in nova_pool, so it seems the cache pool (nova_cache) was working in writeback mode before. Why?
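
For reference, the comparison is just a matter of listing the objects in each pool and diffing the output (a sketch):

    # objects that appear only in the cache pool suggest writes were
    # being absorbed by the tier instead of the base pool
    rados -p nova_cache ls | sort > /tmp/nova_cache.objects
    rados -p nova_pool ls | sort > /tmp/nova_pool.objects
    diff /tmp/nova_cache.objects /tmp/nova_pool.objects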

I confirmed that I had set it to readonly mode before. What is wrong?

 

And is there any way to fix the instance boot issue?

 

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--

Квапил Андрей
+7 966 05 666 50

