Hello!
I've got a weird situation with rbd drive image
reliability. I found that after a hard reset, a VM with a ceph
rbd drive from my new cluster becomes corrupted. I stumbled on
this during HA tests of my new cloud cluster: after a host
reset the VM was not able to boot again because of virtual
drive errors. The same thing happens if you just kill the qemu
process (as would happen at host crash time).
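To be precise, the "crash" in my tests is nothing more than
killing the VM's qemu process on the host, roughly like this
(the process name match is from my hosts and may differ on yours):

# find the VM's qemu process and kill it without any shutdown,
# which is about what a sudden host power loss looks like to the guest
pgrep -af qemu-system-x86_64
kill -9 <pid of the VM's qemu process>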
At first I thought it was a guest OS problem, but then
I tried RouterOS (Linux based), Linux, and FreeBSD - all of
them show the same behavior.
Then I blamed the OpenNebula installation. For the sake of the
test I installed the latest Proxmox (5.1-36) on another server.
The first subtest: I created a VM in OpenNebula from a
predefined image, shut it down, then created a Proxmox VM and
pointed it to the image that had been created from OpenNebula.
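For the record, "pointed it to the image" means roughly the
following (pool, storage and VM ids are placeholders, not my
exact names):

# check what OpenNebula left in the pool
rbd -p <pool> ls
rbd -p <pool> info <image>
# attach that existing volume as the Proxmox VM's disk
qm set <vmid> --virtio0 <proxmox-rbd-storage>:<image>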
The second subtest: I did a clean install from ISO via the
Proxmox console, having previously created the VM and its drive
image from Proxmox itself (on the same ceph pool, of course).
Both results: unbootable VMs.
Finally I did a clean install into a fresh VM with a
local LVM-backed drive image. And - guess what? - it survived
the qemu process kill.
This is the first situation of this kind in my practice,
so I would like to ask for guidance. I suspect it is a caching
problem of some kind, but I haven't run into it with earlier
releases.
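Note that there is nothing rbd-cache related in the ceph.conf
below, so the librbd clients should be running with the
defaults. The next experiment I have in mind (just to narrow
things down, not as a fix) is to force the cache off on the kvm
hosts, something like this in their ceph.conf:

[client]
# experiment only: rule the librbd cache in or out as the culprit
rbd cache = false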
Some cluster details:
It's a small test cluster with 4 nodes, each has:
2x E5-2665 CPUs
128GB RAM
1 OSD on a Samsung SM863 1.92TB drive
IB connectivity (IPoIB) on a QDR InfiniBand network
OS: Ubuntu 16.04 with 4.10 kernel
ceph: luminous 12.2.1
Client (kvm host) OSes:
1. Ubuntu 16.04 (the same hosts as the ceph cluster)
2. Debian 9.1 in the case of Proxmox
ceph.conf:
[global]
fsid = 6a8ffc55-fa2e-48dc-a71c-647e1fff749b
mon_initial_members = e001n01, e001n02, e001n03
mon_host = 10.103.0.1,10.103.0.2,10.103.0.3
rbd default format = 2
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
osd mount options = rw,noexec,nodev,noatime,nodiratime,nobarrier
osd mount options xfs = rw,noexec,nodev,noatime,nodiratime,nobarrier
osd_mkfs_type = xfs
bluestore fsck on mount = true
debug_lockdep = 0/0
debug_context = 0/0
debug_crush = 0/0
debug_buffer = 0/0
debug_timer = 0/0
debug_filer = 0/0
debug_objecter = 0/0
debug_rados = 0/0
debug_rbd = 0/0
debug_journaler = 0/0
debug_objectcacher = 0/0
debug_client = 0/0
debug_osd = 0/0
debug_optracker = 0/0
debug_objclass = 0/0
debug_filestore = 0/0
debug_journal = 0/0
debug_ms = 0/0
debug_monc = 0/0
debug_tp = 0/0
debug_auth = 0/0
debug_finisher = 0/0
debug_heartbeatmap = 0/0
debug_perfcounter = 0/0
debug_asok = 0/0
debug_throttle = 0/0
debug_mon = 0/0
debug_paxos = 0/0
debug_rgw = 0/0
[osd]
osd op threads = 4
osd disk threads = 2
osd max backfills = 1
osd recovery threads = 1
osd recovery max active = 1
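For what it's worth, this is how I check which values the
daemons actually picked up (the option names here are just
examples):

# on a ceph node: dump the running config of an OSD
ceph daemon osd.0 config show | grep -E 'bluestore_fsck_on_mount|osd_max_backfills'

For the librbd client inside qemu, an admin socket would have to
be enabled in the [client] section first before the same kind of
"config show" is possible.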