Possible corruption with MSSQL on RBD

Hi,

I filed an issue in the tracker, but I'm looking for some feedback to diagnose this a bit further: http://tracker.ceph.com/issues/17545

The situation is that with Firefly or Hammer (we haven't tested Jewel yet), an MSSQL server running on RBD will sometimes complain about corruption.

Using SQLIOSim we can reproduce the issue on a small Proxmox + Ceph cluster; after an hour or so it reports:

Expected FileId: 0x0
Received FileId: 0x0
Expected PageId: 0xCB19C
Received PageId: 0xCB19A (does not match expected)
Received CheckSum: 0x9F444071
Calculated CheckSum: 0x89603EC9 (does not match expected)
Received Buffer Length: 0x2000
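The IDs in that report can be decoded by hand. A minimal sketch, assuming page N of a data file starts at byte N * 0x2000 (SQL Server's 8 KiB page size, which matches the reported buffer length): it converts both page IDs to file offsets and shows that the buffer read back carries the ID of a page two pages (16 KiB) below the one requested. These are offsets within the database file as the guest sees it; where they land in the RBD image depends on the guest filesystem. Because the stored and calculated checksums also disagree, the buffer may not be an intact copy of page 0xCB19A either.

```python
PAGE_SIZE = 0x2000  # 8 KiB, matches "Received Buffer Length: 0x2000"


def page_offset(page_id: int) -> int:
    """Byte offset of a page in its data file, assuming page N starts at N * PAGE_SIZE."""
    return page_id * PAGE_SIZE


# Values copied from the SQLIOSim report above.
expected_page = 0xCB19C
received_page = 0xCB19A

delta_pages = expected_page - received_page
print(f"expected offset: {page_offset(expected_page):#x}")  # 0x196338000
print(f"received offset: {page_offset(received_page):#x}")  # 0x196334000
print(f"received page is {delta_pages} pages ({delta_pages * PAGE_SIZE} bytes) lower")
```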

The issue only seems to happen with RBD caching enabled. With the RBD cache disabled, or with cache=directsync, we were not able to reproduce it.

When using LVM- or file-based backends for Qemu, the problem also didn't show up.

So this seems to be either a librbd issue or an issue in the RBD driver inside Qemu.
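One way to narrow this down outside SQL Server could be a stamp-and-verify test. The sketch below is a hypothetical helper, not part of SQLIOSim, Qemu or Ceph: it writes 8 KiB pages that each carry their own page ID and a CRC32 of the payload, then reads them back and reports mismatches in the same expected/received form SQLIOSim uses. The header layout, file path and page count are all assumptions. Running it in the guest with the RBD cache enabled and then disabled might show whether misdirected reads follow the cache setting. It issues sequential, single-threaded I/O and its reads may be served from the guest's own page cache, so it is far less aggressive than SQLIOSim and a clean pass does not rule the bug out.

```python
import os
import struct
import zlib

PAGE_SIZE = 0x2000  # 8 KiB pages, as in the SQLIOSim report
# Hypothetical header: page id (u64) followed by CRC32 of the payload (u32).
HEADER = struct.Struct("<QI")


def make_page(page_id: int) -> bytes:
    """Build one page stamped with its own ID and a checksum of its random payload."""
    payload = os.urandom(PAGE_SIZE - HEADER.size)
    return HEADER.pack(page_id, zlib.crc32(payload)) + payload


def write_pages(path: str, count: int) -> None:
    """Write `count` stamped pages, then fsync so they leave the guest's write cache."""
    with open(path, "wb") as f:
        for page_id in range(count):
            f.write(make_page(page_id))
        f.flush()
        os.fsync(f.fileno())


def verify_pages(path: str, count: int) -> list:
    """Read every page back and describe each ID or checksum mismatch."""
    errors = []
    with open(path, "rb") as f:
        for page_id in range(count):
            f.seek(page_id * PAGE_SIZE)
            buf = f.read(PAGE_SIZE)
            if len(buf) != PAGE_SIZE:
                errors.append(f"page {page_id:#x}: short read ({len(buf)} bytes)")
                continue
            got_id, got_crc = HEADER.unpack_from(buf)
            calc_crc = zlib.crc32(buf[HEADER.size:])
            if got_id != page_id or got_crc != calc_crc:
                errors.append(
                    f"expected page {page_id:#x}, received {got_id:#x}; "
                    f"checksum {got_crc:#x}, calculated {calc_crc:#x}"
                )
    return errors
```

For example, call `write_pages(path, 100000)` and then `verify_pages(path, 100000)` with `path` pointing at a hypothetical test file on the RBD-backed disk; an empty list means every page came back with the expected ID and a valid checksum.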

Any hints on how to debug this further to find the root cause?

Wido
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html