Hello,

On 14.10.2016 at 12:04, Wido den Hollander wrote:
>
>> On 12 October 2016 at 17:57, Wido den Hollander <wido@xxxxxxxx> wrote:
>>
>>
>>> On 12 October 2016 at 17:01, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>>>
>>>
>>> On Wed, Oct 12, 2016 at 7:57 AM, Wido den Hollander <wido@xxxxxxxx> wrote:
>>>> Hi,
>>>>
>>>> I filed an issue in the tracker, but I'm looking for some feedback to diagnose this a bit further: http://tracker.ceph.com/issues/17545
>>>>
>>>> The situation is that with Firefly or Hammer (we haven't tested Jewel yet) an MSSQL server running on RBD will sometimes complain about corruption.
>>>>
>>>> Using SQLioSim we can reproduce the issue on a small Proxmox + Ceph cluster, and after an hour or so it will yield:
>>>>
>>>> Expected FileId: 0x0
>>>> Received FileId: 0x0
>>>> Expected PageId: 0xCB19C
>>>> Received PageId: 0xCB19A (does not match expected)
>>>> Received CheckSum: 0x9F444071
>>>> Calculated CheckSum: 0x89603EC9 (does not match expected)
>>>> Received Buffer Length: 0x2000
>>>>
>>>> The issue only seems to happen with RBD caching enabled. When disabling the RBD cache or using cache=directsync we were not able to reproduce the issue.
>>>>
>>>> When using LVM- or file-based backends for Qemu the problem also didn't pop up.
>>>>
>>>> So this seems to be either a librbd issue or an issue in the RBD driver inside Qemu.
>>>>
>>>> Any hints on how to debug this further to find the root cause?
>>>
>>> If you've got control over the clients, try building with commit
>>> 9ec6e7f608608088d51e449c9d375844631dcdde backported to them (I believe
>>> it's also in the latest Hammer release, but maybe there hasn't been
>>> one cut since the backport?); tracked at
>>> http://tracker.ceph.com/issues/16002 but of course the web site is
>>> dead so you can't look at that right now. :(
>>
>> I verified: 9ec6e7 is not in v0.94.9, but it is in the Jewel release.
>>
>> Tests are running with Jewel now and I will probably have the results tomorrow.
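[The "is commit X in release tag Y" check mentioned above can be done with git's tag-containment query. A self-contained toy demo follows; the repository, commits, and tags here are made up for illustration — against ceph.git itself you would ask `git tag --contains 9ec6e7f608608088d51e449c9d375844631dcdde` and look for v0.94.9 or a Jewel tag in the output.]

```shell
set -e
# Build a throwaway repo standing in for ceph.git.
tmp=$(mktemp -d)
cd "$tmp"
git init -q repo
cd repo
git -c user.email=a@b -c user.name=a commit -q --allow-empty -m "base"
git tag v0.94.9       # release cut before the fix landed
git -c user.email=a@b -c user.name=a commit -q --allow-empty -m "librbd: fix"
fix=$(git rev-parse HEAD)
git tag v10.2.0       # release cut after the fix landed
# List every tag whose history contains the fix commit.
git tag --contains "$fix"
```

Only tags cut after the commit are printed, so a release tag missing from the list does not contain the backport.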
If Jewel doesn't break, the commit you sent might indeed resolve it.

I've seen several reports in the last month about missing backports, even though they were marked as to be backported. Is there a general problem with that?

Greets,
Stefan

> The tests have been running for over 24 hours and all still looks good. We will let it run over the weekend to make sure it has been fixed.
>
> Wido
>
>> Wido
>>
>>> -Greg
>>>
>>>> Wido
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
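[For anyone wanting to reproduce the cache-on/cache-off comparison described in the thread, one way to toggle the librbd cache is a client-side ceph.conf setting; the section and option names below are the standard ones from the Ceph documentation, and the alternative mirrors the cache=directsync Qemu drive setting the thread mentions.]

```
# /etc/ceph/ceph.conf on the hypervisor (client side) --
# disables the librbd write-back cache for all RBD clients:
[client]
rbd cache = false

# Alternatively, leave ceph.conf alone and set the cache mode on
# the Qemu drive instead, e.g. cache=directsync in the -drive
# option (or cache='directsync' in the libvirt <driver> element).
```

Comparing runs of SQLioSim with and without this setting is what isolated the problem to the caching path in the first place.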