On Sun, 26 Aug 2012, Sébastien Han wrote:
> Hi Greg,
>
> My first test was:
>
> * put a journal on tmpfs
> * umount the tmpfs

This should have errored out with 'device is busy'. Are you sure it actually umounted?

> The second one was almost the same:
>
> * rm -rf /journals/*
>
> Here /journals contains every journal... (3 actually)

That's normal Unix behavior. The file doesn't go away until all names are unlinked/removed *and* all open file handles are closed...

sage

> This action wiped out the journals and ceph didn't detect anything.
>
> After that I created a new pool, a new image inside it, mapped it,
> formatted it and wrote data to it with dd. I also used rados bench to
> write data.
> Ceph didn't see anything; only a 'service ceph restart osd' made the
> OSDs crash.
>
>
> On Sun, Aug 26, 2012 at 9:43 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
> >
> > On Sunday, August 26, 2012 at 11:09 AM, Sébastien Han wrote:
> > > Hi guys!
> > >
> > > Ceph doesn't seem to detect a journal failure. The cluster keeps
> > > writing data even if the journal doesn't exist anymore.
> > > I can't find anywhere in the log or in ceph's command output any
> > > information about a journal failure. Obviously if an OSD is restarted
> > > ceph will complain, but a failure on the fly won't be detected.
> > >
> > > It seems that Ceph just writes directly to the backend filesystem
> > > without complaining.
> > >
> > > Yes, my monitoring system will tell me that the disk which contains my
> > > journals is down... But that's not the point here :)
> > >
> > > The really good point for me is that the cluster keeps running even if
> > > the journal is gone. The bad point is obviously that the cluster keeps
> > > writing data to the backend filesystem (without O_DIRECT I guess...).
> > > I'd prefer a 'read only' cluster facility while the journal is down.
> > > Being able to retrieve the data is as crucial as writing data.
> > >
> > > Any reaction about that? Roadmap feature maybe?
> > So are you saying the filesystem that the journal was located on disappeared? Or the underlying disk disappeared?
> > And then the OSD didn't notice?
> >
> > If so, that's definitely a problem to be corrected as soon as we can. It's more likely to make the OSD shut down than to continue serving reads, though.
> > -Greg
> >
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
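
The Unix behavior mentioned in the reply (a file persists until every name is unlinked *and* every open handle is closed) can be seen in a small sketch. This is only an illustration on a POSIX system, not Ceph code: the temporary file is a hypothetical stand-in for a journal, and the function name is made up for the example.

```python
import os
import tempfile


def write_after_unlink() -> bytes:
    """Unlink a file while it is still open, then keep using the handle."""
    # Hypothetical stand-in for a journal file; this is not how Ceph
    # opens or manages its journal.
    fd, path = tempfile.mkstemp(prefix="journal-demo-")
    try:
        os.write(fd, b"before unlink;")
        os.unlink(path)                  # same effect as `rm` on the path
        assert not os.path.exists(path)  # the name is gone...
        os.write(fd, b"after unlink")    # ...but writes still succeed
        os.lseek(fd, 0, os.SEEK_SET)
        return os.read(fd, 1024)         # and the data is still readable
    finally:
        os.close(fd)                     # the storage is released only here


if __name__ == "__main__":
    print(write_after_unlink())
```

This is consistent with the observation that nothing failed after `rm -rf /journals/*` until the OSDs were restarted: a process that already holds the file open keeps a valid handle, and only a fresh open of the removed path fails.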