Re: Ceph doesn't detect journal failure while the OSD is running

Hi Greg,

My first test was:

* put a journal on tmpfs
* umount the tmpfs
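
For reference, the setup looked roughly like this (the mount point,
sizes and ceph.conf snippet are examples, not my exact configuration):

  mount -t tmpfs -o size=1G tmpfs /journals
  # in ceph.conf, each OSD points at it:
  #   osd journal = /journals/$id/journal
  #   osd journal size = 256
  umount -l /journals   # lazy unmount, since the running OSDs still
                        # hold their journal files open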

The second one was almost the same:

* rm -rf /journals/*

Here /journals contains all the journals (3 of them, actually).

This wiped out the journals, and Ceph didn't detect anything.

After that I created a new pool, created a new image inside it, mapped
it, formatted it, and wrote data to it with dd. I also used rados bench
to write data.
Ceph didn't notice anything; only a "service ceph restart osd" made the
OSDs crash.
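
For completeness, the write test went roughly like this (the pool and
image names and the sizes are examples, not my exact ones):

  ceph osd pool create testpool 128
  rbd create testimg --pool testpool --size 1024   # 1 GB image
  rbd map testimg --pool testpool                  # e.g. /dev/rbd0
  mkfs.ext4 /dev/rbd0
  mount /dev/rbd0 /mnt/test
  dd if=/dev/zero of=/mnt/test/zero bs=4M count=256 conv=fdatasync
  rados -p testpool bench 30 write                 # librados-level writes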


On Sun, Aug 26, 2012 at 9:43 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>
> On Sunday, August 26, 2012 at 11:09 AM, Sébastien Han wrote:
> > Hi guys!
> >
> > Ceph doesn't seem to detect a journal failure. The cluster keeps
> > writing data even if the journal doesn't exist anymore.
> > I can't find any information about a journal failure anywhere in the
> > logs or in the ceph command's output. Obviously, if an OSD is
> > restarted, Ceph will complain, but a failure on the fly won't be
> > detected.
> >
> > It seems that Ceph just writes directly to the backend filesystem
> > without complaining.
> >
> > Yes, my monitoring system will tell me that the disk containing my
> > journals is down... but that's not the point here :)
> >
> > The really good point for me is that the cluster keeps running even
> > if the journal is gone. The bad point, obviously, is that the cluster
> > keeps writing data to the backend filesystem (without O_DIRECT, I
> > guess...). I'd prefer a 'read-only' cluster mode while the journal is
> > down. Being able to retrieve the data is as crucial as writing it.
> >
> > Any thoughts on this? A roadmap feature, maybe?
> So are you saying the filesystem that the journal was located on disappeared? Or the underlying disk disappeared?
> And then the OSD didn't notice?
>
> If so, that's definitely a problem to be corrected as soon as we can…
> It's more likely to make the OSD shut down than to continue serving
> reads, though.
> -Greg
>