Re: what happens if a server crashes with cephfs?

Hi Charles,

are you concerned about a single server of the Ceph cluster crashing, or the
whole cluster going down? If you have sufficient redundancy, nothing bad
should happen and the file system should remain available. The same should be
true if you perform an upgrade in the "correct" way, e.g., through the cephadm
commands.

The folks over at 45Drives made a little show of tearing down a Ceph
cluster bit by bit while it was running:

https://www.youtube.com/watch?v=8paAkGx2_OA

Cheers,
Manuel

On Thu, Dec 8, 2022 at 6:34 PM Charles Hedrick <hedrick@xxxxxxxxxxx> wrote:

> Network and local file systems have different requirements. If I have a
> long job and the machine I'm running on crashes, I have to rerun it. The
> fact that the last 500 msec of data didn't get flushed to disk is unlikely
> to matter.
>
> If I have a long job using a network file system and the server crashes,
> my job itself doesn't crash. You really want it to continue after the
> server reboots without any errors. It's true that a write or close could
> return an error, and the job could detect that and either rewrite the
> file or exit. However, a very large amount of code is written for local
> files and doesn't check errors from write and close.
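> (As an illustration, a minimal Python sketch of what explicit checking
> could look like; the path and payload below are made up, so treat it as a
> sketch rather than our actual job code:)
>
>     import os
>
>     data = b"example payload\n" * 1024          # stand-in for real job output
>
>     try:
>         with open("/mnt/cephfs/results/output.dat", "wb") as f:  # hypothetical path
>             f.write(data)
>             f.flush()              # flush Python's user-space buffer to the OS
>             os.fsync(f.fileno())   # ask the kernel/filesystem to persist the data
>     except OSError as exc:
>         # write(), fsync() or the implicit close() failed; the job can now
>         # rewrite the file or exit instead of continuing silently
>         raise SystemExit(f"I/O error while saving results: {exc}")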
>
> I don't actually know how our long jobs would behave if a close fails.
> Perhaps it's OK. It's mostly Python. Presumably the Python interpreter
> would raise an I/O error.
>
> A related question: what is likely to happen when you do a version
> upgrade? Is that done in a way that won't generate errors in user code?
>
> ------------------------------
> *From:* Gregory Farnum <gfarnum@xxxxxxxxxx>
> *Sent:* Thursday, December 8, 2022 11:44 AM
> *To:* Manuel Holtgrewe <zyklenfrei@xxxxxxxxx>
> *Cc:* Charles Hedrick <hedrick@xxxxxxxxxxx>; Dhairya Parmar <
> dparmar@xxxxxxxxxx>; ceph-users@xxxxxxx <ceph-users@xxxxxxx>
> *Subject:* Re:  Re: what happens if a server crashes with
> cephfs?
>
> On Thu, Dec 8, 2022 at 8:42 AM Manuel Holtgrewe <zyklenfrei@xxxxxxxxx>
> wrote:
> >
> > Hi Charles,
> >
> > as far as I know, CephFS implements POSIX semantics. That is, if the
> CephFS server cluster dies for whatever reason, this will translate into
> I/O errors. This is the same as if your NFS server dies or you run the
> program locally on a workstation/laptop and the machine loses power. POSIX
> file systems guarantee that data is persisted on the storage after a file
> is closed
>
> Actually, the "commit on close" is *entirely* an NFS-ism and is not
> part of POSIX. If you expect a closed file to be flushed to disk
> anywhere else (including CephFS), you will be disappointed. You need
> to use fsync/fdatasync/sync/syncfs.
> -Greg
>
> > or fsync() is called. Otherwise, the data may still be "in flight",
> e.g., in the OS I/O cache or even the runtime library's cache.
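> > (For illustration, a minimal Python sketch of flushing those caches
> > explicitly with the fsync family mentioned above; the file name is
> > hypothetical and this is only a sketch:)
> >
> >     import os
> >
> >     with open("/mnt/cephfs/job/log.txt", "ab") as f:   # hypothetical path
> >         f.write(b"one more record\n")
> >         f.flush()                  # runtime library's buffer -> OS page cache
> >         os.fdatasync(f.fileno())   # file data durable; may skip some metadata
> >         # os.fsync(f.fileno())     # data *and* metadata durable
> >     # close() alone gives no durability guarantee, on CephFS or locally
> >     # os.sync()                    # flush everything system-wide (heavyweight)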
> >
> > This is not a bug but a feature, as it improves performance: when
> appending small bits to a file, the HDD head does not have to move every
> time something is written, and a full 4 kB block does not have to be
> rewritten for an SSD.
> >
> > POSIX semantics even go further, enforcing certain guarantees when files
> are written from multiple clients. Recently, something called "lazy I/O"
> has been introduced in CephFS [1], which allows you to explicitly relax some
> of these guarantees to improve performance.
> >
> > I don't think there even is a CephFS mount setting that lets you
> configure local caching the way you can for NFS. With NFS, I have seen
> setups where two clients saw two different versions of the same -- closed --
> file because one had written to the file and this was not yet reflected on
> the second client. To the best of my knowledge, this will not happen with
> CephFS.
> >
> > I'd be happy to be corrected if I'm wrong. ;-)
> >
> > Best wishes,
> > Manuel
> >
> > [1] https://docs.ceph.com/en/latest/cephfs/lazyio/
> >
> > On Thu, Dec 8, 2022 at 5:09 PM Charles Hedrick <hedrick@xxxxxxxxxxx>
> wrote:
> >>
> >> Thanks. I'm evaluating CephFS for a computer science department. We have
> users that run week-long AI training jobs. They use standard packages,
> which they probably don't want to modify. At the moment we use NFS. It uses
> synchronous I/O, so if something goes wrong, the users' jobs pause until we
> reboot, and then continue. However, there's an obvious performance penalty
> for this.
> >> ________________________________
> >> From: Gregory Farnum <gfarnum@xxxxxxxxxx>
> >> Sent: Thursday, December 8, 2022 2:08 AM
> >> To: Dhairya Parmar <dparmar@xxxxxxxxxx>
> >> Cc: Charles Hedrick <hedrick@xxxxxxxxxxx>; ceph-users@xxxxxxx <
> ceph-users@xxxxxxx>
> >> Subject: Re:  Re: what happens if a server crashes with
> cephfs?
> >>
> >> More generally, as Manuel noted you can (and should!) make use of fsync
> et al for data safety. Ceph’s async operations are not any different at the
> application layer from how data you send to the hard drive can sit around
> in volatile caches until a consistency point like fsync is invoked.
> >> -Greg
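> >> To make that concrete for the long-running jobs mentioned above, here is
> >> a hedged Python sketch (the function name and paths are made up) of a
> >> checkpoint that a client or server crash cannot tear in half:
> >>
> >>     import os
> >>
> >>     def save_checkpoint(path: str, blob: bytes) -> None:
> >>         """Write blob to path so a crash leaves either the old or the new
> >>         checkpoint, never a partial one. Illustrative sketch only."""
> >>         tmp = path + ".tmp"
> >>         with open(tmp, "wb") as f:
> >>             f.write(blob)
> >>             f.flush()
> >>             os.fsync(f.fileno())    # persist the data before exposing it
> >>         os.rename(tmp, path)        # atomically replace the old checkpoint
> >>         # Optionally fsync the directory so the rename itself is durable:
> >>         dfd = os.open(os.path.dirname(path) or ".", os.O_RDONLY | os.O_DIRECTORY)
> >>         try:
> >>             os.fsync(dfd)
> >>         finally:
> >>             os.close(dfd)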
> >>
> >> On Wed, Dec 7, 2022 at 10:02 PM Dhairya Parmar <dparmar@xxxxxxxxxx> wrote:
> >> Hi Charles,
> >>
> >> There are many scenarios where a write/close operation can fail, but
> >> failures/errors are generally logged (normally every time) to help debug
> >> the case. Therefore there are no silent failures as such, unless you have
> >> encountered a very rare bug.
> >> - Dhairya
> >>
> >>
> >> On Wed, Dec 7, 2022 at 11:38 PM Charles Hedrick <hedrick@xxxxxxxxxxx> wrote:
> >>
> >> > I believe asynchronous operations are used for some operations in
> >> > CephFS. That means the server acknowledges before data has been written
> >> > to stable storage. Does that mean there are failure scenarios where a
> >> > write or close will return an error, or fail silently?
> >> >
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



