Re: Cephfs IO halt on Node failure

Amudhan P <amudhan83@xxxxxxxxx> · Tue, 19 May 2020 11:32:11 +0530

The behaviour is the same even after setting min_size to 2.
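For reference, here is a rough sketch of the rule described in the quoted reply below, as I understand it: a placement group keeps serving IO only while the number of surviving replicas is at least min_size. This is a simplified model, not Ceph code. It assumes each node holds at most one replica of a PG (host failure domain), it ignores recovery, backfill and erasure-coded pools, and the function name is made up for illustration. The min_size 1 row is there only for comparison.

```python
def pg_accepts_io(size: int, min_size: int, failed_replicas: int) -> bool:
    """Simplified model (assumption): IO continues while surviving replicas >= min_size."""
    surviving = max(size - failed_replicas, 0)
    return surviving >= min_size


# Replicated pool settings mentioned in this thread, plus min_size 1 for comparison.
pools = {
    "rep3 (size 3, min_size 2)": (3, 2),
    "replica2 (size 2, min_size 2)": (2, 2),
    "replica2 (size 2, min_size 1)": (2, 1),
}

for name, (size, min_size) in pools.items():
    ok = pg_accepts_io(size, min_size, failed_replicas=1)
    state = "continues" if ok else "pauses"
    print(f"{name}: IO {state} with one node down")
```

Under these assumptions only the size 2 / min_size 2 pool pauses when a single node goes down.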

On Mon 18 May, 2020, 12:34 PM Eugen Block, <eblock@xxxxxx> wrote:

> If your pool has a min_size 2 and size 2 (always a bad idea) it will
> pause IO in case of a failure until the recovery has finished. So the
> described behaviour is expected.
>
>
> Quoting Amudhan P <amudhan83@xxxxxxxxx>:
>
> > Hi,
> >
> > The crush rule is "replicated" and min_size is actually 2. I am trying to
> > test multiple volume configs in a single filesystem using file layouts.
> >
> > I have created a metadata pool with rep 3 (min_size 2 and replicated crush
> > rule) and a data pool with rep 3 (min_size 2 and replicated crush rule). I
> > have also created multiple pools (replica 2, ec2-1 & ec4-2) and added them
> > to the filesystem.
> >
> > Using file layouts I have assigned a different data pool to each folder, so
> > I can test different configs in the same filesystem. All data pools have
> > min_size set to handle a single node failure.
> >
> > A single node failure is handled properly when there is only the metadata
> > pool and one data pool (rep3).
> >
> > After adding an additional data pool to the fs, the single node failure
> > scenario no longer works.
> >
> > regards
> > Amudhan P
> >
> > On Sun, May 17, 2020 at 1:29 AM Eugen Block <eblock@xxxxxx> wrote:
> >
> >> What’s your pool configuration wrt min_size and crush rules?
> >>
> >>
> >> Quoting Amudhan P <amudhan83@xxxxxxxxx>:
> >>
> >> > Hi,
> >> >
> >> > I am using a Ceph Nautilus cluster with the configuration below.
> >> >
> >> > 3 nodes (Ubuntu 18.04), each with 12 OSDs; mds, mon and mgr are running
> >> > in shared mode.
> >> >
> >> > The client mounts the filesystem through the Ceph kernel client.
> >> >
> >> > I was trying to emulate a node failure while a write and a read were in
> >> > progress on a (replica2) pool.
> >> >
> >> > I was expecting reads and writes to continue after a small pause due to
> >> > the node failure, but IO halts and never resumes until the failed node
> >> > is up.
> >> >
> >> > I remember I tested the same scenario before in ceph mimic where it
> >> > continued IO after a small pause.
> >> >
> >> > regards
> >> > Amudhan P
> >> > _______________________________________________
> >> > ceph-users mailing list -- ceph-users@xxxxxxx
> >> > To unsubscribe send an email to ceph-users-leave@xxxxxxx