Sorry for the late reply. I have pasted the crush map at the URL below:
https://pastebin.com/ASPpY2VB

Here is my osd tree output. The issue only occurs when I use file layouts.

ID CLASS WEIGHT    TYPE NAME          STATUS REWEIGHT PRI-AFF
-1       327.48047 root default
-3       109.16016     host strgsrv01
 0   hdd   5.45799         osd.0          up  1.00000 1.00000
 2   hdd   5.45799         osd.2          up  1.00000 1.00000
 3   hdd   5.45799         osd.3          up  1.00000 1.00000
 4   hdd   5.45799         osd.4          up  1.00000 1.00000
 5   hdd   5.45799         osd.5          up  1.00000 1.00000
 6   hdd   5.45799         osd.6          up  1.00000 1.00000
 7   hdd   5.45799         osd.7          up  1.00000 1.00000
19   hdd   5.45799         osd.19         up  1.00000 1.00000
20   hdd   5.45799         osd.20         up  1.00000 1.00000
21   hdd   5.45799         osd.21         up  1.00000 1.00000
22   hdd   5.45799         osd.22         up  1.00000 1.00000
23   hdd   5.45799         osd.23         up  1.00000 1.00000
-5       109.16016     host strgsrv02
 1   hdd   5.45799         osd.1          up  1.00000 1.00000
 8   hdd   5.45799         osd.8          up  1.00000 1.00000
 9   hdd   5.45799         osd.9          up  1.00000 1.00000
10   hdd   5.45799         osd.10         up  1.00000 1.00000
11   hdd   5.45799         osd.11         up  1.00000 1.00000
12   hdd   5.45799         osd.12         up  1.00000 1.00000
24   hdd   5.45799         osd.24         up  1.00000 1.00000
25   hdd   5.45799         osd.25         up  1.00000 1.00000
26   hdd   5.45799         osd.26         up  1.00000 1.00000
27   hdd   5.45799         osd.27         up  1.00000 1.00000
28   hdd   5.45799         osd.28         up  1.00000 1.00000
29   hdd   5.45799         osd.29         up  1.00000 1.00000
-7       109.16016     host strgsrv03
13   hdd   5.45799         osd.13         up  1.00000 1.00000
14   hdd   5.45799         osd.14         up  1.00000 1.00000
15   hdd   5.45799         osd.15         up  1.00000 1.00000
16   hdd   5.45799         osd.16         up  1.00000 1.00000
17   hdd   5.45799         osd.17         up  1.00000 1.00000
18   hdd   5.45799         osd.18         up  1.00000 1.00000
30   hdd   5.45799         osd.30         up  1.00000 1.00000
31   hdd   5.45799         osd.31         up  1.00000 1.00000
32   hdd   5.45799         osd.32         up  1.00000 1.00000
33   hdd   5.45799         osd.33         up  1.00000 1.00000
34   hdd   5.45799         osd.34         up  1.00000 1.00000
35   hdd   5.45799         osd.35         up  1.00000 1.00000

On Tue, May 19, 2020 at 12:16 PM Eugen Block <eblock@xxxxxx> wrote:

> Was that a typo and you mean you changed min_size to 1? An I/O pause with
> min_size 1 and size 2 is unexpected. Can you share more details, like
> your crush map and your osd tree?
>
>
> Zitat von Amudhan P <amudhan83@xxxxxxxxx>:
>
> > The behaviour is the same even after setting min_size 2.
> >
> > On Mon 18 May, 2020, 12:34 PM Eugen Block, <eblock@xxxxxx> wrote:
> >
> >> If your pool has min_size 2 and size 2 (always a bad idea) it will
> >> pause IO in case of a failure until the recovery has finished, so the
> >> described behaviour is expected.
> >>
> >>
> >> Zitat von Amudhan P <amudhan83@xxxxxxxxx>:
> >>
> >> > Hi,
> >> >
> >> > The crush rule is "replicated" and min_size is 2, actually. I am
> >> > trying to test multiple volume configs in a single filesystem
> >> > using file layouts.
> >> >
> >> > I have created a metadata pool with rep 3 (min_size 2 and replicated
> >> > crush rule) and a data pool with rep 3 (min_size 2 and replicated
> >> > crush rule). I have also created multiple additional pools
> >> > (replica 2, ec2-1 and ec4-2) and added them to the filesystem.
> >> >
> >> > Using file layouts I have assigned a different data pool to each of
> >> > several folders, so I can test different configs in the same
> >> > filesystem. All data pools have min_size set to handle a single
> >> > node failure.
> >> >
> >> > A single node failure is handled properly when there is only the
> >> > metadata pool and one data pool (rep 3).
> >> >
> >> > After adding the additional data pools to the fs, the single node
> >> > failure scenario no longer works.
> >> >
> >> > regards
> >> > Amudhan P
> >> >
> >> > On Sun, May 17, 2020 at 1:29 AM Eugen Block <eblock@xxxxxx> wrote:
> >> >
> >> >> What's your pool configuration wrt min_size and crush rules?
> >> >>
> >> >>
> >> >> Zitat von Amudhan P <amudhan83@xxxxxxxxx>:
> >> >>
> >> >> > Hi,
> >> >> >
> >> >> > I am using a ceph Nautilus cluster with the configuration below.
> >> >> >
> >> >> > 3 nodes (Ubuntu 18.04), each with 12 OSDs; mds, mon and mgr are
> >> >> > running in shared mode.
> >> >> >
> >> >> > The client is mounted through the ceph kernel client.
> >> >> >
> >> >> > I was trying to emulate a node failure while a write and a read
> >> >> > were going on against a (replica 2) pool.
> >> >> >
> >> >> > I was expecting read and write to continue after a small pause due
> >> >> > to the node failure, but IO halts and never resumes until the
> >> >> > failed node is back up.
> >> >> >
> >> >> > I remember testing the same scenario before on ceph Mimic, where
> >> >> > IO continued after a small pause.
> >> >> >
> >> >> > regards
> >> >> > Amudhan P
> >> >> > _______________________________________________
> >> >> > ceph-users mailing list -- ceph-users@xxxxxxx
> >> >> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
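
For anyone who wants to reproduce a setup like the one described above, a
rough sketch follows. The pool names, PG counts, filesystem name ("cephfs")
and directory paths are placeholders, not the exact values from my cluster:

    # extra replica-2 data pool (example name and PG count)
    ceph osd pool create cephfs_data_rep2 64 64 replicated
    ceph osd pool set cephfs_data_rep2 size 2
    ceph osd pool set cephfs_data_rep2 min_size 1

    # example EC 2+1 data pool; CephFS EC data pools need overwrites enabled
    ceph osd erasure-code-profile set ec21 k=2 m=1 crush-failure-domain=host
    ceph osd pool create cephfs_data_ec21 64 64 erasure ec21
    ceph osd pool set cephfs_data_ec21 allow_ec_overwrites true

    # attach the extra pools to the existing filesystem
    ceph fs add_data_pool cephfs cephfs_data_rep2
    ceph fs add_data_pool cephfs cephfs_data_ec21

    # point individual directories at specific data pools via the file layout xattr
    setfattr -n ceph.dir.layout.pool -v cephfs_data_rep2 /mnt/cephfs/dir_rep2
    setfattr -n ceph.dir.layout.pool -v cephfs_data_ec21 /mnt/cephfs/dir_ec21

New files created under each directory then go to the data pool named in its
layout; files written earlier keep the pool they were originally placed in.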
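
To double-check which pools are attached to the filesystem and what size and
min_size each one currently has (again assuming the filesystem is called
"cephfs"):

    # list the metadata and data pools of the filesystem
    ceph fs get cephfs | grep -E 'metadata_pool|data_pools'

    # show replicated size, min_size and EC profile for every pool
    ceph osd pool ls detail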