Re: Cephfs IO halt on Node failure


 



Was that a typo, and you mean you changed min_size to 1? An I/O pause with min_size 1 and size 2 would be unexpected. Can you share more details, such as your crushmap and your osd tree?
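For reference, the requested details can be collected with the standard Ceph tooling (run on a node with an admin keyring; this is a generic sketch, not specific to the cluster in this thread):

```shell
# Show the OSD/host/failure-domain layout
ceph osd tree

# Show size, min_size and crush_rule for every pool
ceph osd pool ls detail

# Dump and decompile the crushmap into readable text
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
```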


Quoting Amudhan P <amudhan83@xxxxxxxxx>:

The behaviour is the same even after setting min_size 2.

On Mon 18 May, 2020, 12:34 PM Eugen Block, <eblock@xxxxxx> wrote:

If your pool has min_size 2 and size 2 (always a bad idea), it will
pause I/O in case of a failure until recovery has finished, so the
described behaviour is expected.
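The rule described above can be sketched as simple arithmetic: a PG keeps serving I/O only while the number of surviving replicas is at least min_size (an illustration of the availability rule, not Ceph's actual peering code):

```shell
# Sketch: a PG accepts I/O only while surviving replicas >= min_size.
size=2        # replicas per PG
min_size=2    # replicas required to serve I/O
failed=1      # one node (holding one replica) goes down

surviving=$((size - failed))
if [ "$surviving" -lt "$min_size" ]; then
    echo "PG inactive: I/O pauses until recovery"
else
    echo "PG active: I/O continues"
fi
```

With size 3 / min_size 2, the same arithmetic leaves 2 surviving replicas after one node failure, which is why that configuration rides through a single failure.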


Quoting Amudhan P <amudhan83@xxxxxxxxx>:

> Hi,
>
> The crush rule is "replicated" and min_size is actually 2. I am trying
> to test multiple volume configs in a single filesystem using file
> layouts.
>
> I have created a metadata pool with rep 3 (min_size 2, replicated crush
> rule) and a data pool with rep 3 (min_size 2, replicated crush rule). I
> have also created multiple additional pools (replica 2, ec2-1 and ec4-2)
> and added them to the filesystem.
>
> Using file layouts I have set a different data pool on different
> folders, so I can test different configs in the same filesystem. All
> data pools have min_size set to handle a single node failure.
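For context, the per-directory data pool assignment described above is done with CephFS file layout xattrs. A sketch with placeholder pool and mount names (not the actual names from this thread):

```shell
# Attach an extra data pool to the filesystem, then pin a directory to it.
# "cephfs", "cephfs_data_ec21" and the mount path are hypothetical names.
ceph fs add_data_pool cephfs cephfs_data_ec21

mkdir /mnt/cephfs/ec21
setfattr -n ceph.dir.layout.pool -v cephfs_data_ec21 /mnt/cephfs/ec21

# Verify the layout took effect
getfattr -n ceph.dir.layout /mnt/cephfs/ec21
```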
>
> A single node failure is handled properly when there is only the
> metadata pool and one data pool (rep 3).
>
> After adding the additional data pools to the fs, the single node
> failure scenario no longer works.
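One detail worth checking for the ec2-1 pool: erasure-coded pools default to min_size = k+1, so on a 3-node cluster (failure domain = host) a single node failure already drops every EC PG below min_size. A sketch of the arithmetic, assuming the default min_size was not changed:

```shell
# EC profile k=2 m=1 spread over 3 hosts, one host down.
k=2; m=1
min_size=$((k + 1))                  # Ceph's default for EC pools: k+1
shards_after_failure=$((k + m - 1))  # one host down -> one shard lost

if [ "$shards_after_failure" -lt "$min_size" ]; then
    echo "EC PG below min_size: I/O pauses"
fi
```

If that is the case here, the halt on node failure would be expected for the ec2-1 pool regardless of the replicated pools' settings.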
>
> regards
> Amudhan P
>
> On Sun, May 17, 2020 at 1:29 AM Eugen Block <eblock@xxxxxx> wrote:
>
>> What’s your pool configuration wrt min_size and crush rules?
>>
>>
>> Quoting Amudhan P <amudhan83@xxxxxxxxx>:
>>
>> > Hi,
>> >
>> > I am using a Ceph Nautilus cluster with the following configuration:
>> >
>> > 3 nodes (Ubuntu 18.04), each with 12 OSDs; mds, mon and mgr are
>> > running in shared mode.
>> >
>> > The client is mounted through the ceph kernel client.
>> >
>> > I was trying to emulate a node failure while a write and a read were
>> > going on on a (replica 2) pool.
>> >
>> > I was expecting the read and write to continue after a small pause
>> > due to the node failure, but I/O halts and never resumes until the
>> > failed node is back up.
>> >
>> > I remember testing the same scenario before on Ceph Mimic, where
>> > I/O continued after a small pause.
>> >
>> > regards
>> > Amudhan P
>>






_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



