Re: How many nodes/OSD can fail

Hello Tu,

Yes, that's correct. The mons run on the OSD nodes as well, so I have
3 nodes in total, with an OSD, MDS and mon on each node.
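
With the monitors co-located on the OSD nodes, powering off two of the three nodes leaves only a single monitor, which cannot form a majority, so the whole cluster blocks regardless of any pool's min_size. As a minimal check (assuming the monitor id matches the hostname, e.g. ceph-node1), the surviving monitor can be queried through its local admin socket, which still answers even without quorum:

ceph daemon mon.ceph-node1 mon_status   # the "quorum" list is empty while no majority exists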

Regards - Willi

On 03.07.16 at 09:56, Tu Holmes wrote:

Where are your mon nodes?

Were you mixing mon and OSD together?

Are 2 of the mon nodes down as well?

On Jul 3, 2016 12:53 AM, "Willi Fehler" <willi.fehler@xxxxxxxxxxx> wrote:
Hello Sean,

I've powered down 2 nodes, so 6 of 9 OSDs are down. But my client can no longer read or write from my Ceph mount, and 'ceph -s' hangs as well.

pool 1 'cephfs_data' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 300 pgp_num 300 last_change 447 flags hashpspool crash_replay_interval 45 stripe_width 0
pool 2 'cephfs_metadata' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 300 pgp_num 300 last_change 445 flags hashpspool stripe_width 0

2016-07-03 09:49:40.695953 7f3da56f9700  0 -- 192.168.0.5:0/2773396901 >> 192.168.0.7:6789/0 pipe(0x7f3da0001f50 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f3da0000f20).fault
2016-07-03 09:49:44.195029 7f3da57fa700  0 -- 192.168.0.5:0/2773396901 >> 192.168.0.6:6789/0 pipe(0x7f3da0005500 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f3da00067c0).fault
2016-07-03 09:49:50.205788 7f3da55f8700  0 -- 192.168.0.5:0/2773396901 >> 192.168.0.6:6789/0 pipe(0x7f3da0005500 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f3da0004c40).fault
2016-07-03 09:49:52.720116 7f3da57fa700  0 -- 192.168.0.5:0/2773396901 >> 192.168.0.7:6789/0 pipe(0x7f3da00023f0 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f3da00036b0).fault

Regards - Willi

On 03.07.16 at 09:36, Sean Redmond wrote:

It would need to be set to 1
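
For reference, a minimal sketch of doing that for the two CephFS pools named in the dump above (note that min_size 1 lets I/O continue with only a single surviving copy, and it does not help if the monitor quorum is lost):

ceph osd pool set cephfs_data min_size 1
ceph osd pool set cephfs_metadata min_size 1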

On 3 Jul 2016 8:17 a.m., "Willi Fehler" <willi.fehler@xxxxxxxxxxx> wrote:
Hello David,

so in a 3-node cluster, how should I set min_size if I want to be able to lose 2 nodes?

Regards - Willi

On 28.06.16 at 13:07, David wrote:
Hi,

This is probably the min_size on your CephFS data and/or metadata pool. I believe the default is 2; if fewer than 2 replicas are available, I/O will stop. See: http://docs.ceph.com/docs/master/rados/operations/pools/#set-the-number-of-object-replicas
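
As a quick check, the current values can be read per pool, e.g. for the cephfs_data pool from this thread:

ceph osd pool get cephfs_data size       # replica count
ceph osd pool get cephfs_data min_size   # minimum replicas required for I/O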

On Tue, Jun 28, 2016 at 10:23 AM, willi.fehler@xxxxxxxxxxx <willi.fehler@xxxxxxxxxxx> wrote:

Hello,

I'm still very new to Ceph. I've created a small test cluster.

 

ceph-node1: osd0, osd1, osd2

ceph-node2: osd3, osd4, osd5

ceph-node3: osd6, osd7, osd8

 

My CephFS pool has a replication count of 3. I powered off 2 nodes (6 OSDs went down), the cluster status became critical, and my Ceph clients (CephFS) ran into a timeout. My data (I had only one file in the pool) was still on one of the active OSDs. Is it expected behaviour that the cluster status becomes critical and the clients run into a timeout?

 

Many thanks for your feedback.

 

Regards - Willi

 



_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
