Hello Oliver, sorry if I wasn't clear in my first post.
I agree with you that a network issue isn't desirable, but should it hang the mount clients? I mean, shouldn't the client be smart enough to retry the connection?
My point is that public cloud environments don't have the same network availability as a local setup, so shouldn't we at least keep the clients from freezing?
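For reference, this is roughly how I have been inspecting the frozen clients (just a sketch; it assumes root on the client node, and it doesn't answer the retry question, it only shows what the mount is actually stuck on):

# The "blocked for more than 120 seconds" warning threshold; setting it
# to 0 only silences the warning, it does not unblock anything:
cat /proc/sys/kernel/hung_task_timeout_secs

# Dump stack traces of all uninterruptible (D-state) tasks to the kernel
# log, then read them back to see where the RBD I/O is hanging:
echo w > /proc/sysrq-trigger
dmesg | tail -n 100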
---
Diego Castro / The CloudFather
GetupCloud.com - Eliminamos a Gravidade
2016-04-01 12:27 GMT-03:00 Oliver Dzombic <info@xxxxxxxxxxxxxxxxx>:
Hi Diego,
OK, so this is a new scenario.
Before, you said it happens "until I put some load on it".
Now you say you can't reproduce it, and you mention that it happened during
a (known) network maintenance.
So I agree with you: we can assume that your problems were caused by
network issues.
That is also what your logs imply:
"failed lossy con, dropping message"
--
Mit freundlichen Gruessen / Best regards
Oliver Dzombic
IP-Interactive
mailto:info@xxxxxxxxxxxxxxxxx
Address:
IP Interactive UG ( haftungsbeschraenkt )
Zum Sonnenberg 1-3
63571 Gelnhausen
HRB 93402, registered at Amtsgericht Hanau
Managing director: Oliver Dzombic
Tax no.: 35 236 3622 1
VAT ID: DE274086107
On 01.04.2016 at 14:07, Diego Castro wrote:
> Hello Oliver, this issue turned out to be very hard to reproduce; I couldn't
> trigger it again.
> My best guess is something in Azure's network, since last week (when it
> happened a lot) there was an ongoing maintenance.
>
> Here are the outputs:
>
> $ ceph -s
> cluster 25736883-dbf1-4d7a-8796-50e36f9de7a6
> health HEALTH_OK
> monmap e1: 4 mons at {osmbr0=10.0.3.4:6789/0,osmbr1=10.0.3.6:6789/0,osmbr2=10.0.3.14:6789/0,osmbr3=10.0.3.7:6789/0}
> election epoch 602, quorum 0,1,2,3 osmbr0,osmbr1,osmbr3,osmbr2
> osdmap e1816: 10 osds: 10 up, 10 in
> pgmap v3158931: 128 pgs, 1 pools, 11512 MB data, 3522 objects
> 34959 MB used, 10195 GB / 10229 GB avail
> 128 active+clean
> client io 87723 B/s wr, 8 op/s
>
> $ ceph osd df
> ID WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR
> 6 1.00000 1.00000 1022G 3224M 1019G 0.31 0.92
> 1 1.00000 1.00000 1022G 3489M 1019G 0.33 1.00
> 2 1.00000 1.00000 1022G 3945M 1019G 0.38 1.13
> 4 1.00000 1.00000 1022G 3304M 1019G 0.32 0.95
> 7 1.00000 1.00000 1022G 3427M 1019G 0.33 0.98
> 3 1.00000 1.00000 1022G 4361M 1018G 0.42 1.25
> 9 1.00000 1.00000 1022G 3650M 1019G 0.35 1.04
> 0 1.00000 1.00000 1022G 3210M 1019G 0.31 0.92
> 5 1.00000 1.00000 1022G 3577M 1019G 0.34 1.02
> 8 1.00000 1.00000 1022G 2765M 1020G 0.26 0.79
> TOTAL 10229G 34957M 10195G 0.33
> MIN/MAX VAR: 0.79/1.25 STDDEV: 0.04
>
>
>
> $ ceph osd perf
> osd fs_commit_latency(ms) fs_apply_latency(ms)
> 0 1 2
> 1 1 2
> 2 2 3
> 3 2 3
> 4 1 2
> 5 2 3
> 6 1 2
> 7 2 3
> 8 1 2
> 9 1 1
>
> ---
> Diego Castro / The CloudFather
> GetupCloud.com - Eliminamos a Gravidade
>
> 2016-03-31 18:00 GMT-03:00 Oliver Dzombic <info@xxxxxxxxxxxxxxxxx>:
> Hi Diego,
>
> Let's start with the basics: please give us the output of
>
> ceph -s
> ceph osd df
> ceph osd perf
>
> ideally before and after you provoke the iowait.
>
> Thank you !
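>
> Something simple like this would do (just a sketch, assuming the ceph CLI
> is configured on the node; adjust the interval and log name as you like):
>
> while true; do
>     date
>     ceph -s
>     ceph osd df
>     ceph osd perf
>     sleep 30
> done | tee ceph-before-after.log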
>
> --
> Mit freundlichen Gruessen / Best regards
>
> Oliver Dzombic
> IP-Interactive
>
> mailto:info@xxxxxxxxxxxxxxxxx
> Address:
>
> IP Interactive UG ( haftungsbeschraenkt )
> Zum Sonnenberg 1-3
> 63571 Gelnhausen
>
> HRB 93402, registered at Amtsgericht Hanau
> Managing director: Oliver Dzombic
>
> Tax no.: 35 236 3622 1
> VAT ID: DE274086107
>
>
> On 31.03.2016 at 21:38, Diego Castro wrote:
> > Hello, everyone.
> > I have a pretty basic Ceph setup running on top of Azure Cloud (4 mons
> > and 10 OSDs) for RBD images.
> > Everything seems to work as expected until I put some load on it:
> > sometimes it doesn't complete the process (a mysql restore, for example)
> > and sometimes it finishes without any issues.
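> >
> > In case someone wants to try to reproduce it without mysql, roughly this
> > kind of fio run generates a similar sequential write load (the file name
> > and mount point are placeholders for my RBD-backed filesystem):
> >
> > fio --name=rbdtest --filename=/mnt/rbd/fio.test --size=2G \
> >     --rw=write --bs=512k --ioengine=libaio --direct=1 --iodepth=8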
> >
> >
> > Client Kernel: 3.10.0-327.10.1.el7.x86_64
> > OSD Kernel: 3.10.0-229.7.2.el7.x86_64
> >
> > Ceph: ceph-0.94.5-0.el7.x86_64
> >
> > On the client side, I see 100% iowait and a lot of "INFO: task blocked
> > for more than 120 seconds" messages.
> > On the OSD side, I have no evidence of faulty disks or read/write
> > latency, but I found the following messages:
> >
> >
> > 2016-03-28 17:04:03.425249 7f7329fc5700 0 bad crc in data 641367213 != exp 3107019767
> > 2016-03-28 17:04:03.440599 7f7329fc5700 0 -- 10.0.3.9:6800/2272 >> 10.0.2.5:0/1998047321 pipe(0x13cc4800 sd=54 :6800 s=0 pgs=0 cs=0 l=0 c=0x13883f40).accept peer addr is really 10.0.2.5:0/1998047321 (socket is 10.0.2.5:34702/0)
> > 2016-03-28 17:04:03.487497 7f7333e6a700 0 -- 10.0.3.9:6800/2272 submit_message osd_op_reply(20046 rb.0.6040.238e1f29.000000000074 [set-alloc-hint object_size 4194304 write_size 4194304,write 0~524288] v1753'32512 uv32512 ondisk = 0) v6 remote, 10.0.2.5:0/1998047321, failed lossy con, dropping message 0x12b539c0
> > 2016-03-28 17:04:03.532302 7f733666f700 0 -- 10.0.3.9:6800/2272 submit_message osd_op_reply(20047 rb.0.6040.238e1f29.000000000074 [set-alloc-hint object_size 4194304 write_size 4194304,write 524288~524288] v1753'32513 uv32513 ondisk = 0) v6 remote, 10.0.2.5:0/1998047321, failed lossy con, dropping message 0x1667bc80
> > 2016-03-28 17:04:03.535143 7f7333e6a700 0 -- 10.0.3.9:6800/2272 submit_message osd_op_reply(20048 rb.0.6040.238e1f29.000000000074 [set-alloc-hint object_size 4194304 write_size 4194304,write 1048576~524288] v1753'32514 uv32514 ondisk = 0) v6 remote, 10.0.2.5:0/1998047321, failed lossy con, dropping message 0x12b56e00
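> >
> > Next time it happens I can try to catch more detail by raising the
> > messenger debug level on the OSDs (a sketch, assuming a working admin
> > keyring; I'd revert it afterwards since it is chatty):
> >
> > ceph tell osd.* injectargs '--debug-ms 1'
> > # ... reproduce / wait for the hang, then back down:
> > ceph tell osd.* injectargs '--debug-ms 0'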
> >
> > ---
> > Diego Castro / The CloudFather
> > GetupCloud.com - Eliminamos a Gravidade
> >
> >
>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com