Re: Frozen Client Mounts

Hi Diego,

IF an OSD goes down and IF there are currently read/write requests
on it, THEN you will again get the "pull the plug" event.

That means, again, a read-only mounted filesystem / IO errors on that
specific VM.
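
Whether the guest really flips to read-only depends on the error policy
of the filesystem inside the VM, not on Ceph. A minimal sketch for an
ext4 guest (the device name /dev/vda1 is only an example):

# show the current error behavior (most distros default to remount-ro)
tune2fs -l /dev/vda1 | grep -i 'errors behavior'

# let the FS continue on IO errors instead of remounting read-only
# (risky; only if the application can tolerate EIO)
tune2fs -e continue /dev/vda1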

---

I already asked the same question here on the mailing list, but there
was no answer to it.

To me it's also strange: Ceph is (supposed to be) an HA / fault-tolerant
/ super-duper storage system. So how come this kind of thing happens
when a single HDD goes down?

I think this issue can be handled with some specific config magic. But
so far, I haven't found it.
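
If it exists, I would expect it among the failure detection and
replication knobs. A sketch of where I would start looking in
ceph.conf (the values are just the defaults as far as I know, not a
tested recommendation):

[global]
# seconds without heartbeat before peers report an OSD as failed
osd heartbeat grace = 20
# replicas per object / how many must be alive to keep serving IO
osd pool default size = 3
osd pool default min size = 2
# seconds before a "down" OSD is marked "out" and data re-replicates
mon osd down out interval = 300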

---

As far as I see the situation, Ceph is supposed to keep the "whole" up
and running without losing data.

The single VM that accesses, at the wrong time, the wrong PG on the OSD
which is going down at that very moment will suffer and have IO issues.

In that situation, assuming you have more than 1 replica, this specific
VM has to be restarted in the worst case.
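
Worth double-checking what the pool actually uses; a quick sketch,
assuming the default pool name "rbd":

ceph osd pool get rbd size
ceph osd pool get rbd min_size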

But that's the point: it CAN restart and will be fine. So you still get
a benefit from that solution.

Of course, since a single OSD holds multiple PGs and these serve
multiple VMs, it can happen that several VMs have to be restarted.

But that, again, requires the scenario where the specific VM is
accessing exactly this OSD/PG at the moment the OSD/PG goes down.
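
If you want to verify that theory after an incident, you can map a
VM's RBD data object back to its PG and OSDs. A sketch, using an object
name taken from the OSD logs quoted below (and assuming the pool is
named "rbd"):

# which PG and which OSDs serve this object?
ceph osd map rbd rb.0.6040.238e1f29.000000000074

# crude way to list the PGs a given OSD is primary for (adjust the id)
ceph pg dump pgs_brief | grep '\[3,'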

--------------------

That is my understanding of the situation so far.

Anyone is highly welcome to correct me if I am wrong :-)

And please keep asking questions. It would be really dumb not to do so.



-- 
Mit freundlichen Gruessen / Best regards

Oliver Dzombic
IP-Interactive

mailto:info@xxxxxxxxxxxxxxxxx

Anschrift:

IP Interactive UG ( haftungsbeschraenkt )
Zum Sonnenberg 1-3
63571 Gelnhausen

HRB 93402 beim Amtsgericht Hanau
Geschäftsführung: Oliver Dzombic

Steuer Nr.: 35 236 3622 1
UST ID: DE274086107


On 01.04.2016 at 18:45, Diego Castro wrote:
> Ok, I got it.
> Will having a stable network save the system from a node crash? What
> happens if an OSD goes down? Will the clients suffer from frozen mounts
> and things like that?
> Just asking dumb questions to see if I'm on the right path, since AFAIK
> Ceph is meant to be a highly available / fault-tolerant storage system, right?
> 
> 
> ---
> Diego Castro / The CloudFather
> GetupCloud.com - Eliminamos a Gravidade
> 
> 2016-04-01 13:23 GMT-03:00 Oliver Dzombic <info@xxxxxxxxxxxxxxxxx>:
> 
>     Hi Diego,
> 
>     you can think of the network connection as your HDD cables.
>
>     So if you get interruptions there, it's like pulling the HDD cables
>     out of your server/computer and putting them back in.
>
>     You can easily check how much your server/computer will like that
>     with your local HDDs ;-)
> 
>     ----
> 
>     And no, ceph will not protect you from this.
> 
>     If the requested data is on a PG / OSD that is hit by a network
>     interruption, you will get IO errors.
> 
>     The question is what the OS of the VM will do with that. Maybe it
>     will remount the whole HDD read-only.
>
>     Maybe it will just throw some errors until it's good again.
>
>     Maybe you will have a stall/freeze until it's good again.
> 
>     Maybe .....
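>
>     A quick way to see, inside the VM, which of these actually happened
>     (a sketch; the patterns are the usual suspects, not specific to
>     this setup):
>
>     # kernel messages for hung tasks, IO errors, read-only remounts
>     dmesg | grep -iE 'blocked for more than|remounting|i/o error'
>
>     # mount options of the root filesystem: rw or ro?
>     awk '$2 == "/" {print $4}' /proc/mounts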
> 
>     In any case, a stable network connection is the absolute basic
>     requirement for network storage. If your cloud environment can't
>     provide that, you can't provide stable services.
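>
>     A rough way to check that from the client towards an OSD node
>     (a sketch; the address is taken from the OSD logs below):
>
>     # 100 pings, 0.2s apart; any packet loss here is already a red flag
>     ping -c 100 -i 0.2 10.0.3.9 | tail -2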
> 
>     --
>     Mit freundlichen Gruessen / Best regards
> 
>     Oliver Dzombic
>     IP-Interactive
> 
>     mailto:info@xxxxxxxxxxxxxxxxx
> 
>     Anschrift:
> 
>     IP Interactive UG ( haftungsbeschraenkt )
>     Zum Sonnenberg 1-3
>     63571 Gelnhausen
> 
>     HRB 93402 beim Amtsgericht Hanau
>     Geschäftsführung: Oliver Dzombic
> 
>     Steuer Nr.: 35 236 3622 1
>     UST ID: DE274086107
> 
> 
>     On 01.04.2016 at 17:31, Diego Castro wrote:
>     > Hello Oliver, sorry if I wasn't clear in my first post.
>     > I agree with you that a network issue isn't desirable, but should it
>     > crash mounted clients? I mean, shouldn't the client be smart enough to
>     > retry the connection or so?
>     > My point is that public cloud environments don't have the same
>     > availability as a local setup, so shouldn't we at least keep the
>     > clients from freezing?
>     >
>     >
>     > ---
>     > Diego Castro / The CloudFather
>     > GetupCloud.com - Eliminamos a Gravidade
>     >
>     > 2016-04-01 12:27 GMT-03:00 Oliver Dzombic <info@xxxxxxxxxxxxxxxxx>:
>     >
>     >     Hi Diego,
>     >
>     >     ok, so this is a new scenario.
>     >
>     >     Before, you said it was "until i put some load on it".
>     >
>     >     Now you say you can't reproduce it, and you mention that it happened
>     >     during a (known) network maintenance.
>     >
>     >     So I agree with you, we can assume that your problems were caused by
>     >     network issues.
>     >
>     >     That's also what your logs imply:
>     >
>     >     "failed lossy con, dropping message"
>     >
>     >     --
>     >     Mit freundlichen Gruessen / Best regards
>     >
>     >     Oliver Dzombic
>     >     IP-Interactive
>     >
>     >     mailto:info@xxxxxxxxxxxxxxxxx
>     >
>     >     Anschrift:
>     >
>     >     IP Interactive UG ( haftungsbeschraenkt )
>     >     Zum Sonnenberg 1-3
>     >     63571 Gelnhausen
>     >
>     >     HRB 93402 beim Amtsgericht Hanau
>     >     Geschäftsführung: Oliver Dzombic
>     >
>     >     Steuer Nr.: 35 236 3622 1
>     >     UST ID: DE274086107
>     >
>     >
>     >     On 01.04.2016 at 14:07, Diego Castro wrote:
>     >     > Hello Oliver, this issue turned out to be very hard to reproduce;
>     >     > I couldn't make it happen again.
>     >     > My best guess is something with Azure's network, since last week
>     >     > (when it happened a lot) there was an ongoing maintenance.
>     >     >
>     >     > Here are the outputs:
>     >     >
>     >     > $ ceph -s
>     >     >     cluster 25736883-dbf1-4d7a-8796-50e36f9de7a6
>     >     >      health HEALTH_OK
>     >     >      monmap e1: 4 mons at
>     >     > {osmbr0=10.0.3.4:6789/0,osmbr1=10.0.3.6:6789/0,osmbr2=10.0.3.14:6789/0,osmbr3=10.0.3.7:6789/0}
>     >     >             election epoch 602, quorum 0,1,2,3 osmbr0,osmbr1,osmbr3,osmbr2
>     >     >      osdmap e1816: 10 osds: 10 up, 10 in
>     >     >       pgmap v3158931: 128 pgs, 1 pools, 11512 MB data, 3522 objects
>     >     >             34959 MB used, 10195 GB / 10229 GB avail
>     >     >                  128 active+clean
>     >     >   client io 87723 B/s wr, 8 op/s
>     >     >
>     >     > $ ceph osd df
>     >     > ID WEIGHT  REWEIGHT SIZE   USE    AVAIL  %USE VAR
>     >     >  6 1.00000  1.00000  1022G  3224M  1019G 0.31 0.92
>     >     >  1 1.00000  1.00000  1022G  3489M  1019G 0.33 1.00
>     >     >  2 1.00000  1.00000  1022G  3945M  1019G 0.38 1.13
>     >     >  4 1.00000  1.00000  1022G  3304M  1019G 0.32 0.95
>     >     >  7 1.00000  1.00000  1022G  3427M  1019G 0.33 0.98
>     >     >  3 1.00000  1.00000  1022G  4361M  1018G 0.42 1.25
>     >     >  9 1.00000  1.00000  1022G  3650M  1019G 0.35 1.04
>     >     >  0 1.00000  1.00000  1022G  3210M  1019G 0.31 0.92
>     >     >  5 1.00000  1.00000  1022G  3577M  1019G 0.34 1.02
>     >     >  8 1.00000  1.00000  1022G  2765M  1020G 0.26 0.79
>     >     >               TOTAL 10229G 34957M 10195G 0.33
>     >     > MIN/MAX VAR: 0.79/1.25  STDDEV: 0.04
>     >     >
>     >     >
>     >     >
>     >     > $ ceph osd perf
>     >     > osd fs_commit_latency(ms) fs_apply_latency(ms)
>     >     >   0                     1                    2
>     >     >   1                     1                    2
>     >     >   2                     2                    3
>     >     >   3                     2                    3
>     >     >   4                     1                    2
>     >     >   5                     2                    3
>     >     >   6                     1                    2
>     >     >   7                     2                    3
>     >     >   8                     1                    2
>     >     >   9                     1                    1
>     >     >
>     >     >
>     >     >
>     >     >
>     >     >
>     >     >
>     >     >
>     >     > ---
>     >     > Diego Castro / The CloudFather
>     >     > GetupCloud.com - Eliminamos a Gravidade
>     >     >
>     >     > 2016-03-31 18:00 GMT-03:00 Oliver Dzombic <info@xxxxxxxxxxxxxxxxx>:
>     >     >
>     >     >     Hi Diego,
>     >     >
>     >     >     let's start with the basics: please give us the output of
>     >     >
>     >     >     ceph -s
>     >     >     ceph osd df
>     >     >     ceph osd perf
>     >     >
>     >     >     ideally before and after you provoke the iowait.
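>     >     >
>     >     >     To provoke the iowait in a controlled way, something like
>     >     >     fio could be used (a sketch, assuming fio is installed in
>     >     >     the client VM):
>     >     >
>     >     >     fio --name=rbdtest --rw=randwrite --bs=4k --size=1G \
>     >     >         --numjobs=4 --direct=1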
>     >     >
>     >     >     Thank you !
>     >     >
>     >     >     --
>     >     >     Mit freundlichen Gruessen / Best regards
>     >     >
>     >     >     Oliver Dzombic
>     >     >     IP-Interactive
>     >     >
>     >     >     mailto:info@xxxxxxxxxxxxxxxxx
>     >     >
>     >     >     Anschrift:
>     >     >
>     >     >     IP Interactive UG ( haftungsbeschraenkt )
>     >     >     Zum Sonnenberg 1-3
>     >     >     63571 Gelnhausen
>     >     >
>     >     >     HRB 93402 beim Amtsgericht Hanau
>     >     >     Geschäftsführung: Oliver Dzombic
>     >     >
>     >     >     Steuer Nr.: 35 236 3622 1
>     >     >     UST ID: DE274086107
>     >     >
>     >     >
>     >     >     On 31.03.2016 at 21:38, Diego Castro wrote:
>     >     >     > Hello, everyone.
>     >     >     > I have a pretty basic Ceph setup running on top of Azure Cloud
>     >     >     > (4 mons and 10 OSDs) for RBD images.
>     >     >     > Everything seems to work as expected until I put some load on
>     >     >     > it: sometimes a process (a mysql restore, for example) doesn't
>     >     >     > complete, and sometimes it finishes without any issues.
>     >     >     >
>     >     >     >
>     >     >     > Client Kernel: 3.10.0-327.10.1.el7.x86_64
>     >     >     > OSD Kernel: 3.10.0-229.7.2.el7.x86_64
>     >     >     >
>     >     >     > Ceph: ceph-0.94.5-0.el7.x86_64
>     >     >     >
>     >     >     > On the client side, I see 100% iowait and a lot of "INFO: task
>     >     >     > blocked for more than 120 seconds" messages.
>     >     >     > On the OSD side, I have no evidence of a faulty disk or of
>     >     >     > read/write latency, but I found the following messages:
>     >     >     >
>     >     >     >
>     >     >     > 2016-03-28 17:04:03.425249 7f7329fc5700  0 bad crc in data 641367213 != exp 3107019767
>     >     >     > 2016-03-28 17:04:03.440599 7f7329fc5700  0 -- 10.0.3.9:6800/2272 >> 10.0.2.5:0/1998047321 pipe(0x13cc4800 sd=54 :6800 s=0 pgs=0 cs=0 l=0 c=0x13883f40).accept peer addr is really 10.0.2.5:0/1998047321 (socket is 10.0.2.5:34702/0)
>     >     >     > 2016-03-28 17:04:03.487497 7f7333e6a700  0 -- 10.0.3.9:6800/2272 submit_message osd_op_reply(20046 rb.0.6040.238e1f29.000000000074 [set-alloc-hint object_size 4194304 write_size 4194304,write 0~524288] v1753'32512 uv32512 ondisk = 0) v6 remote, 10.0.2.5:0/1998047321, failed lossy con, dropping message 0x12b539c0
>     >     >     > 2016-03-28 17:04:03.532302 7f733666f700  0 -- 10.0.3.9:6800/2272 submit_message osd_op_reply(20047 rb.0.6040.238e1f29.000000000074 [set-alloc-hint object_size 4194304 write_size 4194304,write 524288~524288] v1753'32513 uv32513 ondisk = 0) v6 remote, 10.0.2.5:0/1998047321, failed lossy con, dropping message 0x1667bc80
>     >     >     > 2016-03-28 17:04:03.535143 7f7333e6a700  0 -- 10.0.3.9:6800/2272 submit_message osd_op_reply(20048 rb.0.6040.238e1f29.000000000074 [set-alloc-hint object_size 4194304 write_size 4194304,write 1048576~524288] v1753'32514 uv32514 ondisk = 0) v6 remote, 10.0.2.5:0/1998047321, failed lossy con, dropping message 0x12b56e00
>     >     >     >
>     >     >     > ---
>     >     >     > Diego Castro / The CloudFather
>     >     >     > GetupCloud.com - Eliminamos a Gravidade
>     >     >     >
>     >     >     >
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



