Re: Frozen Client Mounts

Hi Diego,

Let's start with the basics. Please send us the output of

ceph -s
ceph osd df
ceph osd perf

ideally captured both before and after you provoke the iowait.
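
One possible way to collect this is sketched below. It is only a sketch under
assumptions: the function name capture_ceph_state, the CEPH_CMD variable and
the ./ceph-diag output directory are made up for illustration; only the three
ceph commands themselves come from the list above.

```shell
#!/bin/sh
# Sketch: save the three requested outputs under a label ("before"/"after")
# so that the two runs can be compared afterwards.
# CEPH_CMD, capture_ceph_state and ./ceph-diag are illustrative names only.
CEPH_CMD="${CEPH_CMD:-ceph}"

capture_ceph_state() {
    label="$1"                    # e.g. "before" or "after"
    outdir="${2:-./ceph-diag}"    # where the snapshot files are written
    mkdir -p "$outdir" || return 1

    # One file per command; stderr is kept so error messages are not lost.
    $CEPH_CMD -s       > "$outdir/$label.status.txt"   2>&1
    $CEPH_CMD osd df   > "$outdir/$label.osd_df.txt"   2>&1
    $CEPH_CMD osd perf > "$outdir/$label.osd_perf.txt" 2>&1
}

# Possible usage:
#   capture_ceph_state before
#   ... run the load that triggers the iowait ...
#   capture_ceph_state after
#   diff -u ceph-diag/before.osd_perf.txt ceph-diag/after.osd_perf.txt
```

Keeping one file per command makes it easy to attach the raw output to a
reply and to diff the "before" and "after" state of each command separately.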

Thank you!

-- 
Mit freundlichen Gruessen / Best regards

Oliver Dzombic
IP-Interactive

mailto:info@xxxxxxxxxxxxxxxxx

Address:

IP Interactive UG ( haftungsbeschraenkt )
Zum Sonnenberg 1-3
63571 Gelnhausen

HRB 93402, Amtsgericht Hanau (district court)
Managing director: Oliver Dzombic

Tax no.: 35 236 3622 1
VAT ID: DE274086107


On 31.03.2016 at 21:38, Diego Castro wrote:
> Hello, everyone.
> I have a pretty basic Ceph setup running on top of Azure Cloud (4 mons
> and 10 OSDs) for RBD images.
> Everything seems to work as expected until I put some load on it:
> sometimes a job (a MySQL restore, for example) does not complete, and
> sometimes it completes without any issues.
> 
> 
> Client Kernel: 3.10.0-327.10.1.el7.x86_64
> OSD Kernel: 3.10.0-229.7.2.el7.x86_64
> 
> Ceph: ceph-0.94.5-0.el7.x86_64
> 
> On the client side, I see 100% iowait and a lot of "INFO: task blocked for
> more than 120 seconds" messages.
> On the OSD side, I have no evidence of a faulty disk or high read/write
> latency, but I found the following messages:
> 
> 
> 2016-03-28 17:04:03.425249 7f7329fc5700  0 bad crc in data 641367213 != exp 3107019767
> 2016-03-28 17:04:03.440599 7f7329fc5700  0 -- 10.0.3.9:6800/2272 >> 10.0.2.5:0/1998047321 pipe(0x13cc4800 sd=54 :6800 s=0 pgs=0 cs=0 l=0 c=0x13883f40).accept peer addr is really 10.0.2.5:0/1998047321 (socket is 10.0.2.5:34702/0)
> 2016-03-28 17:04:03.487497 7f7333e6a700  0 -- 10.0.3.9:6800/2272 submit_message osd_op_reply(20046 rb.0.6040.238e1f29.000000000074 [set-alloc-hint object_size 4194304 write_size 4194304,write 0~524288] v1753'32512 uv32512 ondisk = 0) v6 remote, 10.0.2.5:0/1998047321, failed lossy con, dropping message 0x12b539c0
> 2016-03-28 17:04:03.532302 7f733666f700  0 -- 10.0.3.9:6800/2272 submit_message osd_op_reply(20047 rb.0.6040.238e1f29.000000000074 [set-alloc-hint object_size 4194304 write_size 4194304,write 524288~524288] v1753'32513 uv32513 ondisk = 0) v6 remote, 10.0.2.5:0/1998047321, failed lossy con, dropping message 0x1667bc80
> 2016-03-28 17:04:03.535143 7f7333e6a700  0 -- 10.0.3.9:6800/2272 submit_message osd_op_reply(20048 rb.0.6040.238e1f29.000000000074 [set-alloc-hint object_size 4194304 write_size 4194304,write 1048576~524288] v1753'32514 uv32514 ondisk = 0) v6 remote, 10.0.2.5:0/1998047321, failed lossy con, dropping message 0x12b56e00
> 
> ---
> Diego Castro / The CloudFather
> GetupCloud.com - Eliminamos a Gravidade
> 
> 
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 