I guess then this waiting "quietly" should be looked at again; I am
seeing a load of 10 on this vm.

[@~]# uptime
 11:51:58 up 4 days, 1:35, 1 user, load average: 10.00, 10.01, 10.05
[@~]# uname -a
Linux smb 3.10.0-862.11.6.el7.x86_64 #1 SMP Tue Aug 14 21:49:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
[@~]# cat /etc/redhat-release
CentOS Linux release 7.5.1804 (Core)
[@~]# dmesg
[348948.927734] libceph: osd23 192.168.10.114:6810 socket closed (con state CONNECTING)
[348957.120090] libceph: osd27 192.168.10.114:6802 socket closed (con state CONNECTING)
[349010.370171] libceph: osd26 192.168.10.114:6806 socket closed (con state CONNECTING)
[349114.822301] libceph: osd24 192.168.10.114:6804 socket closed (con state CONNECTING)
[349141.447330] libceph: osd29 192.168.10.114:6812 socket closed (con state CONNECTING)
[349278.668658] libceph: osd25 192.168.10.114:6800 socket closed (con state CONNECTING)
[349440.467038] libceph: osd28 192.168.10.114:6808 socket closed (con state CONNECTING)
[349465.043957] libceph: osd23 192.168.10.114:6810 socket closed (con state CONNECTING)
[349473.236400] libceph: osd27 192.168.10.114:6802 socket closed (con state CONNECTING)
[349526.486408] libceph: osd26 192.168.10.114:6806 socket closed (con state CONNECTING)
[349630.938498] libceph: osd24 192.168.10.114:6804 socket closed (con state CONNECTING)
[349657.563561] libceph: osd29 192.168.10.114:6812 socket closed (con state CONNECTING)
[349794.784936] libceph: osd25 192.168.10.114:6800 socket closed (con state CONNECTING)
[349956.583300] libceph: osd28 192.168.10.114:6808 socket closed (con state CONNECTING)
[349981.160225] libceph: osd23 192.168.10.114:6810 socket closed (con state CONNECTING)
[349989.352510] libceph: osd27 192.168.10.114:6802 socket closed (con state CONNECTING)
..
..
..
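(Editorial note: a steady load of exactly 10 with no CPU use usually means processes stuck in uninterruptible sleep, "D" state, each of which adds 1.0 to the load average even while the CPU is idle. A quick check, sketched here assuming a standard procps `ps`:)

```shell
# List processes in uninterruptible sleep (state "D") -- typically
# blocked on I/O, e.g. cephfs writes that cannot reach their OSDs.
# Each D-state process contributes 1.0 to the load average, so ten
# stuck writers would explain a flat load of 10.00 with an idle CPU.
ps -eo state=,pid=,comm= | awk '$1 ~ /^D/ {print; n++} END {print n+0, "processes in D state"}'
```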
-----Original Message-----
From: John Spray [mailto:jspray@xxxxxxxxxx]
Sent: Thursday 27 September 2018 11:43
To: Marc Roos
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re: Cannot write to cephfs if some osd's are not available on the client network

On Thu, Sep 27, 2018 at 10:16 AM Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx> wrote:
>
> I have a test cluster, and on an osd node I put a vm. The vm is using a
> macvtap on the client network interface of the osd node, making access
> to the local osd's impossible.
>
> The vm of course reports that it cannot access the local osd's. What I
> am getting is:
>
> - I cannot reboot this vm normally; I need to reset it.

When Linux tries to shut down cleanly, part of that is flushing buffers
from any mounted filesystem back to disk. If you have a network
filesystem mounted, and the network is unavailable, that can cause the
process to block. You can try forcibly unmounting before rebooting.

> - The vm is reporting a very high load.

The CPU load part is surprising -- in general Ceph clients should wait
quietly when blocked, rather than spinning.

> I guess this should not be happening, no? Because it should choose
> another available osd of the 3x replicated pool and just write the
> data to that one?

No -- writes always go through the primary OSD for the PG being written
to. If an OSD goes down, then another OSD will become the primary. In
your case, the primary OSD is not going down; it's just being cut off
from the client by the network, so the writes are blocking indefinitely.

John

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
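(Editorial note: John's "forcibly unmounting before rebooting" can be done with umount's force and lazy flags. A sketch; `/mnt/cephfs` is an assumed mount point, substitute your own:)

```shell
# Try a forced unmount first; on a hung network filesystem this may
# itself block, since the client still has dirty data it cannot flush.
umount -f /mnt/cephfs

# Fall back to a lazy unmount: detach the mount point from the
# namespace immediately and let the kernel clean up references later,
# which usually lets the reboot proceed.
umount -l /mnt/cephfs
```

Note that a lazy unmount does not recover the blocked writes; any data the client could not flush to the OSDs is lost.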