On 2015-11-02 10:19 pm, gjprabu wrote:
Hi Taylor,
I have checked the DNS names and all hosts resolve to the correct IPs. The
MTU size of 1500 is configured at the switch level. No firewall or SELinux
is currently running.
We would also like answers to the queries below, which are already in the thread.
Regards
Prabu
---- On Tue, 03 Nov 2015 11:20:07 +0530 CHRIS TAYLOR
<CTAYLOR@xxxxxxxxxx> wrote ----
I would double check the network configuration on the new node, including
hosts files and DNS names. Do all the host names resolve to the correct IP
addresses from all hosts?
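For example, something like the following on each host would confirm what each name resolves to and what the monitors actually advertise (the host names are taken from the ceph.conf further down; adjust to your own nodes):

# check forward resolution via /etc/hosts and DNS on every node
getent hosts integ-hm5 integ-hm6 integ-hm7

# compare with the monitor addresses the cluster itself advertises
ceph mon dump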
"... 192.168.112.231:6800/49908 >> 192.168.113.42:0/599324131 ..."
Looks like the communication between subnets is a problem. Is
xxx.xxx.113.xxx a typo? If that's correct, check MTU sizes. Are they
configured correctly on the switch and all NICs?
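One quick end-to-end check (1472 data bytes + 28 bytes of IP/ICMP headers = 1500; the peer address is just the one from your log excerpt):

# show the MTU configured on each NIC
ip link show

# don't-fragment ping across the two subnets; this fails if the path MTU is below 1500
ping -M do -s 1472 192.168.113.42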
Are there any iptables/firewall rules that could be blocking traffic
between hosts?
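For example, to rule that out (mons listen on 6789 and OSDs use the 6800-7300 range by default):

# list the active iptables rules on each host
iptables -S

# on RHEL/CentOS, also check SELinux and firewalld state
getenforce
firewall-cmd --state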
Hope that helps,
Chris
On 2015-11-02 9:18 pm, gjprabu wrote:
Hi,
Anybody please help me on this issue.
Regards
Prabu
---- On Mon, 02 Nov 2015 17:54:27 +0530 GJPRABU
<GJPRABU@xxxxxxxxxxxx>
wrote ----
Hi Team,
We have a Ceph setup with 2 OSDs and replica 2, mounted by OCFS2 clients, and
it was working. When we added a new OSD, all of the clients' RBD-mapped
devices disconnected, and running rbd ls or rbd map hung. We waited many
hours for the new OSD to fill, but peering did not complete even after the
data sync finished, and the client-side issue persisted. We then tried
stopping and starting the old OSD services, and after some time the RBD
devices were mapped again automatically by our existing map script.
After the service stop/start on the old OSDs, the 3rd OSD rebuilt and
backfilling started again, and after some time the clients' RBD-mapped
devices disconnected and rbd ls / rbd map hung once more. We decided to wait
until the data sync to the 3rd OSD finished; it completed, but the client-side
RBD still would not map. After we restarted all mon and OSD services, the
client-side issue was fixed and RBD mounted again. We suspect some issue in
our setup. Logs are attached for your reference.
What does 'ceph -s' look like? Is the cluster HEALTH_OK?
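For reference, output from the following (run on a mon or admin node) would show whether the cluster is healthy and whether PGs are still peering or backfilling:

ceph -s
ceph health detail
ceph osd tree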
We may be missing something in our setup; it would be highly appreciated
if anybody could help us solve this issue.
Before new osd.2 addition:
osd.0 - size: 13T, used: 2.7T
osd.1 - size: 13T, used: 2.7T
After new osd addition:
osd.0 - size: 13T, used: 1.8T
osd.1 - size: 13T, used: 2.1T
osd.2 - size: 15T, used: 2.5T
rbd ls
repo / integrepository (pg_num: 126)
rbd / integdownloads (pg_num: 64)
Also, we would like a few clarifications.
If a new OSD is added, will all clients be unmounted automatically?
Clients do not need to unmount images when OSDs are added.
While adding a new OSD, can we access (read/write) from client machines?
Clients still have read/write access to RBD images in the cluster while
adding OSDs and during recovery.
How much data will be added to the new OSD, without changing the replica
count or pg_num?
The data will re-balance between OSDs automatically. I found that having
more PGs helps distribute the load more evenly.
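To see how the data is actually spread across the OSDs and what the pools are using (command availability and output columns vary a little by release):

ceph osd df                      # per-OSD utilization and PG counts
ceph df                          # per-pool usage
ceph osd pool get rbd pg_num     # current pg_num of a pool, e.g. the 'rbd' pool from your 'rbd ls' output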
How long will this process take to finish?
Depends greatly on the hardware and configuration: whether journals are on
SSDs or spinning disks, network connectivity, max_backfills, etc.
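If recovery traffic is starving client I/O, throttling backfill/recovery is a common mitigation; the values below are only illustrative:

# runtime, applied to all OSDs
ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'

# persistent, in the [osd] section of ceph.conf
osd_max_backfills = 1
osd_recovery_max_active = 1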
If we have missed any common configuration, please share it.
I don't see any configuration for public and cluster networks. If you are
sharing the same network for client traffic and object replication/recovery,
the cluster re-balancing data between OSDs could cause problems with the
client traffic.
Take a look at:
http://docs.ceph.com/docs/master/rados/configuration/network-config-ref/
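As a sketch of what that separation could look like in ceph.conf, keeping 192.168.112.0/24 as the public (client) network and moving replication to a dedicated subnet (the cluster subnet below is only an example):

[global]
public_network = 192.168.112.0/24
cluster_network = 10.10.10.0/24    # example back-end subnet for replication/recovery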
ceph.conf
[global]
fsid = 944fa0af-b7be-45a9-93ff-b9907cfaee3f
mon_initial_members = integ-hm5, integ-hm6, integ-hm7
mon_host = 192.168.112.192,192.168.112.193,192.168.112.194
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
osd_pool_default_size = 2
[mon]
mon_clock_drift_allowed = .500
[client]
rbd_cache = false
Current logs from the new OSD; the old logs are also attached.
2015-11-02 12:47:48.481641 7f386f691700 0 bad crc in data 3889133030 != exp 2857248268
2015-11-02 12:47:48.482230 7f386f691700 0 -- 192.168.112.231:6800/49908 >> 192.168.113.42:0/599324131 pipe(0x170d2000 sd=28 :6800 s=0 pgs=0 cs=0 l=0 c=0xc510580).accept peer addr is really 192.168.113.42:0/599324131 (socket is 192.168.113.42:42530/0)
2015-11-02 12:47:48.483951 7f386f691700 0 bad crc in data 3192803598 != exp 1083014631
2015-11-02 12:47:48.484512 7f386f691700 0 -- 192.168.112.231:6800/49908 >> 192.168.113.42:0/599324131 pipe(0x170ea000 sd=28 :6800 s=0 pgs=0 cs=0 l=0 c=0xc516f60).accept peer addr is really 192.168.113.42:0/599324131 (socket is 192.168.113.42:42531/0)
2015-11-02 12:47:48.486284 7f386f691700 0 bad crc in data 133120597 != exp 393328400
2015-11-02 12:47:48.486777 7f386f691700 0 -- 192.168.112.231:6800/49908 >> 192.168.113.42:0/599324131 pipe(0x16a18000 sd=28 :6800 s=0 pgs=0 cs=0 l=0 c=0xc514620).accept peer addr is really 192.168.113.42:0/599324131 (socket is 192.168.113.42:42532/0)
2015-11-02 12:47:48.488624 7f386f691700 0 bad crc in data 3299720069 != exp 211350069
2015-11-02 12:47:48.489100 7f386f691700 0 -- 192.168.112.231:6800/49908 >> 192.168.113.42:0/599324131 pipe(0x170d2000 sd=28 :6800 s=0 pgs=0 cs=0 l=0 c=0xc513860).accept peer addr is really 192.168.113.42:0/599324131 (socket is 192.168.113.42:42533/0)
2015-11-02 12:47:48.490911 7f386f691700 0 bad crc in data 2381447347 != exp 1177846878
2015-11-02 12:47:48.491390 7f386f691700 0 -- 192.168.112.231:6800/49908 >> 192.168.113.42:0/599324131 pipe(0x170ea000 sd=28 :6800 s=0 pgs=0 cs=0 l=0 c=0xc513700).accept peer addr is really 192.168.113.42:0/599324131 (socket is 192.168.113.42:42534/0)
2015-11-02 12:47:48.493167 7f386f691700 0 bad crc in data 2093712440 != exp 2175112954
2015-11-02 12:47:48.493682 7f386f691700 0 -- 192.168.112.231:6800/49908 >> 192.168.113.42:0/599324131 pipe(0x16a18000 sd=28 :6800 s=0 pgs=0 cs=0 l=0 c=0xc514200).accept peer addr is really 192.168.113.42:0/599324131 (socket is 192.168.113.42:42535/0)
2015-11-02 12:47:48.495150 7f386f691700 0 bad crc in data 3047197039 != exp 38098198
2015-11-02 12:47:48.495679 7f386f691700 0 -- 192.168.112.231:6800/49908 >> 192.168.113.42:0/599324131 pipe(0x170d2000 sd=28 :6800 s=0 pgs=0 cs=0 l=0 c=0xc510b00).accept peer addr is really 192.168.113.42:0/599324131 (socket is 192.168.113.42:42536/0)
2015-11-02 12:47:48.497259 7f386f691700 0 bad crc in data 1400444622 != exp 2648291990
2015-11-02 12:47:48.497756 7f386f691700 0 -- 192.168.112.231:6800/49908 >> 192.168.113.42:0/599324131 pipe(0x170ea000 sd=28 :6800 s=0 pgs=0 cs=0 l=0 c=0x17f7b700).accept peer addr is really 192.168.113.42:0/599324131 (socket is 192.168.113.42:42537/0)
2015-11-02 13:02:00.439025 7f386f691700 0 bad crc in data 4159064831 != exp 903679865
2015-11-02 13:02:00.441337 7f386f691700 0 -- 192.168.112.231:6800/49908 >> 192.168.113.42:0/599324131 pipe(0x16a18000 sd=28 :6800 s=0 pgs=0 cs=0 l=0 c=0x17f7e5c0).accept peer addr is really 192.168.113.42:0/599324131 (socket is 192.168.113.42:43128/0)
2015-11-02 13:02:00.442756 7f386f691700 0 bad crc in data 1134831440 != exp 892008036
2015-11-02 13:02:00.443369 7f386f691700 0 -- 192.168.112.231:6800/49908 >> 192.168.113.42:0/599324131 pipe(0x170d2000 sd=28 :6800 s=0 pgs=0 cs=0 l=0 c=0x17f7ee00).accept peer addr is really 192.168.113.42:0/599324131 (socket is 192.168.113.42:43129/0)
2015-11-02 13:08:43.272527 7f387049f700 0 -- 192.168.112.231:6800/49908 >> 192.168.112.115:0/4256128918 pipe(0x170ea000 sd=33 :6800 s=0 pgs=0 cs=0 l=0 c=0x17f7e1a0).accept peer addr is really 192.168.112.115:0/4256128918 (socket is 192.168.112.115:51660/0)
Regards
Prabu
Regards
G.J
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com