Re: ceph same rbd on multiple client


Hi Henrik,

Thanks for your reply. We are still facing the same issue. The dmesg output below contains only expected messages: we took node 1 down ourselves and brought it back up, and that is what the log shows. Other than that, we did not find any error messages. We also have a problem while unmounting: the umount process goes into the "D" state, and fsck fails with "fsck.ocfs2: I/O error". If you need me to run any other commands, please let me know.

ocfs2 version
debugfs.ocfs2 1.8.0

# cat /etc/sysconfig/o2cb
#
# This is a configuration file for automatic startup of the O2CB
# driver.  It is generated by running /etc/init.d/o2cb configure.
# On Debian based systems the preferred method is running
# 'dpkg-reconfigure ocfs2-tools'.
#

# O2CB_STACK: The name of the cluster stack backing O2CB.
O2CB_STACK=o2cb

# O2CB_BOOTCLUSTER: If not empty, the name of a cluster to start.
O2CB_BOOTCLUSTER=ocfs2

# O2CB_HEARTBEAT_THRESHOLD: Iterations before a node is considered dead.
O2CB_HEARTBEAT_THRESHOLD=31

# O2CB_IDLE_TIMEOUT_MS: Time in ms before a network connection is considered dead.
O2CB_IDLE_TIMEOUT_MS=30000

# O2CB_KEEPALIVE_DELAY_MS: Max time in ms before a keepalive packet is sent
O2CB_KEEPALIVE_DELAY_MS=2000

# O2CB_RECONNECT_DELAY_MS: Min time in ms between connection attempts
O2CB_RECONNECT_DELAY_MS=2000
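
As a rough reading of the values above: O2CB derives the disk heartbeat timeout from O2CB_HEARTBEAT_THRESHOLD as (threshold - 1) * 2 seconds, given its 2-second heartbeat interval, so a threshold of 31 corresponds to 60 seconds. A minimal sketch of that arithmetic follows; the function name is hypothetical and is not part of ocfs2-tools.

```shell
#!/bin/sh
# Hypothetical helper: convert O2CB_HEARTBEAT_THRESHOLD into the
# disk heartbeat timeout in seconds.
# Assumes the 2-second heartbeat interval used by O2CB.
o2cb_heartbeat_timeout_secs() {
    echo $(( ($1 - 1) * 2 ))
}

# Example: o2cb_heartbeat_timeout_secs 31
```

On a node, the value could be taken from the file itself, for example by sourcing /etc/sysconfig/o2cb and passing "$O2CB_HEARTBEAT_THRESHOLD" to the function.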

# fsck.ocfs2 -fy /home/build/downloads/
fsck.ocfs2 1.8.0
fsck.ocfs2: I/O error on channel while opening "/zoho/build/downloads/"

dmesg logs

[ 4229.886284] o2dlm: Joining domain A895BC216BE641A8A7E20AA89D57E051 ( 5 ) 1 nodes
[ 4251.437451] o2dlm: Node 3 joins domain A895BC216BE641A8A7E20AA89D57E051 ( 3 5 ) 2 nodes
[ 4267.836392] o2dlm: Node 1 joins domain A895BC216BE641A8A7E20AA89D57E051 ( 1 3 5 ) 3 nodes
[ 4292.755589] o2dlm: Node 2 joins domain A895BC216BE641A8A7E20AA89D57E051 ( 1 2 3 5 ) 4 nodes
[ 4306.262165] o2dlm: Node 4 joins domain A895BC216BE641A8A7E20AA89D57E051 ( 1 2 3 4 5 ) 5 nodes
[316476.505401] (kworker/u192:0,95923,0):dlm_do_assert_master:1717 ERROR: Error -112 when sending message 502 (key 0xc3460ae7) to node 1
[316476.505470] o2cb: o2dlm has evicted node 1 from domain A895BC216BE641A8A7E20AA89D57E051
[316480.437231] o2dlm: Begin recovery on domain A895BC216BE641A8A7E20AA89D57E051 for node 1
[316480.442389] o2cb: o2dlm has evicted node 1 from domain A895BC216BE641A8A7E20AA89D57E051
[316480.442412] (kworker/u192:0,95923,20):dlm_begin_reco_handler:2765 A895BC216BE641A8A7E20AA89D57E051: dead_node previously set to 1, node 3 changing it to 1
[316480.541237] o2dlm: Node 3 (he) is the Recovery Master for the dead node 1 in domain A895BC216BE641A8A7E20AA89D57E051
[316480.541241] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
[316485.542733] o2dlm: Begin recovery on domain A895BC216BE641A8A7E20AA89D57E051 for node 1
[316485.542740] o2dlm: Node 3 (he) is the Recovery Master for the dead node 1 in domain A895BC216BE641A8A7E20AA89D57E051
[316485.542742] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
[316490.544535] o2dlm: Begin recovery on domain A895BC216BE641A8A7E20AA89D57E051 for node 1
[316490.544538] o2dlm: Node 3 (he) is the Recovery Master for the dead node 1 in domain A895BC216BE641A8A7E20AA89D57E051
[316490.544539] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
[316495.546356] o2dlm: Begin recovery on domain A895BC216BE641A8A7E20AA89D57E051 for node 1
[316495.546362] o2dlm: Node 3 (he) is the Recovery Master for the dead node 1 in domain A895BC216BE641A8A7E20AA89D57E051
[316495.546364] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
[316500.548135] o2dlm: Begin recovery on domain A895BC216BE641A8A7E20AA89D57E051 for node 1
[316500.548139] o2dlm: Node 3 (he) is the Recovery Master for the dead node 1 in domain A895BC216BE641A8A7E20AA89D57E051
[316500.548140] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
[316505.549947] o2dlm: Begin recovery on domain A895BC216BE641A8A7E20AA89D57E051 for node 1
[316505.549951] o2dlm: Node 3 (he) is the Recovery Master for the dead node 1 in domain A895BC216BE641A8A7E20AA89D57E051
[316505.549952] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
[316510.551734] o2dlm: Begin recovery on domain A895BC216BE641A8A7E20AA89D57E051 for node 1
[316510.551739] o2dlm: Node 3 (he) is the Recovery Master for the dead node 1 in domain A895BC216BE641A8A7E20AA89D57E051
[316510.551740] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
[316515.553543] o2dlm: Begin recovery on domain A895BC216BE641A8A7E20AA89D57E051 for node 1
[316515.553547] o2dlm: Node 3 (he) is the Recovery Master for the dead node 1 in domain A895BC216BE641A8A7E20AA89D57E051
[316515.553548] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
[316520.555337] o2dlm: Begin recovery on domain A895BC216BE641A8A7E20AA89D57E051 for node 1
[316520.555341] o2dlm: Node 3 (he) is the Recovery Master for the dead node 1 in domain A895BC216BE641A8A7E20AA89D57E051
[316520.555343] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
[316525.557131] o2dlm: Begin recovery on domain A895BC216BE641A8A7E20AA89D57E051 for node 1
[316525.557136] o2dlm: Node 3 (he) is the Recovery Master for the dead node 1 in domain A895BC216BE641A8A7E20AA89D57E051
[316525.557153] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
[316530.558952] o2dlm: Begin recovery on domain A895BC216BE641A8A7E20AA89D57E051 for node 1
[316530.558955] o2dlm: Node 3 (he) is the Recovery Master for the dead node 1 in domain A895BC216BE641A8A7E20AA89D57E051
[316530.558957] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
[316535.560781] o2dlm: Begin recovery on domain A895BC216BE641A8A7E20AA89D57E051 for node 1
[316535.560789] o2dlm: Node 3 (he) is the Recovery Master for the dead node 1 in domain A895BC216BE641A8A7E20AA89D57E051
[316535.560792] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
[319419.525609] o2dlm: Node 1 joins domain A895BC216BE641A8A7E20AA89D57E051 ( 1 2 3 4 5 ) 5 nodes
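
To see how often recovery was restarted for a given node, the "Begin recovery" lines can be counted. The sketch below assumes the o2dlm message format shown in this log; count_recovery_cycles is a hypothetical helper, not an OCFS2 tool.

```shell
#!/bin/sh
# Hypothetical helper: count "o2dlm: Begin recovery" messages for one node.
# Reads kernel log lines on stdin; the node number is the first argument.
# Assumes each matching line ends with "for node <N>", as in the log above.
count_recovery_cycles() {
    awk -v node="$1" '
        /o2dlm: Begin recovery on domain/ && $NF == node { n++ }
        END { print n + 0 }
    '
}

# Example: dmesg | count_recovery_cycles 1
```

A count that keeps growing at a fixed interval, as in the five-second cycles above, suggests that recovery for the dead node is being retried rather than completing once.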



ps -auxxxxx | grep umount
root     32083 21.8  0.0 125620  2828 pts/14   D+   19:37   0:18 umount /home/build/repository
root     32196  0.0  0.0 112652  2264 pts/8    S+   19:38   0:00 grep --color=auto umount


cat /proc/32083/stack 
[<ffffffff8132ad7d>] o2net_send_message_vec+0x71d/0xb00
[<ffffffff81352148>] dlm_send_remote_unlock_request.isra.2+0x128/0x410
[<ffffffff813527db>] dlmunlock_common+0x3ab/0x9e0
[<ffffffff81353088>] dlmunlock+0x278/0x800
[<ffffffff8131f765>] o2cb_dlm_unlock+0x35/0x50
[<ffffffff8131ecfe>] ocfs2_dlm_unlock+0x1e/0x30
[<ffffffff812a8776>] ocfs2_drop_lock.isra.29.part.30+0x1f6/0x700
[<ffffffff812ae40d>] ocfs2_simple_drop_lockres+0x2d/0x40
[<ffffffff8129b43c>] ocfs2_dentry_lock_put+0x5c/0x80
[<ffffffff8129b4a2>] ocfs2_dentry_iput+0x42/0x1d0
[<ffffffff81204dc2>] __dentry_kill+0x102/0x1f0
[<ffffffff81205294>] shrink_dentry_list+0xe4/0x2a0
[<ffffffff81205aa8>] shrink_dcache_parent+0x38/0x90
[<ffffffff81205b16>] do_one_tree+0x16/0x50
[<ffffffff81206e9f>] shrink_dcache_for_umount+0x2f/0x90
[<ffffffff811efb15>] generic_shutdown_super+0x25/0x100
[<ffffffff811eff57>] kill_block_super+0x27/0x70
[<ffffffff811f02a9>] deactivate_locked_super+0x49/0x60
[<ffffffff811f089e>] deactivate_super+0x4e/0x70
[<ffffffff8120da83>] cleanup_mnt+0x43/0x90
[<ffffffff8120db22>] __cleanup_mnt+0x12/0x20
[<ffffffff81093ba4>] task_work_run+0xc4/0xe0
[<ffffffff81013c67>] do_notify_resume+0x97/0xb0
[<ffffffff817d2ee7>] int_signal+0x12/0x17
[<ffffffffffffffff>] 0xffffffffffffffff


Regards
G.J




---- On Fri, 23 Oct 2015 13:41:19 +0530 Henrik Korkuc <lists@xxxxxxxxx> wrote ----

Can you paste dmesg and system logs? I am using a 3-node OCFS2 setup with RBD and have had no problems.

On 15-10-23 08:40, gjprabu wrote:


Hi Frederic,

           Can you suggest a solution? We are spending a lot of time trying to solve this issue.

Regards
Prabu




---- On Thu, 15 Oct 2015 17:14:13 +0530 Tyler Bishop <tyler.bishop@xxxxxxxxxxxxxxxxx> wrote ----

I don't know enough about OCFS to help. It sounds like you have unconcurrent writes, though.

Sent from TypeMail
On Oct 15, 2015, at 1:53 AM, gjprabu <gjprabu@xxxxxxxxxxxx> wrote:
Hi Tyler,

   Can you please tell me the next step to take on this issue?

Regards
Prabu


---- On Wed, 14 Oct 2015 13:43:29 +0530 gjprabu <gjprabu@xxxxxxxxxxxx> wrote ----

Hi Tyler,

         Thanks for your reply. We have disabled rbd_cache, but the issue still persists. Please find our configuration file below.

# cat /etc/ceph/ceph.conf
[global]
fsid = 944fa0af-b7be-45a9-93ff-b9907cfaee3f
mon_initial_members = integ-hm5, integ-hm6, integ-hm7
mon_host = 192.168.112.192,192.168.112.193,192.168.112.194
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
osd_pool_default_size = 2

[mon]
mon_clock_drift_allowed = .500

[client]
rbd_cache = false

--------------------------------------------------------------------------------------

 cluster 944fa0af-b7be-45a9-93ff-b9907cfaee3f
     health HEALTH_OK
     monmap e2: 3 mons at {integ-hm5=192.168.112.192:6789/0,integ-hm6=192.168.112.193:6789/0,integ-hm7=192.168.112.194:6789/0}
            election epoch 480, quorum 0,1,2 integ-hm5,integ-hm6,integ-hm7
     osdmap e49780: 2 osds: 2 up, 2 in
      pgmap v2256565: 190 pgs, 2 pools, 1364 GB data, 410 kobjects
            2559 GB used, 21106 GB / 24921 GB avail
                 190 active+clean
  client io 373 kB/s rd, 13910 B/s wr, 103 op/s


Regards
Prabu

---- On Tue, 13 Oct 2015 19:59:38 +0530 Tyler Bishop <tyler.bishop@xxxxxxxxxxxxxxxxx> wrote ----

You need to disable RBD caching.



Tyler Bishop
Chief Technical Officer
513-299-7108 x10

Tyler.Bishop@xxxxxxxxxxxxxxxxx






From: "gjprabu" <gjprabu@xxxxxxxxxxxx>
To: "Frédéric Nass" <frederic.nass@xxxxxxxxxxxxxxxx>
Cc: "<ceph-users@xxxxxxxxxxxxxx>" <ceph-users@xxxxxxxxxxxxxx>, "Siva Sokkumuthu" <sivakumar@xxxxxxxxxxxx>, "Kamal Kannan Subramani(kamalakannan)" <kamal@xxxxxxxxxxxxxxxx>
Sent: Tuesday, October 13, 2015 9:11:30 AM
Subject: Re: ceph same rbd on multiple client

Hi,

 We have servers that mount a Ceph RBD image formatted with OCFS2. We are facing I/O errors: when we move a folder on one node, within the same disk, the other nodes show the error below for the moved data (copying does not cause any problem). As a workaround, remounting the partition resolves the issue, but after some time the problem recurs. Please help with this issue.

Note: We have 5 nodes in total. Two nodes are working fine; the other nodes show the input/output error below on the moved data.

ls -althr 
ls: cannot access LITE_3_0_M4_1_TEST: Input/output error 
ls: cannot access LITE_3_0_M4_1_OLD: Input/output error 
total 0 
d????????? ? ? ? ? ? LITE_3_0_M4_1_TEST 
d????????? ? ? ? ? ? LITE_3_0_M4_1_OLD 

Regards
Prabu


---- On Fri, 22 May 2015 17:33:04 +0530 Frédéric Nass <frederic.nass@xxxxxxxxxxxxxxxx> wrote ----

Hi,

While waiting for CephFS, you can use a clustered filesystem such as OCFS2 or GFS2 on top of the RBD mappings, so that each host can access the same device through the clustered filesystem.
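
A minimal sketch of that approach with OCFS2, under stated assumptions: the o2cb cluster is already configured and online on every node, the image is called foo and lives in the default pool, and it appears as /dev/rbd0 on each host. The helper names, the "shared" label and the DRY_RUN switch are hypothetical and only there for illustration.

```shell
#!/bin/sh
# run: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Format the mapped device once, from a single node only.
# WARNING: this destroys any existing data on the device.
# $1 = device, $2 = number of node slots (nodes that may mount at once).
format_shared_image() {
    run mkfs.ocfs2 -N "$2" -L shared "$1"
}

# On every node: map the image and mount the cluster filesystem.
# $1 = image name, $2 = device the image maps to (assumed), $3 = mount point.
map_and_mount() {
    run rbd map "$1"
    run mount -t ocfs2 "$2" "$3"
}

# Example (dry run, nothing is executed):
#   DRY_RUN=1; map_and_mount foo /dev/rbd0 /mnt
```

The point of the split is that formatting happens once while mapping and mounting happen on each host; mounting a non-clustered filesystem such as ext4 or XFS on several hosts at once is what leads to stale or corrupted views of the data.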

Regards,

Frédéric.

On 21/05/2015 16:10, gjprabu wrote:


-- 
Frédéric Nass

Sous direction des Infrastructures,
Direction du Numérique,
Université de Lorraine.

Tél : 03.83.68.53.83
Hi All,

        We are using RBD and have mapped the same RBD image to an rbd device on two different clients, but I can't see the data written by the other client until I umount the partition and run mount -a. Kindly share a solution for this issue.

Example
create rbd image named foo
map foo to /dev/rbd0 on server A,   mount /dev/rbd0 to /mnt
map foo to /dev/rbd0 on server B,   mount /dev/rbd0 to /mnt

Regards
Prabu



_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

