Sorry Prabu, I forgot to mention that the settings in bold in the conf file need to be tuned for your hardware configuration (CPU, disk, etc.) and number of OSDs; otherwise they may hurt you badly.
Thanks & Regards
Somnath
From: Somnath Roy
Sent: Wednesday, June 17, 2015 11:25 AM
To: 'gjprabu'
Cc: Kamala Subramani ; ceph-users@xxxxxxxxxxxxxx; Siva Sokkumuthu
Subject: RE: RE: Re: [ceph-users] Ceph OSD with OCFS2
Okay. You didn’t mention anything about your rbd client host configuration or the number of CPU cores on the OSD/rbd systems. Some thoughts on what you can do:
1. Considering the pretty lean CPU configuration you have, I would check CPU usage on both the OSD and rbd nodes first. If it is already saturated, you are out of luck :)
2. Quite a few write-path improvements went in with Hammer and the latest Ceph; I hope you are using that code base.
3. At the very least, put the Ceph journal on SSD; this should give you a boost.
4. Check the pool's PG count; I hope it is at least 64 or so.
5. If you are using kernel rbd (krbd) to map the image, take the latest krbd code base and build it for your kernel. The reason is that some very important krbd performance fixes went in that are probably not yet part of any released kernel. That should give you a boost.
6. Make the following changes in your conf file if you have not already, and see whether they improve anything. Make sure you are using at least ‘hammer’ for this:
auth_supported = none
auth_service_required = none
auth_client_required = none
auth_cluster_required = none
debug_lockdep = 0/0
debug_context = 0/0
debug_crush = 0/0
debug_buffer = 0/0
debug_timer = 0/0
debug_filer = 0/0
debug_objecter = 0/0
debug_rados = 0/0
debug_rbd = 0/0
debug_journaler = 0/0
debug_objectcacher = 0/0
debug_client = 0/0
debug_osd = 0/0
debug_optracker = 0/0
debug_objclass = 0/0
debug_filestore = 0/0
debug_keyvaluestore = 0/0
debug_newstore = 0/0
debug_journal = 0/0
debug_ms = 0/0
debug_monc = 0/0
debug_tp = 0/0
debug_auth = 0/0
debug_finisher = 0/0
debug_heartbeatmap = 0/0
debug_perfcounter = 0/0
debug_asok = 0/0
debug_throttle = 0/0
debug_mon = 0/0
debug_paxos = 0/0
debug_rgw = 0/0
osd_op_threads = 2
ms_crc_data = false
ms_crc_header = false
osd_op_num_threads_per_shard = 1
osd_op_num_shards = 12
osd_enable_op_tracker = false
7. How many copies (replicas) are you using, 1 or 2?
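You can check which of the item 6 settings a given ceph.conf already has by diffing the file against the list with a small script. This is a minimal sketch. It assumes Python 3 and that the settings live in the [global] section. `SUGGESTED` is a hypothetical subset copied from the list above, with the auth and most debug lines left out for brevity, and the helper names are made up for illustration. It relies on Ceph treating spaces and underscores in option names as equivalent, so `ms crc data` and `ms_crc_data` compare equal.

```python
import configparser

# Hypothetical subset of the item 6 settings; extend as needed.
SUGGESTED = {
    "osd_op_threads": "2",
    "osd_op_num_shards": "12",
    "osd_op_num_threads_per_shard": "1",
    "osd_enable_op_tracker": "false",
    "ms_crc_data": "false",
    "ms_crc_header": "false",
    "debug_osd": "0/0",
    "debug_filestore": "0/0",
    "debug_journal": "0/0",
    "debug_ms": "0/0",
}


def normalize(option: str) -> str:
    """Map 'osd op threads' and 'osd_op_threads' to the same key."""
    return "_".join(option.strip().lower().split())


def settings_to_change(conf_text: str, section: str = "global") -> dict:
    """Return {option: (current value or None, suggested value)} for every
    suggested option that is missing or set differently in `section`."""
    # interpolation=None: do not treat '%' in values specially.
    # strict=False: tolerate repeated options or sections instead of raising.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    current = {}
    if parser.has_section(section):
        current = {normalize(k): v.strip() for k, v in parser.items(section)}
    return {
        option: (current.get(option), wanted)
        for option, wanted in SUGGESTED.items()
        if current.get(option) != wanted
    }
```

Feed it the text of /etc/ceph/ceph.conf, which is the usual default location. For the ceph.conf posted later in this thread, none of these options are set yet. The thread and shard counts in particular depend on the CPU core count, so treat the values as starting points.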
Thanks & Regards
Somnath
Hi Somnath,
Yes, we will check whether there is any bottleneck. Is there a useful command for analyzing it?
>> 1. What is your backend cluster configuration like how many OSDs, PGs/pool, HW details etc
We are using 2 OSDs and have not created any pools or set PG counts; everything is the default. The hardware is physical machines with more than 2 GB of RAM.
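Item 4 asks about the PG count. For sizing it, the Ceph placement-group documentation of this era gave a rule of thumb: roughly 100 PGs per OSD, divided by the replica count and rounded up to a power of two. Below is a minimal sketch of that arithmetic. The target of 100 PGs per OSD is a tunable assumption rather than a hard rule.

```python
def suggested_pg_count(num_osds: int, pool_size: int, pgs_per_osd: int = 100) -> int:
    """Rule-of-thumb pg_num: (OSDs * pgs_per_osd) / replicas, rounded up to a power of two."""
    if num_osds < 1 or pool_size < 1:
        raise ValueError("num_osds and pool_size must be positive")
    target = num_osds * pgs_per_osd / pool_size
    pg_num = 1
    while pg_num < target:  # smallest power of two >= target
        pg_num *= 2
    return pg_num
```

This cluster has 2 OSDs and `osd pool default size = 2`, so `suggested_pg_count(2, 2)` gives 128. The `ceph -w` output later in the thread shows 64 PGs, which already meets the "at least 64" asked for in item 4.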
>>2. Is it a single big rbd image you mounted from different hosts and running OCFS2 on top ? Please give some details on that front.
Yes, it is a single rbd image that we use on different hosts, with OCFS2 running on top.
rbd ls
newinteg
rbd showmapped
id pool image snap device
1 rbd newinteg - /dev/rbd1
rbd info newinteg
rbd image 'newinteg':
size 70000 MB in 17500 objects
order 22 (4096 kB objects)
block_name_prefix: rb.0.1149.74b0dc51
format: 1
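The `rbd info` fields above are related by simple arithmetic. `order 22` means each RADOS object is 2^22 bytes (4096 kB), and the image size divided by the object size gives the object count. Here is a quick sketch to check that, assuming rbd reports sizes in binary megabytes (MiB):

```python
def rbd_object_geometry(size_mb: int, order: int) -> tuple:
    """Return (object size in kB, number of objects) for an rbd image."""
    object_size = 1 << order                      # bytes per RADOS object
    image_size = size_mb * 1024 * 1024            # MiB -> bytes
    object_count = -(-image_size // object_size)  # ceiling division
    return object_size // 1024, object_count
```

`rbd_object_geometry(70000, 22)` returns `(4096, 17500)`, matching the output above. At the same order, a 1 GB (1024 MB) write touches 256 objects, which is the figure behind the `ceph df` check suggested later in the thread.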
>> 3. Also, is this HDD or SSD setup ? If HDD, hope you have journals on SSD.
I believe this is HDD; below is the output for the disk.
*-ide
description: IDE interface
product: 82371SB PIIX3 IDE [Natoma/Triton II]
vendor: Intel Corporation
physical id: 1.1
bus info: pci@0000:00:01.1
version: 00
width: 32 bits
clock: 33MHz
capabilities: ide bus_master
configuration: driver=ata_piix latency=0
resources: irq:0 ioport:1f0(size=8) ioport:3f6 ioport:170(size=8) ioport:376 ioport:c000(size=16)
*-scsi
description: SCSI storage controller
product: Virtio block device
vendor: Red Hat, Inc
physical id: 4
bus info: pci@0000:00:04.0
version: 00
width: 32 bits
clock: 33MHz
capabilities: scsi msix bus_master cap_list
configuration: driver=virtio-pci latency=0
resources: irq:11 ioport:c080(size=64) memory:f2040000-f2040fff
Okay… I think the extra layers you have will add some delay, but 1 minute is probably high (I have never tested Ceph on HDD, though).
We can probably reduce it by optimizing the cluster setup.
Please monitor your backend cluster, and the rbd nodes as well, to see whether anything is a bottleneck there.
Also, check whether there is any delay between you issuing a request on OCFS2, rbd receiving it, and the cluster receiving it.
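Tools such as `top` and `iostat -x` (from sysstat) show CPU saturation and disk wait interactively. For a scripted check on the OSD and rbd nodes, the sketch below samples the aggregate `cpu` line of Linux `/proc/stat` twice and reports busy and iowait fractions. It is a minimal sketch: it assumes a Linux kernel that exposes the iowait and steal columns, and the function names are illustrative.

```python
import time


def _cpu_counters(line: str) -> tuple:
    """Parse a /proc/stat 'cpu' line into (total, idle, iowait) jiffies."""
    # Fields: user nice system idle iowait irq softirq steal [guest guest_nice].
    # guest time is already counted in user/nice, so only the first 8 are summed.
    values = [int(v) for v in line.split()[1:9]]
    return sum(values), values[3], values[4]


def cpu_fractions(prev_line: str, curr_line: str) -> tuple:
    """Return (busy, iowait) as fractions of elapsed CPU time between samples."""
    total0, idle0, wait0 = _cpu_counters(prev_line)
    total1, idle1, wait1 = _cpu_counters(curr_line)
    elapsed = total1 - total0
    if elapsed <= 0:
        return 0.0, 0.0
    iowait = wait1 - wait0
    busy = elapsed - (idle1 - idle0) - iowait
    return busy / elapsed, iowait / elapsed


def sample_cpu(interval: float = 1.0) -> tuple:
    """Sample /proc/stat twice, `interval` seconds apart (Linux only)."""
    with open("/proc/stat") as f:
        first = f.readline()
    time.sleep(interval)
    with open("/proc/stat") as f:
        second = f.readline()
    return cpu_fractions(first, second)
```

A busy fraction close to 1.0 suggests CPU saturation. A high iowait fraction on an HDD-backed OSD node suggests the disks are the bottleneck instead.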
Could you please share the following details?
1. What is your backend cluster configuration: how many OSDs, PGs per pool, hardware details, etc.?
2. Is it a single big rbd image that you mapped on different hosts, with OCFS2 running on top? Please give some details on that front.
3. Also, is this an HDD or SSD setup? If HDD, I hope you have journals on SSD.
Thanks & Regards
Somnath
Yes, we are cloning the repository into the Ceph client shared directory.
Please see the timing analysis below.
Ceph Client Shared Directory:
Cloning into 'elasticsearch'...
remote: Counting objects: 373468, done.
remote: Compressing objects: 100% (76/76), done.
remote: Total 373468 (delta 66), reused 20 (delta 20), pack-reused 373371
Receiving objects: 100% (373468/373468), 137.56 MiB | 7.10 MiB/s, done.
Resolving deltas: 100% (210489/210489), done.
Checking connectivity... done.
Checking out files: 100% (5531/5531), done.
Cloning into 'elasticsearch'...
remote: Counting objects: 373594, done.
remote: Compressing objects: 100% (172/172), done.
remote: Total 373594 (delta 104), reused 20 (delta 20), pack-reused 373399
Receiving objects: 100% (373594/373594), 137.65 MiB | 8.30 MiB/s, done.
Resolving deltas: 100% (210550/210550), done.
Not only clone: repository operations like checkout, update, and fetch are also delayed.
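To put numbers on the extra 2-3 minutes, you can time the same operation on local disk and on the OCFS2 mount. The sketch below assumes Python 3.7+. `median_runtime` is a hypothetical helper: it runs a command in a fresh scratch directory on the filesystem under test and reports the median wall-clock time, so one slow run does not dominate.

```python
import shutil
import statistics
import subprocess
import tempfile
import time
from typing import Callable, List


def median_runtime(make_cmd: Callable[[str], List[str]], parent_dir: str,
                   runs: int = 3) -> float:
    """Run make_cmd(fresh_dir) `runs` times under parent_dir; return median seconds."""
    durations = []
    for _ in range(runs):
        # Fresh, empty directory on the filesystem being measured.
        workdir = tempfile.mkdtemp(dir=parent_dir)
        try:
            start = time.perf_counter()
            subprocess.run(make_cmd(workdir), check=True,
                           stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
            durations.append(time.perf_counter() - start)
        finally:
            # Cleanup is excluded from the timing.
            shutil.rmtree(workdir, ignore_errors=True)
    return statistics.median(durations)
```

For example, compare `median_runtime(lambda d: ["git", "clone", "--quiet", REPO_URL, d], "/mnt/ocfs2")` with the same call using a local directory such as `/tmp`. `REPO_URL` and `/mnt/ocfs2` are placeholders for your repository and mount point. `git clone` accepts an existing empty directory as its target, which is what the helper creates.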
Prabu,
I am still not clear.
You are cloning a git source repository on top of RBD + OCFS2, and that is taking extra time?
Thanks & Regards
Somnath
Hi Somnath,
Is there any tuning for the issues below?
<< Also, please let us know the reason (an extra 2-3 minutes is taken for hg/git repository operations like clone, pull, checkout, and update).
<< Could you please explain a bit what you are trying to do here?
In the Ceph shared directory, we clone the source repository and then access it from the Ceph client.
Regards
Prabu
Hi
The size difference issue is solved. It was related to the OCFS2 format options: the -C (cluster size) value should be 4K.
(mkfs.ocfs2 /dev/mapper/mpatha -N 64 -b 4K -C 256K -T mail --fs-features=extended-slotmap --fs-feature-level=max-features -L )
It needs to be changed as below:
(mkfs.ocfs2 /dev/mapper/mpatha -b 4K -C 4K -L label -T mail -N 2 /dev/sdX)
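A rough model shows why a large cluster size inflates apparent usage. Each file is allocated in whole clusters, so with `-C 256K` even a 1 KB file takes 256 KB on disk, and that adds up across a source checkout of thousands of files. This sketch ignores metadata and OCFS2's inline-data feature (which can keep very small files inside the inode), so treat it as an approximation. The file sizes in the usage note are made up for illustration.

```python
def allocated_bytes(file_sizes, cluster_size: int) -> int:
    """Approximate on-disk allocation when every file occupies whole clusters."""
    total = 0
    for size in file_sizes:
        clusters = -(-size // cluster_size)  # ceiling division; 0 for empty files
        total += clusters * cluster_size
    return total
```

Take three hypothetical files of 1,000, 5,000 and 300,000 bytes. `allocated_bytes(sizes, 4 * 1024)` is 315,392 bytes, while `allocated_bytes(sizes, 256 * 1024)` is 1,048,576 bytes, more than three times as much.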
<< Also, please let us know the reason (an extra 2-3 minutes is taken for hg/git repository operations like clone, pull, checkout, and update).
<< Could you please explain a bit what you are trying to do here?
In the Ceph shared directory, we clone the source repository and then access it from the Ceph client.
Sorry, it was a typo; I meant to say 1GB only.
I would break the problem down as follows:
1. Run an fio workload (say 1G) on RBD and run a Ceph command like ‘ceph df’ to see how much data was written. I am sure you will see the same amount of data. Remember that by default an RBD image is stored as 4MB RADOS objects, so it should write 1GB/4MB = 256 objects.
2. You can also use the ‘rados’ utility to directly put/get a file (say 1GB) to the cluster and check it the same way.
As I said, if your journal is on the same device and you measure the space consumed by the entire OSD mount point, it will be larger because of the write amplification (WA) induced by Ceph. But the size of an individual file you transferred should not differ.
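If fio is not at hand, a crude sequential-write test can stand in for step 1. This is a minimal sketch, not a replacement for fio. It writes a file of the given size in 4 MiB blocks of random (incompressible) data, calls fsync so the data leaves the page cache, and reports throughput. Point it at a scratch file on the OCFS2 mount, never at the mapped /dev/rbdN device itself, because writing there would overwrite the file system. Compare `ceph df` before and after the run to see how much data the cluster recorded.

```python
import os
import time

MIB = 1024 * 1024


def sequential_write_test(path: str, total_bytes: int, block_size: int = 4 * MIB) -> tuple:
    """Write total_bytes to path, fsync, and return (bytes, seconds, MiB/s)."""
    block = os.urandom(block_size)  # random data, so nothing downstream can compress it
    written = 0
    start = time.perf_counter()
    with open(path, "wb") as f:
        while written < total_bytes:
            chunk = block[: min(block_size, total_bytes - written)]
            f.write(chunk)
            written += len(chunk)
        f.flush()
        os.fsync(f.fileno())  # force the data out of the page cache to the device
    elapsed = time.perf_counter() - start
    rate = written / elapsed / MIB if elapsed > 0 else float("inf")
    return written, elapsed, rate
```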
<< Also, please let us know the reason (an extra 2-3 minutes is taken for hg/git repository operations like clone, pull, checkout, and update).
Could you please explain a bit what you are trying to do here?
Thanks & Regards
Somnath
Hi,
I measured only the data that I transferred from the client. For example, after a 500MB file transfer completes, if I measure the same file its size is 1GB, not 10GB.
Our Configuration is :-
=============================================================================================
ceph -w
cluster f428f5d6-7323-4254-9f66-56a21b099c1a
health HEALTH_OK
monmap e1: 3 mons at {cephadmin=172.20.19.235:6789/0,cephnode1=172.20.7.168:6789/0,cephnode2=172.20.9.41:6789/0}, election epoch 114, quorum 0,1,2 cephnode1,cephnode2,cephadmin
osdmap e9: 2 osds: 2 up, 2 in
pgmap v1022: 64 pgs, 1 pools, 7507 MB data, 1952 objects
26139 MB used, 277 GB / 302 GB avail
64 active+clean
===============================================================================================
ceph.conf
[global]
osd pool default size = 2
auth_service_required = cephx
filestore_xattr_use_omap = true
auth_client_required = cephx
auth_cluster_required = cephx
mon_host = 172.20.7.168,172.20.9.41,172.20.19.235
mon_initial_members = zoho-cephnode1, zoho-cephnode2, zoho-cephadmin
fsid = f428f5d6-7323-4254-9f66-56a21b099c1a
================================================================================================
What is the replication policy you are using?
We are using the default OSD setup with 2 replicas; we are not using a custom CRUSH map, PG count, erasure coding, etc.
What interface did you use to store the data?
We are using RBD to store the data, and it is mounted with OCFS2 on the client side.
How are you removing data? Are you removing an rbd image?
We are not removing the rbd image; we only remove existing data, using the rm command from the client. We have not set up any asynchronous way to transfer or remove data.
Also, please let us know the reason (an extra 2-3 minutes is taken for hg/git repository operations like clone, pull, checkout, and update).
Hi,
The Ceph journal works differently. It is a write-ahead journal: all data is first persisted in the journal and then written to its actual location. Journal data is encoded. The journal is a fixed-size partition/file that is written sequentially, so on HDDs old entries are simply overwritten, and on SSDs they are garbage-collected later. So if you measure the amount of data written to the device, it will be double. But if you are saying you wrote a 500MB file to the cluster and the actual file size is 10G, that should not be the case. How are you seeing this size, BTW?
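The doubling described above can be estimated with back-of-the-envelope arithmetic. This sketch assumes FileStore with the journal on the same device as the data and even placement across OSDs. It ignores file system metadata and omap/xattr traffic. It estimates bytes written to the devices (bandwidth), not space used, since the journal is a fixed-size area that gets reused.

```python
def device_writes(client_bytes: float, replicas: int, num_osds: int,
                  journal_on_data_device: bool = True) -> tuple:
    """Estimate (cluster-wide, per-OSD) bytes written to OSD data devices."""
    # With a colocated journal every byte hits the data device twice:
    # once as a journal entry, once at its final location.
    journal_factor = 2 if journal_on_data_device else 1
    cluster_total = client_bytes * replicas * journal_factor
    return cluster_total, cluster_total / num_osds
```

Plug in the numbers from this thread: 500 MB written, 2 replicas, 2 OSDs, colocated journals. `device_writes(500, 2, 2)` gives 2000 MB in total and 1000 MB per OSD device, consistent with the 1GB figure reported at the start of the thread. With the journal on a separate SSD, the data disk sees only the 500 MB copy.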
Could you please tell us more about your configuration?
What is the replication policy you are using?
What interface did you use to store the data?
Regarding your other queries:
<< If I transfer 1GB of data, what will the size be on the server (OSD)? Will it be written in a compressed format?
No, the actual data is not compressed. You don’t want to fill up the OSD disks, and there are some limits you can set. Check the following link:
http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/
By default, the cluster stops accepting writes once an OSD disk is 95% full.
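The 95% limit corresponds to the `mon osd full ratio` setting (default 0.95), and there is a warning threshold, `mon osd nearfull ratio` (default 0.85). The sketch below classifies utilization against those defaults. The check applies per OSD, so feed it each OSD's used and total space rather than only the cluster-wide figure, because one OSD filling up first can be enough to block writes.

```python
NEARFULL_RATIO = 0.85  # default mon osd nearfull ratio
FULL_RATIO = 0.95      # default mon osd full ratio


def fullness(used: float, total: float,
             nearfull: float = NEARFULL_RATIO, full: float = FULL_RATIO) -> str:
    """Classify an OSD's utilization as 'ok', 'nearfull' or 'full'."""
    if total <= 0:
        raise ValueError("total must be positive")
    ratio = used / total
    if ratio >= full:
        return "full"
    if ratio >= nearfull:
        return "nearfull"
    return "ok"
```

The `ceph -w` output earlier in the thread shows 26139 MB used of 302 GB cluster-wide. `fullness(26139, 302 * 1024)` returns `'ok'` for those numbers (about 8.5% used).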
<< Is it possible to take a backup of the server's compressed data, copy it to another machine as Server_Backup, and then start a new client using Server_Backup?
For backup, check whether the following link works for you:
https://ceph.com/community/blog/tag/backup/
You can also use an RGW federated configuration for backup.
<< Data removal is very slow
How are you removing data? Are you removing an rbd image?
If you are removing an entire pool, that should be fast; I believe it deletes data asynchronously.
Thanks & Regards
Somnath
Hi Team,
Once the data transfer has completed, the journal should have moved all the data to its real location, but in our case it shows double the size after the transfer completes, so everyone is confused about the real file and folder sizes. Also, what will happen if I move the monitor off that OSD server onto a separate machine? Might that solve the double-size issue?
We also have the queries below:
1. An extra 2-3 minutes is taken for hg/git repository operations like clone, pull, checkout, and update.
2. If I transfer 1GB of data, what will the size be on the server (OSD)? Will it be written in a compressed format?
3. Is it possible to take a backup of the server's compressed data, copy it to another machine as Server_Backup, and then start a new client using Server_Backup?
4. Data removal is very slow.
Yes, Ceph writes twice: once to the journal and once for the actual data. Since you configured the journal on the same device, this is what you end up seeing if you monitor the device bandwidth.
Thanks & Regards
Somnath
Dear Team,
We have just started using Ceph with two OSDs and two clients. Both clients mount an OCFS2 file system. If I transfer 500MB of data on a client, it shows double the size (1GB) after the transfer finishes. Is this behavior correct, or is there a solution for this?
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com