Good day! Thank you, but it's not clear to me what the bottleneck is here. It could be any of the following (a quick check sketch follows this list):
- hardware node: load average, disk IO
- a problem with the underlying file system on the OSD, or a bad disk
- a Ceph journal problem
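A rough sketch of how each could be checked on the node (standard Linux tools; the journal path comes from the ceph.conf quoted further down):
# uptime && cat /proc/loadavg              (node load)
# iostat -x 1 5                            (per-device await/%util, needs sysstat)
# dmesg | grep -iE 'xfs|sda|i/o error'     (file system / disk errors)
# df -h /var/lib/ceph/journal              (journal partition space and mount)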
The Ceph OSD partition is part of a block device which has practically no load:
Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sda              12,00         0,00         0,12          0          0

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sda              12,00         0,00         0,14          0          0
The disk with the OSD is good; I just checked it and got good read/write speed with appropriate IOPS and latency.
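(For what it's worth, a raw check along those lines could look like the following; the OSD data path is only the default guess, not something taken from this thread:)
# dd if=/dev/zero of=/var/lib/ceph/osd/ceph-2/ddtest bs=1M count=1024 oflag=direct
# dd if=/var/lib/ceph/osd/ceph-2/ddtest of=/dev/null bs=1M iflag=direct
# fio --name=randrw --filename=/var/lib/ceph/osd/ceph-2/fiotest --rw=randrw --bs=4k --size=256M --direct=1 --runtime=30
# rm -f /var/lib/ceph/osd/ceph-2/ddtest /var/lib/ceph/osd/ceph-2/fiotest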
But the hardware node is working hard and has a high load average. I fear that the ceph-osd process lacks resources. Is there any way to fix this? Maybe raise some kind of sync timeout, or give this OSD a lower weight, or something similar?
Or is it better to move this OSD to another server?
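For reference, and assuming (not certain) that the sync timeout behind the crash is the filestore commit timeout, the timeout and weight ideas above would look roughly like this; 1200 is only an illustration (the default is 600 seconds, if I remember correctly).
In ceph.conf on h08:
[osd.2]
filestore commit timeout = 1200

# ceph osd crush reweight osd.2 0.5        (halve the CRUSH weight so less data maps to osd.2)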
Regards, Artem Silenkov, 2GIS TM.
---
2GIS LLC
http://2gis.ru
a.silenkov at 2gis.ru
gtalk: artem.silenkov at gmail.com
cell: +79231534853
2013/6/5 Gregory Farnum <greg@xxxxxxxxxxx>
This would be easier to see with a log than with all the GDB stuff, but the reference in the backtrace to "SyncEntryTimeout::finish(int)" tells me that the filesystem is taking too long to sync things to disk. Either this disk is bad or you're somehow subjecting it to a much heavier load than the others.
-Greg
--
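Side note on getting such a log: a sketch, assuming the stock log path and the usual Ceph debug options. In ceph.conf on h08:
[osd.2]
debug osd = 20
debug filestore = 20
debug journal = 20
Then restart the daemon and watch the log:
# /etc/init.d/ceph restart osd.2
# tail -f /var/log/ceph/ceph-osd.2.log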
On Wednesday, June 5, 2013, Artem Silenkov wrote:

Good day!
Tried to nullify this osd and reinject it with no success. It works for a little while, then crashes again.

Regards, Artem Silenkov, 2GIS TM.
---
2GIS LLC
cell: +79231534853

2013/6/5 Artem Silenkov <artem.silenkov@xxxxxxxxx>

Hello!

We have a simple setup as follows:

Debian GNU/Linux 6.0 x64
Linux h08 2.6.32-19-pve #1 SMP Wed May 15 07:32:52 CEST 2013 x86_64 GNU/Linux

ii  ceph            0.61.2-1~bpo60+1  distributed storage and file system
ii  ceph-common     0.61.2-1~bpo60+1  common utilities to mount and interact with a ceph storage cluster
ii  ceph-fs-common  0.61.2-1~bpo60+1  common utilities to mount and interact with a ceph file system
ii  ceph-fuse       0.61.2-1~bpo60+1  FUSE-based client for the Ceph distributed file system
ii  ceph-mds        0.61.2-1~bpo60+1  metadata server for the ceph distributed file system
ii  libcephfs1      0.61.2-1~bpo60+1  Ceph distributed file system client library
ii  libc-bin        2.11.3-4          Embedded GNU C Library: Binaries
ii  libc-dev-bin    2.11.3-4          Embedded GNU C Library: Development binaries
ii  libc6           2.11.3-4          Embedded GNU C Library: Shared libraries
ii  libc6-dev       2.11.3-4          Embedded GNU C Library: Development Libraries and Header Files

All programs are running fine except osd.2, which is crashing repeatedly.
All other nodes have the same operating system on board, and the system environment is practically identical.

# cat /etc/ceph/ceph.conf
[global]
pid file = /var/run/ceph/$name.pid
auth cluster required = none
auth service required = none
auth client required = none
max open files = 65000

[mon]

[mon.0]
host = h01
mon addr = 10.1.1.3:6789

[mon.1]
host = h07
mon addr = 10.1.1.10:6789

[mon.2]
host = h08
mon addr = 10.1.1.11:6789

[mds]

[mds.3]
host = h09

[mds.4]
host = h06

[osd]
osd journal size = 10000
osd journal = /var/lib/ceph/journal/$cluster-$id/journal
osd mkfs type = xfs

[osd.0]
host = h01
addr = 10.1.1.3
devs = /dev/sda3

[osd.1]
host = h07
addr = 10.1.1.10
devs = /dev/sda3

[osd.2]
host = h08
addr = 10.1.1.11
devs = /dev/sda3

[osd.3]
host = h09
addr = 10.1.1.12
devs = /dev/sda3

[osd.4]
host = h06
addr = 10.1.1.9
devs = /dev/sda3

~# ceph osd tree
# id    weight  type name               up/down reweight
-1      5       root default
-3      5               rack unknownrack
-2      1                       host h01
0       1                               osd.0   up      1
-4      1                       host h07
1       1                               osd.1   up      1
-5      1                       host h08
2       1                               osd.2   down    0
-6      1                       host h09
3       1                               osd.3   up      1
-7      1                       host h06
4       1                               osd.4   up      1
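Regarding the "nullify and reinject" attempt mentioned at the top of the quoted message: the usual full removal sequence for an OSD of that era is sketched below (the auth step is effectively a no-op here, since auth is set to none in the quoted ceph.conf); the OSD can then be recreated with ceph osd create and ceph-osd -i 2 --mkfs --mkjournal.
# ceph osd out 2
# /etc/init.d/ceph stop osd.2
# ceph osd crush remove osd.2
# ceph auth del osd.2
# ceph osd rm 2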
Software Engineer #42 @ http://inktank.com | http://ceph.com
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com