On Ubuntu 13.04, ceph 0.61.4.
I was running the fio read test below when it hung:
root@ceph-node2:/mnt# fio -filename=/dev/rbd1 -direct=1 -iodepth 1 -thread -rw=read -ioengine=psync -bs=4k -size=50G -numjobs=16 -group_reporting -name=mytest
mytest: (g=0): rw=read, bs=4K-4K/4K-4K, ioengine=psync, iodepth=1
...
mytest: (g=0): rw=read, bs=4K-4K/4K-4K, ioengine=psync, iodepth=1
2.0.8
Starting 16 threads
^Cbs: 16 (f=16): [RRRRRRRRRRRRRRRR] [0.1% done] [0K/0K /s] [0 /0 iops] [eta 02d:01h:34m:39s]
fio: terminating on signal 2
^Cbs: 16 (f=16): [RRRRRRRRRRRRRRRR] [0.1% done] [0K/0K /s] [0 /0 iops] [eta 02d:18h:36m:23s]
fio: terminating on signal 2
Jobs: 16 (f=16): [RRRRRRRRRRRRRRRR] [0.1% done] [0K/0K /s] [0 /0 iops] [eta 04d:07h:40m:55s]
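If it helps, I can also grab the kernel client state on that node while the test is stuck. A rough sketch of what I would run (assuming debugfs is mounted at /sys/kernel/debug):

dmesg | grep -Ei 'libceph|rbd|hung task'     # any client-side kernel errors or hung-task warnings
mount -t debugfs none /sys/kernel/debug 2>/dev/null
cat /sys/kernel/debug/ceph/*/osdc            # in-flight requests from the rbd kernel client to the OSDs

If the same requests stay listed in osdc, that should point at the OSD they are stuck on.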
The top command showed that one CPU was stuck waiting for disk I/O while the other was idle:
top - 20:28:30 up 1 day, 6:02, 3 users, load average: 16.00, 13.91, 8.55
Tasks: 141 total, 1 running, 139 sleeping, 0 stopped, 1 zombie
%Cpu0 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 : 0.3 us, 0.3 sy, 0.0 ni, 0.0 id, 99.3 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 4013924 total, 702112 used, 3311812 free, 3124 buffers
KiB Swap: 3903484 total, 184520 used, 3718964 free, 74156 cached
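A load average of 16 with almost no CPU usage suggests the 16 fio threads are all blocked in uninterruptible sleep. I could confirm that and dump their kernel stacks roughly like this (<fio-pid> is just a placeholder for one of the blocked PIDs):

ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /D/'   # list tasks in uninterruptible (D) sleep and where they wait
cat /proc/<fio-pid>/stack                        # kernel stack of one blocked fio thread
echo w > /proc/sysrq-trigger                     # dump all blocked tasks to the kernel log (dmesg)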
root@ceph-node4:~# ceph -s
health HEALTH_OK
monmap e5: 3 mons at {ceph-node0=172.18.11.30:6789/0,ceph-node2=172.18.11.32:6789/0,ceph-node4=172.18.11.34:6789/0}, election epoch 714, quorum 0,1,2 ceph-node0,ceph-node2,ceph-node4
osdmap e4043: 11 osds: 11 up, 11 in
pgmap v92429: 1192 pgs: 1192 active+clean; 530 GB data, 1090 GB used, 9041 GB / 10131 GB avail
mdsmap e1: 0/0/1 up
No errors were found in ceph.log.
Is there anything else I can collect for investigation?
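For example, I could pull per-OSD information like this (the osd id 0 and the admin socket path are just examples, and I am not sure dump_ops_in_flight is available in 0.61):

ceph health detail
ceph osd tree
ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok dump_ops_in_flight   # slow/blocked ops on that OSD, if supported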