On Wed, Apr 8, 2015 at 7:36 PM, Lionel Bouton <lionel+ceph@xxxxxxxxxxx> wrote:
> On 04/08/15 18:24, Jeff Epstein wrote:
>> Hi, I'm having sporadic very poor performance running ceph. Right now
>> mkfs, even with nodiscard, takes 30 minutes or more. These kinds of
>> delays happen often but irregularly. There seems to be no common
>> denominator. Clearly, however, they make it impossible to deploy ceph
>> in production.
>>
>> I reported this problem earlier on ceph's IRC, and was told to add
>> nodiscard to mkfs. That didn't help. Here is the command that I'm
>> using to format an rbd:
>>
>> For example: mkfs.ext4 -text4 -m0 -b4096 -E nodiscard /dev/rbd1
>
> I probably won't be able to help much, but people knowing more will need
> at least:
> - your Ceph version,
> - the kernel version of the host on which you are trying to format
>   /dev/rbd1,
> - which hardware and network you are using for this cluster (CPU, RAM,
>   HDD or SSD models, network cards, jumbo frames, ...).
>
>>
>> Ceph says everything is okay:
>>
>>     cluster e96e10d3-ad2b-467f-9fe4-ab5269b70206
>>      health HEALTH_OK
>>      monmap e1: 3 mons at
>>        {a=192.168.224.4:6789/0,b=192.168.232.4:6789/0,c=192.168.240.4:6789/0},
>>        election epoch 12, quorum 0,1,2 a,b,c
>>      osdmap e972: 6 osds: 6 up, 6 in
>>       pgmap v4821: 4400 pgs, 44 pools, 5157 MB data, 1654 objects
>>             46138 MB used, 1459 GB / 1504 GB avail
>>                 4400 active+clean

Are there any "slow request" warnings in the logs?

Assuming a 30-minute mkfs is somewhat reproducible, can you bump osd and
ms log levels and try to capture it?

Thanks,

        Ilya
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
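
For reference, a minimal sketch of how one might check for slow-request
warnings and raise the osd/ms debug levels as asked above; the log path and
debug values shown are the common defaults and may differ on this cluster:

    # check whether the cluster is currently reporting blocked/slow requests
    ceph health detail

    # search the OSD logs for slow-request warnings (default log location)
    grep 'slow request' /var/log/ceph/ceph-osd.*.log

    # raise OSD and messenger debug levels on all OSDs at runtime
    ceph tell osd.* injectargs '--debug-osd 20 --debug-ms 1'

The same levels can also be set persistently by adding "debug osd = 20" and
"debug ms = 1" to the [osd] section of ceph.conf and restarting the OSDs,
then lowered again once the slow mkfs has been captured.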