Hi Christian,

On 18/12/2015 04:16, Christian Balzer wrote:

>> It seems to me very bad.
> Indeed.
> Firstly let me state that I don't use CephFS and have no clues how this
> influences things and can/should be tuned.

Ok, no problem. Anyway, thanks for your answer. ;)

> That being said, the fio above running in VM (RBD) gives me 440 IOPS
> against a single OSD storage server (replica 1) with 4 crappy HDDs and
> on-disk journals on my test cluster (1Gb/s links).
> So yeah, given your configuration that's bad.

I have tried a quick test with a rados block device (4GB, with an EXT4
filesystem) mounted on the same client node where I'm testing CephFS, and
the same "fio" command gives me ~1400 read/write IOPS. So my problem could
be CephFS-specific, no?
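For anyone who wants to reproduce the numbers: my exact fio command line is
earlier in the thread; the job has roughly this shape (a sketch only, the
file name, size and runtime below are placeholders, not my real values):

    # 4k random read/write against the FUSE-mounted CephFS
    fio --name=bench --filename=/mnt/cephfs/fio.test \
        --rw=randrw --bs=4k --size=1G \
        --ioengine=libaio --direct=1 --iodepth=32 \
        --runtime=60 --time_based --group_reporting

For the RBD comparison, the same job simply points at a file on the mounted
EXT4 filesystem instead of /mnt/cephfs.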
That being said, I don't know if it can be a symptom, but during the bench
the IOPS are displayed in real time and the value does not seem very
constant to me. I sometimes see peaks at 1800 IOPS, then suddenly the value
drops to 800 IOPS and climbs back up to ~1400, etc.

> In comparison I get 3000 IOPS against a production cluster (so not idle)
> with 4 storage nodes. Each with 4 100GB DC S3700 for journals and OS and 8
> SATA HDDs, Infiniband (IPoIB) connectivity for everything.
>
> All of this is with .80.x (Firefly) on Debian Jessie.

Ok, interesting. My cluster is idle, but I have approximately half as many
disks as your cluster, and my SATA disks are directly connected to the
motherboard. So it seems logical to me that I get ~1400 and you get ~3000,
no?

> You want to use atop on all your nodes and look for everything from disks
> to network utilization.
> There might be nothing obvious going on, but it needs to be ruled out.

It's a detail, but I have noticed that atop (on Ubuntu Trusty) doesn't
display the bandwidth percentage of my 10GbE interface. Anyway, I have
tried to inspect the cluster nodes during the CephFS bench, but I have seen
no bottleneck concerning CPU, network or disks.

>> I use Ubuntu 14.04 on each server with the 3.13 kernel (it's the same
>> for the client ceph where I run my bench) and I use Ceph 9.2.0
>> (Infernalis).
>
> I seem to recall that this particular kernel has issues, you might want to
> scour the archives here.

But in my case I use ceph-fuse on the client node, so the client kernel
version is not relevant, I think. And I thought that the kernel version was
not very important on the cluster nodes' side. Am I wrong?

>> On the client, cephfs is mounted via cephfs-fuse with this
>> in /etc/fstab:
>>
>> id=cephfs,keyring=/etc/ceph/ceph.client.cephfs.keyring,client_mountpoint=/ /mnt/cephfs
>> fuse.ceph noatime,defaults,_netdev 0 0
>>
>> I have 5 cluster node servers "Supermicro Motherboard X10SLM+-LN4 S1150"
>> with one 1GbE port for the ceph public network and one 10GbE port for
>> the ceph private network:
>>
> For the sake of latency (which becomes the biggest issues when you're not
> exhausting CPU/DISK), you'd be better off with everything on 10GbE, unless
> you need the 1GbE to connect to clients that have no 10Gb/s ports.

Yes, exactly. My client is 1Gb/s only.

>> - 1 x Intel Xeon E3-1265Lv3
>> - 1 SSD DC3710 Series 200GB (with partitions for the OS, the 3
>> OSD-journals and, just for ceph01, ceph02 and ceph03, the SSD contains
>> too a partition for the workdir of a monitor)
> The 200GB DC S3700 would have been faster, but that's a moot point and not
> your bottleneck for sure.
>
>> - 3 HD 4TB Western Digital (WD) SATA 7200rpm
>> - RAM 32GB
>> - NO RAID controlleur
>
> Which controller are you using?

No controller, the 3 SATA disks of each node are directly connected to the
SATA ports of the motherboard.

> I recently came across an Adaptec SATA3 HBA that delivered only 176 MB/s
> writes with 200GB DC S3700s as opposed to 280MB/s when used with Intel
> onboard SATA-3 ports or a LSI 9211-4i HBA.

Thanks for your help, Christian.

-- 
François Lafont
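PS: For the archives, since atop on Trusty doesn't show the 10GbE
utilization, one way to watch the interface during the bench is to read the
kernel's counters directly (a minimal sketch; "eth1" is a placeholder for
the real 10GbE interface name, and sar comes from the sysstat package):

    # Per-interface throughput, 1-second samples (see rxkB/s, txkB/s):
    sar -n DEV 1

    # Or sample the raw byte counters; the delta between two reads taken
    # one second apart is the throughput in bytes/s:
    cat /sys/class/net/eth1/statistics/rx_bytes
    cat /sys/class/net/eth1/statistics/tx_bytes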