Hi, Haomai

Having too many threads and connections is fine. But will it impact performance?

In our test environment, a VM with more than 700 threads has poor IOPS performance. The CPU usage of the qemu-system-x86 process is high (1420%):

15801 libvirt-  20   0 33.7g 1.4g  11m R  1420  0.6   1322:26 qemu-system-x86

We used perf to trace the pid (15801). It seems that context switching occupies many CPU cycles and hurts performance (I say this because do_raw_spin_lock is called from sched.c in the kernel):

Samples: 1M of event 'cycles', Event count (approx.): 1057109744252
-  75.23%  qemu-system-x86  [kernel.kallsyms]  [k] do_raw_spin_lock

So I hope to get an answer here: does the number of threads impact performance this much? If so, is there any solution? Would changing to a different LTS version of Ceph help?

My servers:
256G RAM
32 CPUs (model name: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz)

Two pools across 9 OSD storage servers:
SSD pool: 2 OSDs (Intel DC 3500 SSD) on each server
SAS pool: 8 OSDs (SAS) on each server

The VM has volumes from both pools, and we test on those volumes. The performance of a volume from the SSD pool drops from 15K to 5K IOPS when the VM becomes abnormal (more than 700 threads, as described above).

We are really stuck here!

Thanks!

------------------
hzwulibin
2015-10-27

-------------------------------------------------------------
From: Haomai Wang <haomaiwang@xxxxxxxxx>
Date: 2015-10-27 13:36
To: hzwulibin
Cc: ceph-devel
Subject: Re: Re: [ceph-users] Understanding the number of TCP connections between clients and OSDs

On Tue, Oct 27, 2015 at 9:12 AM, hzwulibin <hzwulibin@xxxxxxxxx> wrote:
> Hi, developers
>
> I am also concerned about this problem. My question is: how many threads will the qemu-system-x86 process have?
> When will it reduce the number of threads?

It's because of the network model: each connection has two threads. We are actually working on avoiding this. BTW, at the client level, maybe we can add a proxy for Ceph messages to avoid too many TCP sockets on the client host.
But that requires us to improve the performance of a single connection.

>
> From what I tested, it can be between 100 and 800; yes, maybe it is related to the number of OSDs. But it
> seems to affect performance when there are many threads. From what I tested, 4k randwrite drops from 15k
> to 4k. That's really unacceptable!
>
> My environment:
>
> 1. nine OSD storage servers with two Intel DC 3500 SSDs on each
> 2. hammer 0.94.3
> 3. QEMU emulator version 2.1.2 (Debian 1:2.1+dfsg-12+deb8u4~bpo70+1)
>
> Thanks!
>
> ------------------
> hzwulibin
> 2015-10-27
>
> -------------------------------------------------------------
> From: Jan Schermer <jan@xxxxxxxxxxx>
> Date: 2015-10-27 05:48
> To: Rick Balsano
> Cc: ceph-users@xxxxxxxxxxxxxx
> Subject: Re: [ceph-users] Understanding the number of TCP connections
> between clients and OSDs
>
> If we're talking about RBD clients (qemu), then the number also grows with the number of volumes attached to the client. With a single volume it was <1000. It grows when there's heavy IO happening in the guest.
> I had to bump up the open file limits to several thousand (8000, was it?) to accommodate a client with 10 volumes in our cluster. We just scaled the number of OSDs down, so hopefully I could have a graph of that.
> But I just guesstimated what it could become, and that's not necessarily what the theoretical limit is. Very bad things happen when you reach that threshold. It could also depend on the guest settings (like queue depth) and how much it seeks over the drive (how many different PGs it hits), but knowing the upper bound is most critical.
>
> Jan
>
>> On 26 Oct 2015, at 21:32, Rick Balsano <rick@xxxxxxxxxx> wrote:
>>
>> We've run into issues with the number of open TCP connections from a single client to the OSDs in our Ceph cluster.
>>
>> We can (& have) increased the open file limit to work around this, but we're looking to understand what determines the number of open connections maintained between a client and a particular OSD.
>> Our naive assumption was 1 open TCP connection per OSD or per port made available by the Ceph node. There are many more than this, presumably to allow parallel connections, because we see 1-4 connections from each client per open port on a Ceph node.
>>
>> Here is some background on our cluster:
>> * still running Firefly 0.80.8
>> * 414 OSDs, 35 nodes, one massive pool
>> * clients are KVM processes, accessing Ceph RBD images using virtio
>> * total number of open TCP connections from one client to all nodes is between 500 and 1000
>>
>> Is there any way to either know or cap the maximum number of connections we should expect?
>>
>> I can provide more info as required. I've done some searches and found references to "huge number of TCP connections" but nothing concrete to tell me how to predict how that scales.
>>
>> Thanks,
>> Rick
>> --
>> Rick Balsano
>> Senior Software Engineer
>> Opower <http://www.opower.com/>
>>
>> O +1 571 384 1210
>> We're Hiring! See jobs here <http://www.opower.com/careers>.
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>

--
Best Regards,

Wheat
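
For anyone who wants to compare the thread and connection counts discussed in this thread against their own qemu process, here is a minimal Python sketch. It is not taken from Ceph or QEMU. The function names are made up for illustration, and it assumes a Linux host with /proc mounted. The estimate uses the "two threads per connection" figure quoted above, and it further assumes that every attached volume keeps its own connection to every OSD it reaches, which the thread suggests (the count grows with the number of volumes) but does not confirm. Treat the result as a rough upper bound only.

```python
import os


def estimate_messenger_threads(osds_reached, volumes, threads_per_connection=2):
    """Rough upper bound on messenger threads in one client process.

    Assumption (not confirmed in the thread): each volume holds one
    connection to each OSD it talks to, and each connection costs
    `threads_per_connection` threads.
    """
    connections = osds_reached * volumes
    return connections * threads_per_connection


def count_threads(pid):
    """Count the threads of a process by listing /proc/<pid>/task (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/task"))


def count_sockets(pid):
    """Count open socket file descriptors of a process via /proc/<pid>/fd.

    This counts all sockets (TCP, UNIX, ...), not only connections to OSDs.
    Reading another user's process usually requires root.
    """
    fd_dir = f"/proc/{pid}/fd"
    sockets = 0
    for fd in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        if target.startswith("socket:"):
            sockets += 1
    return sockets
```

As a usage example, 9 servers with 2 + 8 OSDs each give 90 OSDs. With a hypothetical 4 attached volumes (this volume count is invented for illustration), estimate_messenger_threads(90, 4) returns 720. Calling count_threads(15801) and count_sockets(15801) on the host would then show how close the real process is to that bound.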