RE: outliers when measuring boost msqueues latency v4.14.79-rt47 RT-Preempt

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,
Thx again for digging into my problem!

> In include/uapi/linux/futex.h you have
> | #define FUTEX_LOCK_PI           6
> | #define FUTEX_WAIT_BITSET       9

> among other defines. So op=139 is FUTEX_PRIVATE_FLAG +
> FUTEX_WAIT_REQUEUE_PI. This looks good.

Thx, I figured this out in the meantime;-). But taking a closer look at the syscall_futex_entries I see also a lot of op=0 and op=1, meaning FUTEX_WAIT and FUTEX_WAKE. Is this problematic here since you mentioned I should only see FUTEX_LOCK_PI and FUTEX_UNLOCK_PI?
Here is my sending loop in MSQueueThreadSend:

while (msg_counter_send < cnt) {
	vosMessageTracepoint("RTTestTraceSend", msg_counter_send, "before wait");
	retval = msg_queue_sem->Wait(-1); // wait receiving thread posts the semaphore
	vosMessageTracepoint("RTTestTraceSend", retval, "after wait");
   	getPrecTime(&msq_t0); //clock_gettime()
   	msq_send->Send(&msg_counter_send, kmsg_size); //boost::interprocess::message_queue::send()
	vosMessageTracepoint("RTTestTraceSend", msg_counter_send, "send");
    	msg_counter_send++;
  }

and this is the receiving thread part:

while (msg_counter_rcv < cnt) {
	  vosMessageTracepoint("RTTestTraceRec", msg_counter_rcv, "start while");
	  if (msq_rcv->TimedReceive(reinterpret_cast<void*>(&msg), kmsg_size, &recvd_size, kmsq_tmo)) { //boost::interprocess::message_queue::timed_receive()
		  vosMessageTracepoint("RTTestTraceRec", msg_counter_rcv, "out of timed_receive");
		  getPrecTime(&msq_t1);
		  err = (msg != (int)msg_counter_rcv);
		  fillMsqUsageTime(cnt, &msq_t0, &msq_t1, err);
		  vosMessageTracepoint("RTTestTraceRec", msg_counter_rcv-msg, "received");
		  msg_counter_rcv++;
		  retval = msg_queue_sem->Post(); //signal to send the next message
		 vosMessageTracepoint("RTTestTraceRec", retval, "post");
	  }
  }

On a healthy run the traces look basically like this (sorry again for the lengthy stuff):

Trace	Timestamp	Channel	CPU	Event type	Contents	TID	Prio
10.41.140.114/vos70-20161103-195610/ust/uid/0/32-bit	20:57:42.276 666 237	u_0	0	abb_lttng_ngsa_tp:event_message_tp	service_name=RTTestTraceSend, event_id=9844, message=send		
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 666 399	k_1	1	syscall_exit_futex	ret=0, uaddr=3069280280, uaddr2=128	868	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 668 677	k_0	0	syscall_entry_getcpu	tcache=0x0	867	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 669 327	k_1	1	syscall_entry_getcpu	tcache=0x0	868	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 669 490	k_0	0	syscall_exit_getcpu	ret=0, cpup=0xb57e38c4, nodep=0x0, tcache=0x0	867	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 670 466	k_1	1	syscall_exit_getcpu	ret=0, cpup=0xb57798ac, nodep=0x0, tcache=0x0	868	-49
10.41.140.114/vos70-20161103-195610/ust/uid/0/32-bit	20:57:42.276 670 466	u_0	0	abb_lttng_ngsa_tp:event_message_tp	service_name=RTTestTraceSend, event_id=9845, message=before wait		
10.41.140.114/vos70-20161103-195610/ust/uid/0/32-bit	20:57:42.276 671 767	u_1	1	abb_lttng_ngsa_tp:event_message_tp	service_name=RTTestTraceRec, event_id=9844, message= out of timed_receive		
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 672 906	k_0	0	syscall_entry_futex	uaddr=2079364, op=139, val=19371, utime=0, uaddr2=2079336, val3=2079360	867	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 673 882	k_1	1	syscall_entry_clock_gettime	which_clock=4	868	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 675 509	k_1	1	syscall_exit_clock_gettime	ret=0, tp=1733956	868	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 677 461	k_0	0	rcu_utilization	s=Start context switch	867	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 678 111	k_1	1	syscall_entry_getcpu	tcache=0x0	868	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 678 274	k_0	0	rcu_utilization	s=End context switch	867	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 678 925	k_1	1	syscall_exit_getcpu	ret=0, cpup=0xb57798ac, nodep=0x0, tcache=0x0	868	-49
10.41.140.114/vos70-20161103-195610/ust/uid/0/32-bit	20:57:42.276 679 901	u_1	1	abb_lttng_ngsa_tp:event_message_tp	service_name=RTTestTraceRec, event_id=0, message=received		
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 682 503	k_1	1	syscall_entry_futex	uaddr=2079364, op=140, val=1, utime=0, uaddr2=2079336, val3=19372	868	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 683 642	k_0	0	sched_switch	prev_comm=MSQueueThreadSe, prev_tid=867, prev_prio=-49, prev_state=1, next_comm=CPU-BG-Thread0, next_tid=836, next_prio=-20	867	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 688 522	k_1	1	kmem_kmalloc	call_site=0xc02b5540, ptr=0xd26876c0, bytes_req=48, bytes_alloc=64, gfp_flags=21004480	868	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 697 143	k_1	1	syscall_exit_futex	ret=1, uaddr=2079364, uaddr2=2079336	868	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 699 421	k_1	1	syscall_entry_futex	uaddr=2079336, op=135, val=35, utime=2147484516, uaddr2=3044516608, val3=0	868	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 704 789	k_1	1	sched_waking	comm=MSQueueThreadSe, tid=867, prio=-49, target_cpu=0	868	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 709 669	k_1	1	sched_wakeup	comm=MSQueueThreadSe, tid=867, prio=-49, target_cpu=0	868	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 710 319	k_0	0	rcu_utilization	s=Start context switch	836	-20
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 711 458	k_0	0	rcu_utilization	s=End context switch	836	-20
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 713 085	k_1	1	syscall_exit_futex	ret=0, uaddr=2079336, uaddr2=3044516608	868	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 713 410	k_0	0	sched_switch	prev_comm=CPU-BG-Thread0, prev_tid=836, prev_prio=-20, prev_state=0, next_comm=MSQueueThreadSe, next_tid=867, next_prio=-49	836	-20
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 715 525	k_1	1	syscall_entry_getcpu	tcache=0x0	868	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 716 663	k_1	1	syscall_exit_getcpu	ret=0, cpup=0xb57798ac, nodep=0x0, tcache=0x0	868	-49
10.41.140.114/vos70-20161103-195610/ust/uid/0/32-bit	20:57:42.276 717 802	u_1	1	abb_lttng_ngsa_tp:event_message_tp	service_name=RTTestTraceRec, event_id=0, message=post		
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 718 290	k_0	0	kmem_kfree	call_site=0xc02b5948, ptr=0xd26876c0	867	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 720 242	k_1	1	syscall_entry_getcpu	tcache=0x0	868	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 721 055	k_0	0	syscall_exit_futex	ret=0, uaddr=2079364, uaddr2=2079336	867	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 721 055	k_1	1	syscall_exit_getcpu	ret=0, cpup=0xb57798ac, nodep=0x0, tcache=0x0	868	-49
10.41.140.114/vos70-20161103-195610/ust/uid/0/32-bit	20:57:42.276 722 031	u_1	1	abb_lttng_ngsa_tp:event_message_tp	service_name=RTTestTraceRec, event_id=9845, message=start while		
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 723 333	k_0	0	syscall_entry_futex	uaddr=2079336, op=135, val=35, utime=2147484515, uaddr2=3044950784, val3=0	867	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 726 911	k_0	0	syscall_exit_futex	ret=0, uaddr=2079336, uaddr2=3044950784	867	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 729 677	k_0	0	syscall_entry_getcpu	tcache=0x0	867	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 730 653	k_0	0	syscall_exit_getcpu	ret=0, cpup=0xb57e38c4, nodep=0x0, tcache=0x0	867	-49
10.41.140.114/vos70-20161103-195610/ust/uid/0/32-bit	20:57:42.276 731 954	u_0	0	abb_lttng_ngsa_tp:event_message_tp	service_name=RTTestTraceSend, event_id=0, message=after wait		
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 734 069	k_0	0	syscall_entry_clock_gettime	which_clock=4	867	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 735 533	k_0	0	syscall_exit_clock_gettime	ret=0, tp=1733948	867	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 741 389	k_0	0	syscall_entry_futex	uaddr=3069284376, op=0, val=2, utime=0, uaddr2=3069284376, val3=0	867	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 742 365	k_1	1	syscall_entry_futex	uaddr=3069280280, op=1, val=1, utime=0, uaddr2=128, val3=0	868	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 747 407	k_0	0	syscall_exit_futex	ret=-11, uaddr=3069284376, uaddr2=3069284376	867	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 749 034	k_1	1	syscall_exit_futex	ret=0, uaddr=3069280280, uaddr2=128	868	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 750 335	k_0	0	syscall_entry_futex	uaddr=3069284400, op=0, val=2, utime=0, uaddr2=3069284400, val3=0	867	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 750 498	k_1	1	syscall_entry_futex	uaddr=3069280304, op=1, val=1, utime=0, uaddr2=0, val3=19097	868	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 755 215	k_0	0	syscall_exit_futex	ret=-11, uaddr=3069284400, uaddr2=3069284400	867	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 755 866	k_1	1	syscall_exit_futex	ret=0, uaddr=3069280304, uaddr2=0	868	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 756 679	k_0	0	syscall_entry_futex	uaddr=3069284404, op=5, val=1, utime=1, uaddr2=3069284400, val3=67108865	867	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 757 330	k_1	1	syscall_entry_futex	uaddr=3069280308, op=265, val=19097, utime=3044514512, uaddr2=0, val3=4294967295	868	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 758 957	k_1	1	timer_hrtimer_init	hrtimer=0xd26e3e00, clockid=0, mode=0	868	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 763 349	k_0	0	syscall_exit_futex	ret=0, uaddr=3069284404, uaddr2=3069284400	867	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 764 487	k_1	1	syscall_exit_futex	ret=-11, uaddr=3069280308, uaddr2=0	868	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 765 138	k_0	0	syscall_entry_futex	uaddr=3069284376, op=1, val=1, utime=0, uaddr2=128, val3=0	867	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 767 578	k_1	1	syscall_entry_futex	uaddr=3069280280, op=1, val=1, utime=0, uaddr2=128, val3=0	868	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 768 229	k_0	0	syscall_exit_futex	ret=0, uaddr=3069284376, uaddr2=128	867	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 770 669	k_1	1	syscall_exit_futex	ret=0, uaddr=3069280280, uaddr2=128	868	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 770 994	k_0	0	syscall_entry_getcpu	tcache=0x0	867	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 771 970	k_0	0	syscall_exit_getcpu	ret=0, cpup=0xb57e38c4, nodep=0x0, tcache=0x0	867	-49
10.41.140.114/vos70-20161103-195610/ust/uid/0/32-bit	20:57:42.276 773 271	u_0	0	abb_lttng_ngsa_tp:event_message_tp	service_name=RTTestTraceSend, event_id=9845, message=send

On an unhealthy run I would e.g see the "send" and also the "out of timed_receive" (and also as a last thing the exit_clock_gettime), but I would get the "receive" trace 40ms later.
I noticed a sched_migrate_task for my receiving thread just before getting back to normal:

Trace	Timestamp	Channel	CPU	Event type	Contents	TID	Prio
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 196 914	k_1	1	sched_migrate_task	comm=MSQueueThreadRc, tid=868, prio=-49, orig_cpu=1, dest_cpu=0	105	-51
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 202 119	k_0	0	rcu_utilization	s=Start context switch	836	-20
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 203 095	k_0	0	rcu_utilization	s=End context switch	836	-20
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 205 210	k_0	0	sched_switch	prev_comm=CPU-BG-Thread0, prev_tid=836, prev_prio=-20, prev_state=0, next_comm=MSQueueThreadRc, next_tid=868, next_prio=-49	836	-20
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 209 764	k_1	1	irq_softirq_raise	vec=3	105	-51
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 218 060	k_1	1	irq_softirq_entry	vec=3	105	-51
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 218 386	k_0	0	syscall_entry_getcpu	tcache=0x0	868	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 220 663	k_1	1	napi_poll	napi=0xeda39038, dev_name=eth0, work=0, budget=64	105	-51
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 221 151	k_0	0	syscall_exit_getcpu	ret=0, cpup=0xb57798ac, nodep=0x0, tcache=0x0	868	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 222 615	k_1	1	irq_softirq_exit	vec=3	105	-51
10.41.140.114/vos70-20161103-195610/ust/uid/0/32-bit	20:57:42.317 224 730	u_0	0	abb_lttng_ngsa_tp:event_message_tp	service_name=RTTestTraceRec, event_id=0, message=received		

Sorry for this lengthy post but I am kind of clueless where to look any further.
Cheers
Andy




[Index of Archives]     [RT Stable]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]

  Powered by Linux