Re: trace-cmd fails with many cpus

On Fri, 12 Apr 2019 09:53:33 -0400
Phil Auld <pauld@xxxxxxxxxx> wrote:

> Hi,
> 
> I was trying to get some sched traces on a 160 cpu box yesterday. Trace-cmd 
> failed with 

Thanks for the report!

> 
> # ./tracecmd/trace-cmd record -e "sched:*" sleep 2
> none
> trace-cmd: Invalid argument
>   Failed filter of /sys/kernel/tracing/events/sched/sched_switch/filter
> 
> trace-cmd: No such file or directory
>   can not stat 'trace.dat.cpu0'
> #
> 
> 
> Which can be seen better with strace
> 
> [pid 97653] open("/sys/kernel/tracing/events/sched/sched_swap_numa/filter", O_WRONLY|O_TRUNC) = 5
> [pid 97653] write(5, "(common_pid!=97652)&&(common_pid"..., 3358) = 3358
> [pid 97653] close(5)                    = 0
> [pid 97653] open("/sys/kernel/tracing/events/sched/sched_switch/filter", O_WRONLY|O_TRUNC) = 5
> [pid 97653] write(5, "(common_pid!=97652)&&(common_pid"..., 6398) = -1 EINVAL (Invalid argument)
> [pid 97653] close(5)                    = 0
> 
> The filter file can only take a max write of length PAGE_SIZE. 
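
As a rough check of the write lengths strace reports, the sketch below rebuilds the filter string for 160 excluded pids. The pid values are hypothetical (160 consecutive five-digit pids counting down from 97652, matching the start of the string in the strace output), the helper names are made up for illustration, and PAGE_SIZE is assumed to be 4096 per the 4k limit discussed in this thread. It is not how trace-cmd itself builds the string.

```python
PAGE_SIZE = 4096  # assumption: 4k pages, per the limit discussed in this thread


def exclude_filter(field, pids):
    """Join one (field!=pid) clause per pid with &&."""
    return "&&".join(f"({field}!={pid})" for pid in pids)


def sched_switch_filter(pids):
    """common_pid clauses, then ||, then the same pids again as next_pid clauses."""
    return exclude_filter("common_pid", pids) + "||" + exclude_filter("next_pid", pids)


# Hypothetical recorder pids: 160 consecutive values, 97652 down to 97493.
pids = list(range(97652, 97652 - 160, -1))

common_only = exclude_filter("common_pid", pids)
with_next = sched_switch_filter(pids)

# Print each length and whether it fits in a single PAGE_SIZE write.
print(len(common_only), len(common_only) <= PAGE_SIZE)
print(len(with_next), len(with_next) <= PAGE_SIZE)
```

Under those assumptions the common_pid-only string is 3358 bytes and the sched_switch string with the extra next_pid clauses is 6398 bytes, the same lengths as the two write() calls above, so only the second one exceeds the page.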

Ah yeah. By default we try not to trace the recorders. Newer kernels
have a set_event_pid which is used for only tracing specific tasks for
the events. I wonder if we should allow for "!pid" to be sent to that
file as something to not be traced?

But that doesn't help you now.

Hmm, I thought we had an option to disable this, but I don't see one.
That's the first thing we should do. Add an option such that you record
all events, even the threads (which is something I would definitely
want!).

> 
> The extra pid filtering added for "next_pid"  more or less doubles length
> and pushes it over the 4k limit. 
> 
> WRITE: /sys/kernel/tracing/events/sched/sched_switch/filter, len 6718, data "(common_pid!=100199)&&(common_pid!=100198)&&(common_pid!=100197)&&(common_pid!=100196)&&(common_pid!=100195)&&(common_pid!=100194)&&(common_pid!=100193)&&(common_pid!=100192)&&(common_pid!=100191) ... 160 of these ...
> &&(common_pid!=100040)||(next_pid!=100199)&&(next_pid!=100198)&&(next_pid!=100197)&&(next_pid!=100196)&&(next_pid!=100195)&&(next_pid!=100194)&&(next_pid!=100193)&&(next_pid!=100192)...  160 of these...
> 
> 
> I suppose the answer is don't run on a system with that many cpus  :)
> 
> But I wonder if it would be possible to have the threads each handle say 8 cpu
> files or something.

Actually, I think another solution is to consolidate the pids that are
to be excluded and sort them. Thus if we have (which is very likely the
case)

 (common_pid!=1000)&&(common_pid!=1001)&&(common_pid!=1002)

we change that to:

  !((common_pid>=1000)&&(common_pid<=1002))

Which would also have the effect of improving the filter logic within
the kernel.
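
A minimal sketch of the consolidation idea, assuming the excluded pids are simply sorted and runs of consecutive values are merged into inclusive ranges. The function names are hypothetical and this is not trace-cmd code; whether a given kernel's filter parser accepts the negated range form is not checked here. The two bounds are joined with &&, since a pid has to satisfy both to lie inside the range.

```python
def merge_ranges(pids):
    """Sort pids and merge consecutive values into inclusive (lo, hi) ranges."""
    ranges = []
    for pid in sorted(set(pids)):
        if ranges and pid == ranges[-1][1] + 1:
            # Extends the current run by one.
            ranges[-1] = (ranges[-1][0], pid)
        else:
            # Gap found (or first pid): start a new run.
            ranges.append((pid, pid))
    return ranges


def exclude_clause(field, lo, hi):
    """One clause that is false exactly for pids in lo..hi."""
    if lo == hi:
        return f"({field}!={lo})"
    return f"!(({field}>={lo})&&({field}<={hi}))"


def build_exclude_filter(field, pids):
    """AND together one exclusion clause per merged range."""
    return "&&".join(exclude_clause(field, lo, hi) for lo, hi in merge_ranges(pids))


print(build_exclude_filter("common_pid", [1002, 1000, 1001]))
print(build_exclude_filter("common_pid", [1000, 1001, 1002, 2000]))
```

If the 160 recorder pids really are consecutive (say 100040 through 100199, an assumption based on the dump quoted above), they collapse into a single range clause per field, far below the page-sized write limit.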

Tzvetomir or Slavomir, would either of you be able to implement the
above? Both adding an option to disable this (--no-filter) and the
sorting of the excluded pids?

Thanks!

-- Steve


> 
> Or maybe have the kernel filter accept an "all_pid" that covered common_pid, next_pid, pid to reduce
> the number of items needed in there? 
> 



