RE: Deadlock with FTrace in child process

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> On Sat, 19 Oct 2019 07:50:53 +0000
> Allende Imanol <iallende@xxxxxxxxxx> wrote:
> 
> > I am using Ftrace in a Ultra96 (arm64) 4.19.75-cip11 and it seems that
> > I have a deadlock situation. I have a simple application executed as a
> > child process from a python script. The child process reads from
> > /dev/urandom and writes in /dev/zero. Ftrace is set only for this child
> > process.
> 
> Are you saying that ftrace is calling the blocked task?
> 
> If you don't run ftrace, the task never gets stuck?

Without FTrace I did not get to achieve the same stuck scenario.

> 
> 
> >
> > After thousands of executions of the child process with a high IRQ and
> > CPU load (it varies in the number but it is quite reproducible) - the child
> > process gets blocked. But the system still is running.
> >
> > The child process appears as "sleeping state" and if I try to do kill
> > -18 $pid it does not change.
> >
> > In this case the child process is 4197 PID.
> >
> > # ps -aux | grep python
> > root      1145  6.3  0.4  85448  9508 ?        S<l  00:00  23:02 python3 test_client.py
> > root      4197  0.0  0.4  85448  8256 ?        S<   05:53   0:00 python3 test_client.py
> > root      5279  0.0  0.0   2752   280 ttyPS0   S+   06:02   0:00 grep python
> > # cat /proc/4197/stack
> > [<0>] __switch_to+0x94/0xd0
> > [<0>] futex_wait_queue_me+0xd0/0x150
> > [<0>] futex_wait+0x12c/0x200
> > [<0>] do_futex+0x304/0xc10
> 
> Looks like the task is blocked on a user space mutex (futex).
> 
> > [<0>] __arm64_sys_futex+0x154/0x190
> > [<0>] el0_svc_common+0xa0/0x170
> > [<0>] el0_svc_handler+0x38/0x80
> > [<0>] el0_svc+0x8/0xc
> > [<0>] 0xffffffffffffffff
> >

I could reproduce it in my Intel laptop (Debian 10 - 4.19 kernel):
# cat /proc/21719/stack 
[<0>] futex_wait_queue_me+0xc1/0x120
[<0>] futex_wait+0x13f/0x240
[<0>] do_futex+0x3f7/0xbf0
[<0>] __x64_sys_futex+0x143/0x180
[<0>] do_syscall_64+0x53/0x110
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[<0>] 0xffffffffffffffff

> >
> > The stack is more or less the same every time the block happens.
> >
> > If I kill -9 the child process the execution loop will continue with
> > spawning the next child so it seems that the deadlock is really only
> > in the child process - but do not know why.
> >
> > I am using function_graph traces and I added the funcgraph-proc,
> > noirq-info and nofuncgraphs-irq options. I disable tracing after each
> > iteration, and I clear the trace buffer and pid set also.
> >
> > I add the application that is executed as the child process:
> 
> I don't understand what you meant by this. Is the below the code that
> gets stuck? But the above shows a python task?

The python script executes the read/write application as a child process

> 
> >
> > #define DEV1 "/dev/urandom"
> > #define DEV2 "/dev/null"
> >
> > int main(int argc, char **argv)
> > {
> >  unsigned char result;
> >  int fd1, fd2, ret;
> >  char res_str[10] = {0};
> >
> >  fd1 = open(DEV1, O_RDONLY);
> >  fd2 = open(DEV2, O_WRONLY);
> >
> >  ret = read(fd1, &result, 1);
> >  sprintf(res_str, "%d", result);
> >  ret = write(fd2, res_str, strlen(res_str));
> >
> >  close(fd1);
> >  close(fd2);
> >
> >  return ret;
> > }
> 
> Can you give the full reproducer, and explain the problem in a bit more
> detail. I don't really see how ftrace is involved here, besides that
> you are running it.

This is a part of a bigger trace analysis tool that we are still developing. 
https://sil2.osadl.org/download/DB4SIL2_client.tar.gz

It is ready to execute in Intel based computer. To execute the tool:
# python3 test_client.py

The tool starts executing hackbench, to disable it, just comment first 2
lines of test_client_code()

There is a README with more detailed in explanation

> 
> -- Steve
> 
> >
> > The preemption configuration is set to Preemptible kernel and the Ftrace
> > config is set:
> > CONFIG_FTRACE=y
> > CONFIG_FUNCTION_TRACER=y
> > CONFIG_FUNCTION_GRAPH_TRACER=y
> > # CONFIG_PREEMPTIRQ_EVENTS is not set
> > # CONFIG_IRQSOFF_TRACER is not set
> > # CONFIG_PREEMPT_TRACER is not set
> > # CONFIG_SCHED_TRACER is not set
> > # CONFIG_HWLAT_TRACER is not set
> > CONFIG_FTRACE_SYSCALLS=y
> > # CONFIG_TRACER_SNAPSHOT is not set
> > CONFIG_BRANCH_PROFILE_NONE=y
> > # CONFIG_PROFILE_ANNOTATED_BRANCHES is not set
> > # CONFIG_PROFILE_ALL_BRANCHES is not set
> > CONFIG_STACK_TRACER=y
> > # CONFIG_BLK_DEV_IO_TRACE is not set
> > CONFIG_UPROBE_EVENTS=y
> > CONFIG_PROBE_EVENTS=y
> > CONFIG_DYNAMIC_FTRACE=y
> > # CONFIG_FUNCTION_PROFILER is not set
> > CONFIG_FTRACE_MCOUNT_RECORD=y
> > # CONFIG_FTRACE_STARTUP_TEST is not set
> > # CONFIG_HIST_TRIGGERS is not set
> > # CONFIG_TRACEPOINT_BENCHMARK is not set
> > # CONFIG_RING_BUFFER_BENCHMARK is not set
> > # CONFIG_RING_BUFFER_STARTUP_TEST is not set
> > # CONFIG_PREEMPTIRQ_DELAY_TEST is not set
> > # CONFIG_TRACE_EVAL_MAP_FILE is not set
> > CONFIG_TRACING_EVENTS_GPIO=y





[Index of Archives]     [Linux USB Development]     [Linux USB Development]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux