On Mon, 2 Nov 2020 12:37:21 -0500 Steven Rostedt <rostedt@xxxxxxxxxxx> wrote: > The only race that I see that can happen, is the one in the comment I > showed. And that is after enabling the recursed functions again after > clearing, one CPU could add a function while another CPU that just added > that same function could be just exiting this routine, notice that a > clearing of the array happened, and remove its function (which was the same > as the one just happened). So we get a "zero" in the array. If this > happens, it is likely that that function will recurse again and will be > added later. > Updated version of this function: -- Steve void ftrace_record_recursion(unsigned long ip, unsigned long parent_ip) { int index = 0; int i; unsigned long old; again: /* First check the last one recorded */ if (ip == cached_function) return; i = atomic_read(&nr_records); /* nr_records is -1 when clearing records */ smp_mb__after_atomic(); if (i < 0) return; /* * If there's two writers and this writer comes in second, * the cmpxchg() below to update the ip will fail. Then this * writer will try again. It is possible that index will now * be greater than nr_records. This is because the writer * that succeeded has not updated the nr_records yet. * This writer could keep trying again until the other writer * updates nr_records. But if the other writer takes an * interrupt, and that interrupt locks up that CPU, we do * not want this CPU to lock up due to the recursion protection, * and have a bug report showing this CPU as the cause of * locking up the computer. To not lose this record, this * writer will simply use the next position to update the * recursed_functions, and it will update the nr_records * accordingly. */ if (index < i) index = i; if (index >= CONFIG_FTRACE_RECORD_RECURSION_SIZE) return; for (i = index - 1; i >= 0; i--) { if (recursed_functions[i].ip == ip) { cached_function = ip; return; } } cached_function = ip; /* * We only want to add a function if it hasn't been added before. * Add to the current location before incrementing the count. * If it fails to add, then increment the index (save in i) * and try again. */ old = cmpxchg(&recursed_functions[index].ip, 0, ip); if (old != 0) { /* Did something else already added this for us? */ if (old == ip) return; /* Try the next location (use i for the next index) */ index++; goto again; } recursed_functions[index].parent_ip = parent_ip; /* * It's still possible that we could race with the clearing * CPU0 CPU1 * ---- ---- * ip = func * nr_records = -1; * recursed_functions[0] = 0; * i = -1 * if (i < 0) * nr_records = 0; * (new recursion detected) * recursed_functions[0] = func * cmpxchg(recursed_functions[0], * func, 0) * * But the worse that could happen is that we get a zero in * the recursed_functions array, and it's likely that "func" will * be recorded again. */ i = atomic_read(&nr_records); smp_mb__after_atomic(); if (i < 0) cmpxchg(&recursed_functions[index].ip, ip, 0); else if (i <= index) atomic_cmpxchg(&nr_records, i, index + 1); }