On 2022/11/5 06:43, Qi Zheng wrote:
On 2022/11/5 02:21, Dmitry Vyukov wrote:
On Fri, 4 Nov 2022 at 11:03, Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
On Fri, Nov 04, 2022 at 10:47:17AM -0700, Dmitry Vyukov wrote:
Do we know how common/useful such an allocation pattern is?
I have coded something like this a few times, in my cases it is
usually something like: try to allocate a big chunk of memory hoping
for a huge page, then fall back to a smaller allocation
Most likely the key consideration is that the callsites are using
GFP_NOWARN, so perhaps we can just avoid decrementing the nth on a
NOWARN case assuming that another allocation attempt will closely
follow?
GFP_NOWARN is also extensively used for allocations with
user-controlled size, e.g.:
https://elixir.bootlin.com/linux/v6.1-rc3/source/net/unix/af_unix.c#L3451
That's different and these allocations are usually not repeated.
So looking at GFP_NOWARN does not look like the right thing to do.
This may be the best option then, arguably perhaps even more
"realistic" than normal fail_nth as in a real system if this stuff
starts failing there is a good chance things from then on will fail
too during the error cleanup.
However, this would also have to fix the obnoxious behavior of fail
nth where it fails its own copy_from_user on its write system call -
meaning there would be no way to turn it off.
Oh, interesting. We added failing of copy_from/to_user later and did
not consider such interaction.
Filed https://bugzilla.kernel.org/show_bug.cgi?id=216660 for this.
Oh, I will tell you the other two bugish things I noticed
__should_failslab() has this:
if (gfpflags & __GFP_NOWARN)
failslab.attr.no_warn = true;
return should_fail(&failslab.attr, s->object_size);
Which always permanently turns off no_warn for slab during early
boot. This is why syzkaller reports are so confusing. They trigger a
slab fault injection, which in all other cases gives a notification
backtrace, but in slab cases there is no hint about the fault
injection in the log.
Ouch, this looks like a bug in:
commit 3f913fc5f9745613088d3c569778c9813ab9c129
Author: Qi Zheng <zhengqi.arch@xxxxxxxxxxxxx>
Date: Thu May 19 14:08:55 2022 -0700
mm: fix missing handler for __GFP_NOWARN
+Qi could you please fix it?
At the very least the local gfpflags should not alter the global
failslab.attr that is persistent and shared by all tasks.
Oh, It indeed shouldn't alter the global failslab.attr, I'll fix it.
How about the following changes? If it's ok, I will send this fix patch.
Thanks. :)
diff --git a/include/linux/fault-inject.h b/include/linux/fault-inject.h
index 9f6e25467844..b61a3fb7a2a3 100644
--- a/include/linux/fault-inject.h
+++ b/include/linux/fault-inject.h
@@ -20,7 +20,6 @@ struct fault_attr {
atomic_t space;
unsigned long verbose;
bool task_filter;
- bool no_warn;
unsigned long stacktrace_depth;
unsigned long require_start;
unsigned long require_end;
@@ -40,12 +39,12 @@ struct fault_attr {
.ratelimit_state = RATELIMIT_STATE_INIT_DISABLED, \
.verbose = 2, \
.dname = NULL, \
- .no_warn = false, \
}
#define DECLARE_FAULT_ATTR(name) struct fault_attr name =
FAULT_ATTR_INITIALIZER
int setup_fault_attr(struct fault_attr *attr, char *str);
bool should_fail(struct fault_attr *attr, ssize_t size);
+bool should_fail_gfp(struct fault_attr *attr, ssize_t size, gfp_t
gfpflags);
#ifdef CONFIG_FAULT_INJECTION_DEBUG_FS
diff --git a/lib/fault-inject.c b/lib/fault-inject.c
index 4b8fafce415c..95af50832770 100644
--- a/lib/fault-inject.c
+++ b/lib/fault-inject.c
@@ -41,9 +41,6 @@ EXPORT_SYMBOL_GPL(setup_fault_attr);
static void fail_dump(struct fault_attr *attr)
{
- if (attr->no_warn)
- return;
-
if (attr->verbose > 0 && __ratelimit(&attr->ratelimit_state)) {
printk(KERN_NOTICE "FAULT_INJECTION: forcing a failure.\n"
"name %pd, interval %lu, probability %lu, "
@@ -98,12 +95,7 @@ static inline bool fail_stacktrace(struct fault_attr
*attr)
#endif /* CONFIG_FAULT_INJECTION_STACKTRACE_FILTER */
-/*
- * This code is stolen from failmalloc-1.0
- * http://www.nongnu.org/failmalloc/
- */
-
-bool should_fail(struct fault_attr *attr, ssize_t size)
+bool should_fail_check(struct fault_attr *attr, ssize_t size)
{
bool stack_checked = false;
@@ -118,7 +110,7 @@ bool should_fail(struct fault_attr *attr, ssize_t size)
fail_nth--;
WRITE_ONCE(current->fail_nth, fail_nth);
if (!fail_nth)
- goto fail;
+ return true;
return false;
}
@@ -151,7 +143,19 @@ bool should_fail(struct fault_attr *attr, ssize_t size)
if (attr->probability <= get_random_u32_below(100))
return false;
-fail:
+ return true;
+}
+
+/*
+ * This code is stolen from failmalloc-1.0
+ * http://www.nongnu.org/failmalloc/
+ */
+
+bool should_fail(struct fault_attr *attr, ssize_t size)
+{
+ if (!should_fail_check(attr, size))
+ return false;
+
fail_dump(attr);
if (atomic_read(&attr->times) != -1)
@@ -161,6 +165,21 @@ bool should_fail(struct fault_attr *attr, ssize_t size)
}
EXPORT_SYMBOL_GPL(should_fail);
+bool should_fail_gfp(struct fault_attr *attr, ssize_t size, gfp_t gfpflags)
+{
+ if (!should_fail_check(attr, size))
+ return false;
+
+ if (!(gfpflags & __GFP_NOWARN))
+ fail_dump(attr);
+
+ if (atomic_read(&attr->times) != -1)
+ atomic_dec_not_zero(&attr->times);
+
+ return true;
+}
+EXPORT_SYMBOL_GPL(should_fail_gfp);
+
#ifdef CONFIG_FAULT_INJECTION_DEBUG_FS
static int debugfs_ul_set(void *data, u64 val)
diff --git a/mm/failslab.c b/mm/failslab.c
index 58df9789f1d2..21338b256791 100644
--- a/mm/failslab.c
+++ b/mm/failslab.c
@@ -30,10 +30,7 @@ bool __should_failslab(struct kmem_cache *s, gfp_t
gfpflags)
if (failslab.cache_filter && !(s->flags & SLAB_FAILSLAB))
return false;
- if (gfpflags & __GFP_NOWARN)
- failslab.attr.no_warn = true;
-
- return should_fail(&failslab.attr, s->object_size);
+ return should_fail_gfp(&failslab.attr, s->object_size, gfpflags);
}
static int __init setup_failslab(char *str)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7192ded44ad0..4e70b5599ada 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3912,10 +3912,7 @@ static bool __should_fail_alloc_page(gfp_t
gfp_mask, unsigned int order)
(gfp_mask & __GFP_DIRECT_RECLAIM))
return false;
- if (gfp_mask & __GFP_NOWARN)
- fail_page_alloc.attr.no_warn = true;
-
- return should_fail(&fail_page_alloc.attr, 1 << order);
+ return should_fail_gfp(&fail_page_alloc.attr, 1 << order, gfp_mask);
}
#ifdef CONFIG_FAULT_INJECTION_DEBUG_FS
But a warning should not be printed for callers that currently specify
__GFP_NOWARN, because that could lead to deadlocks, such as the deadlock
case mentioned in commit 6b9dbedbe349 ("tty: fix deadlock caused by
calling printk() under tty_port->lock").
Thanks,
Qi
But I am not sure if we really don't want to issue the fault injection
stack in this case. It's not a WARNING, it's merely an information
message. It looks useful in all cases, even with GFP_NOWARN. Why
should it be suppressed?
Once that is fixed we can quickly explain why the socketpair() example
in the docs shows success ret codes in the middle of the sweep when
run on syzkaller kernels
fail_nth interacts badly with other kernel features typically enabled
in syzkaller kernels. Eg it fails in hidden kmemleak instrumentation:
[ 18.499559] FAULT_INJECTION: forcing a failure.
[ 18.499559] name failslab, interval 1, probability 0, space 0,
times 0
[ 18.499720] CPU: 10 PID: 386 Comm: iommufd_fail_nt Not tainted
6.1.0-rc3+ #34
[ 18.499826] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 18.499971] Call Trace:
[ 18.500010] <TASK>
[ 18.500048] show_stack+0x3d/0x3f
[ 18.500114] dump_stack_lvl+0x92/0xbd
[ 18.500171] dump_stack+0x15/0x17
[ 18.500232] should_fail.cold+0x5/0xa
[ 18.500291] __should_failslab+0xb6/0x100
[ 18.500349] should_failslab+0x9/0x20
[ 18.500416] kmem_cache_alloc+0x64/0x4e0
[ 18.500477] ? __create_object+0x40/0xc50
[ 18.500539] __create_object+0x40/0xc50
[ 18.500620] ? kasan_poison+0x3a/0x50
[ 18.500690] ? kasan_unpoison+0x28/0x50
[***18.500753] kmemleak_alloc+0x24/0x30
[ 18.500816] __kmem_cache_alloc_node+0x1de/0x400
[ 18.500900] ? iopt_alloc_area_pages+0x95/0x560 [iommufd]
[ 18.500993] kmalloc_trace+0x26/0x110
[ 18.501059] iopt_alloc_area_pages+0x95/0x560 [iommufd]
Which has the consequence of syzkaller wasting half its fail_nth
effort because it is triggering failures in hidden instrumentation
that has no impact on the main code path.
Maybe a kmem_cache_alloc_no_fault_inject() would be helpful for a few
cases.
Jason
--
Thanks,
Qi