On Wed, 15 Nov 2023 at 22:53, Juntong Deng <juntong.deng@xxxxxxxxxxx> wrote: > > On 2023/11/14 15:27, Dmitry Vyukov wrote: > > On Mon, 13 Nov 2023 at 22:17, Juntong Deng <juntong.deng@xxxxxxxxxxx> wrote: > >> > >> Record and report more information to help us find the cause of > >> the bug (for example, bugs caused by subtle race condition). > >> > >> This patch adds recording and showing CPU number and timestamp at > >> allocation and free (controlled by CONFIG_KASAN_EXTRA_INFO), and > >> adds recording and showing timestamp at error occurrence (CPU number > >> is already shown by dump_stack_lvl). The timestamps in the report use > >> the same format and source as printk. > >> > >> In order to record CPU number and timestamp at allocation and free, > >> corresponding members need to be added to the relevant data structures, > >> which may lead to increased memory consumption. > >> > >> In Generic KASAN, members are added to struct kasan_track. Since in > >> most cases, alloc meta is stored in the redzone and free meta is > >> stored in the object or the redzone, memory consumption will not > >> increase much. > >> > >> In SW_TAGS KASAN and HW_TAGS KASAN, members are added to > >> struct kasan_stack_ring_entry. Memory consumption increases as the > >> size of struct kasan_stack_ring_entry increases (this part of the > >> memory is allocated by memblock), but since this is configurable, > >> it is up to the user to choose. > >> > >> Signed-off-by: Juntong Deng <juntong.deng@xxxxxxxxxxx> > >> --- > >> lib/Kconfig.kasan | 11 +++++++++++ > >> mm/kasan/common.c | 5 +++++ > >> mm/kasan/kasan.h | 9 +++++++++ > >> mm/kasan/report.c | 28 ++++++++++++++++++++++------ > >> mm/kasan/report_tags.c | 18 ++++++++++++++++++ > >> mm/kasan/tags.c | 15 +++++++++++++++ > >> 6 files changed, 80 insertions(+), 6 deletions(-) > >> > >> diff --git a/lib/Kconfig.kasan b/lib/Kconfig.kasan > >> index fdca89c05745..d9611564b339 100644 > >> --- a/lib/Kconfig.kasan > >> +++ b/lib/Kconfig.kasan > >> @@ -207,4 +207,15 @@ config KASAN_MODULE_TEST > >> A part of the KASAN test suite that is not integrated with KUnit. > >> Incompatible with Hardware Tag-Based KASAN. > >> > >> +config KASAN_EXTRA_INFO > >> + bool "Record and report more information" > >> + depends on KASAN > >> + help > >> + Record and report more information to help us find the cause of > >> + the bug. The trade-off is potentially increased memory consumption > >> + (to record more information). > >> + > >> + Currently the CPU number and timestamp are additionally recorded > >> + at allocation and free. > > > > Hi Juntong, > > > > Thanks for working on this. > > > > > Thanks for your reply! > > > > As a KASAN developer I understand what this is doing, but I am trying > > to think from a position of a user that does not know details of KASAN > > implementation. From this position it may be useful to say somewhere > > that information is recorded "per heap allocation". Perhaps something > > like: > > > > "Currently the CPU number and timestamp are additionally recorded for > > each heap block at allocation and free time". > > > Yes, I agree, that is a better expression. > > > > > > Also it's unclear what the memory consumption increase is. You say > > "potentially|, so may it not increase at all? If it increases, by how > > much? I obviously want more information, if I can afford it, but I > > can't understand if I can or not based on this description. I would > > assume that this may be a problem only for small/embedded devices. > > Can we provide some ballpark estimation of the memory consumption > > increase? And somehow say that's probably not an issue for larger > > machines? > > > > > How about this expression? > > Currently, in order to record CPU number and timestamp, the data > structure to record allocation and free information will increase > by 12 bytes. > > In Generic KASAN, this affects all allocations less than 32 bytes. > In SW_TAGS KASAN and HW_TAGS KASAN, depending on the stack_ring_size > boot parameter increases the memory consumption by > 12 * stack_ring_size bytes. Let's go with this version. > >> endif # KASAN > >> diff --git a/mm/kasan/common.c b/mm/kasan/common.c > >> index 256930da578a..7a81566d9d66 100644 > >> --- a/mm/kasan/common.c > >> +++ b/mm/kasan/common.c > >> @@ -20,6 +20,7 @@ > >> #include <linux/module.h> > >> #include <linux/printk.h> > >> #include <linux/sched.h> > >> +#include <linux/sched/clock.h> > >> #include <linux/sched/task_stack.h> > >> #include <linux/slab.h> > >> #include <linux/stacktrace.h> > >> @@ -50,6 +51,10 @@ void kasan_set_track(struct kasan_track *track, gfp_t flags) > >> { > >> track->pid = current->pid; > >> track->stack = kasan_save_stack(flags, true); > >> +#ifdef CONFIG_KASAN_EXTRA_INFO > >> + track->cpu = raw_smp_processor_id(); > >> + track->ts_nsec = local_clock(); > > > > What does "local_" mean? Is this clock value comparable across CPUs? > > > > > No, local_clock is the local CPU clock and cannot be used for comparison > across CPUs, I made a mistake here. > > I delved into the clock subsystem of the Linux kernel today and I found > that we have two choices. > > - sched_clock(): based on jiffies, high performance, but on some > hardware, it will drift between CPUs. > > - ktime_get_boot_fast_ns(): based on clocksource, highly accurate, > can be compared between CPUs, but performance is worse (seqlock). > > I tested 100000 calls respectively on my laptop, the average of > sched_clock() is 17ns and the average of ktime_get_boot_fast_ns() > is 25ns. > > ktime_get_boot_fast_ns() takes about 1.5 times as long as sched_clock(). > > With Generic KASAN enabled, the average of one memory allocation is > 3512ns on my laptop. > > Personally, I prefer ktime_get_boot_fast_ns() because it is more > accurate and the extra time is insignificant for the time required for > one memory allocation with Generic KASAN enabled. > > But maybe using ktime_get_boot_fast_ns() would have a more serious > impact on small/embedded devices. > > Which do you think is the better choice? I don't have a strong preference. Re drift of sched_clock(), do you mean unsynchronized RDTSC on different cores? I had the impression that RDTSC is synchronized across cores on all recent CPUs/systems. > >> +#endif /* CONFIG_KASAN_EXTRA_INFO */ > >> } > >> > >> #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS) > >> diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h > >> index 8b06bab5c406..b3899a255aca 100644 > >> --- a/mm/kasan/kasan.h > >> +++ b/mm/kasan/kasan.h > >> @@ -187,6 +187,10 @@ static inline bool kasan_requires_meta(void) > >> struct kasan_track { > >> u32 pid; > >> depot_stack_handle_t stack; > >> +#ifdef CONFIG_KASAN_EXTRA_INFO > >> + u32 cpu; > >> + u64 ts_nsec; > > > > This increases the size of meta from 8 to 24 bytes. > > Can we somehow store the timestamp as 32 bits? Maybe compress the CPU > > number (it shouldn't be larger than 20 bits?)? > > I see below we report on microseconds, so we don't low bits of the > > timestamp as well. > > > > > Maybe we can use bit field. > > struct kasan_track { > u32 pid; > depot_stack_handle_t stack; > #ifdef CONFIG_KASAN_EXTRA_INFO > u64 cpu:20; > u64 ts_sec:22; > u64 ts_usec:22; > #endif /* CONFIG_KASAN_EXTRA_INFO */ > }; > > For example, 20-bit cpu, 22-bit ts_sec, 22-bit ts_usec, 64 bits > (8 bytes), and the data structure is 16 bytes in total. This looks better. Can't we have a single field for time instead of ts_sec/usec? Both sched_clock() and ktime_get_boot_fast_ns() return just u64. > If the data structure becomes 16 bytes, it will not affect objects > larger than 16 bytes. > > But the bit field can only be used in struct kasan_track (Generic), > and cannot be used in struct kasan_stack_ring_entry (SW_TAGS KASAN > and HW_TAGS KASAN). > > Because we need to use READ_ONCE and WRITE_ONCE to read and write > struct kasan_stack_ring_entry, but READ_ONCE and WRITE_ONCE cannot > read or write bit field. If that's necessary, we could store both values as a single u64 and the manually pack/unpack. > > I see there is a deficiency in kasan_cache_create(): > > https://elixir.bootlin.com/linux/latest/source/mm/kasan/generic.c#L412 > > > > If free_meta does not fit into the object, we add it after the object. > > But we could overlap it with the object. > > For example if the object size is 16 bytes and free_meta size is 24 > > bytes, we could increase object size to 24, while currently we > > increase it to 16+24 = 40. > > We need to place it after the object only if we have these other cases > > "(cache->flags & SLAB_TYPESAFE_BY_RCU) || cache->ctor". > > > > Currently it affects only kmalloc-8 slab. > > But with this change it will affect at least kmalloc-16 slab as well. > > > > > I completely agree that we can use both object space and redzone > space to store free meta, thereby further reducing the extra memory > consumption caused by KASAN. > > Of course, in this case we need to readjust the offset of the > alloc meta. > > If you agree I can make this change in a separate patch. This would be good. Thanks. > >> +#endif /* CONFIG_KASAN_EXTRA_INFO */ > >> }; > >> > >> enum kasan_report_type { > >> @@ -202,6 +206,7 @@ struct kasan_report_info { > >> size_t access_size; > >> bool is_write; > >> unsigned long ip; > >> + u64 ts_nsec; > >> > >> /* Filled in by the common reporting code. */ > >> const void *first_bad_addr; > >> @@ -278,6 +283,10 @@ struct kasan_stack_ring_entry { > >> u32 pid; > >> depot_stack_handle_t stack; > >> bool is_free; > >> +#ifdef CONFIG_KASAN_EXTRA_INFO > >> + u32 cpu; > >> + u64 ts_nsec; > >> +#endif /* CONFIG_KASAN_EXTRA_INFO */ > >> }; > >> > >> struct kasan_stack_ring { > >> diff --git a/mm/kasan/report.c b/mm/kasan/report.c > >> index e77facb62900..b6feaf807c08 100644 > >> --- a/mm/kasan/report.c > >> +++ b/mm/kasan/report.c > >> @@ -25,6 +25,7 @@ > >> #include <linux/types.h> > >> #include <linux/kasan.h> > >> #include <linux/module.h> > >> +#include <linux/sched/clock.h> > >> #include <linux/sched/task_stack.h> > >> #include <linux/uaccess.h> > >> #include <trace/events/error_report.h> > >> @@ -242,27 +243,40 @@ static void end_report(unsigned long *flags, const void *addr, bool is_write) > >> > >> static void print_error_description(struct kasan_report_info *info) > >> { > >> + unsigned long rem_usec = do_div(info->ts_nsec, NSEC_PER_SEC) / 1000; > >> + > >> pr_err("BUG: KASAN: %s in %pS\n", info->bug_type, (void *)info->ip); > >> > >> if (info->type != KASAN_REPORT_ACCESS) { > >> - pr_err("Free of addr %px by task %s/%d\n", > >> - info->access_addr, current->comm, task_pid_nr(current)); > >> + pr_err("Free of addr %px by task %s/%d at %lu.%06lus\n", > >> + info->access_addr, current->comm, task_pid_nr(current), > >> + (unsigned long)info->ts_nsec, rem_usec); > >> return; > >> } > >> > >> if (info->access_size) > >> - pr_err("%s of size %zu at addr %px by task %s/%d\n", > >> + pr_err("%s of size %zu at addr %px by task %s/%d at %lu.%06lus\n", > >> info->is_write ? "Write" : "Read", info->access_size, > >> - info->access_addr, current->comm, task_pid_nr(current)); > >> + info->access_addr, current->comm, task_pid_nr(current), > >> + (unsigned long)info->ts_nsec, rem_usec); > >> else > >> - pr_err("%s at addr %px by task %s/%d\n", > >> + pr_err("%s at addr %px by task %s/%d at %lu.%06lus\n", > >> info->is_write ? "Write" : "Read", > >> - info->access_addr, current->comm, task_pid_nr(current)); > >> + info->access_addr, current->comm, task_pid_nr(current), > >> + (unsigned long)info->ts_nsec, rem_usec); > >> } > >> > >> static void print_track(struct kasan_track *track, const char *prefix) > >> { > >> +#ifdef CONFIG_KASAN_EXTRA_INFO > >> + unsigned long rem_usec = do_div(track->ts_nsec, NSEC_PER_SEC) / 1000; > >> + > >> + pr_err("%s by task %u on cpu %d at %lu.%06lus:\n", > >> + prefix, track->pid, track->cpu, > >> + (unsigned long)track->ts_nsec, rem_usec); > >> +#else > >> pr_err("%s by task %u:\n", prefix, track->pid); > >> +#endif /* CONFIG_KASAN_EXTRA_INFO */ > >> if (track->stack) > >> stack_depot_print(track->stack); > >> else > >> @@ -544,6 +558,7 @@ void kasan_report_invalid_free(void *ptr, unsigned long ip, enum kasan_report_ty > >> info.access_size = 0; > >> info.is_write = false; > >> info.ip = ip; > >> + info.ts_nsec = local_clock(); > >> > >> complete_report_info(&info); > >> > >> @@ -582,6 +597,7 @@ bool kasan_report(const void *addr, size_t size, bool is_write, > >> info.access_size = size; > >> info.is_write = is_write; > >> info.ip = ip; > >> + info.ts_nsec = local_clock(); > >> > >> complete_report_info(&info); > >> > >> diff --git a/mm/kasan/report_tags.c b/mm/kasan/report_tags.c > >> index 8b8bfdb3cfdb..4d62f1b3e11d 100644 > >> --- a/mm/kasan/report_tags.c > >> +++ b/mm/kasan/report_tags.c > >> @@ -26,6 +26,18 @@ static const char *get_common_bug_type(struct kasan_report_info *info) > >> return "invalid-access"; > >> } > >> > >> +#ifdef CONFIG_KASAN_EXTRA_INFO > >> +static void kasan_complete_extra_report_info(struct kasan_track *track, > >> + struct kasan_stack_ring_entry *entry) > >> +{ > >> + u32 cpu = READ_ONCE(entry->cpu); > >> + u64 ts_nsec = READ_ONCE(entry->ts_nsec); > >> + > >> + track->cpu = cpu; > >> + track->ts_nsec = ts_nsec; > >> +} > >> +#endif /* CONFIG_KASAN_EXTRA_INFO */ > >> + > >> void kasan_complete_mode_report_info(struct kasan_report_info *info) > >> { > >> unsigned long flags; > >> @@ -82,6 +94,9 @@ void kasan_complete_mode_report_info(struct kasan_report_info *info) > >> > >> info->free_track.pid = pid; > >> info->free_track.stack = stack; > >> +#ifdef CONFIG_KASAN_EXTRA_INFO > >> + kasan_complete_extra_report_info(&info->free_track, entry); > >> +#endif /* CONFIG_KASAN_EXTRA_INFO */ > >> free_found = true; > >> > >> /* > >> @@ -97,6 +112,9 @@ void kasan_complete_mode_report_info(struct kasan_report_info *info) > >> > >> info->alloc_track.pid = pid; > >> info->alloc_track.stack = stack; > >> +#ifdef CONFIG_KASAN_EXTRA_INFO > >> + kasan_complete_extra_report_info(&info->alloc_track, entry); > >> +#endif /* CONFIG_KASAN_EXTRA_INFO */ > >> alloc_found = true; > >> > >> /* > >> diff --git a/mm/kasan/tags.c b/mm/kasan/tags.c > >> index 7dcfe341d48e..474ce7e8be8b 100644 > >> --- a/mm/kasan/tags.c > >> +++ b/mm/kasan/tags.c > >> @@ -13,6 +13,7 @@ > >> #include <linux/memblock.h> > >> #include <linux/memory.h> > >> #include <linux/mm.h> > >> +#include <linux/sched/clock.h> > >> #include <linux/static_key.h> > >> #include <linux/string.h> > >> #include <linux/types.h> > >> @@ -92,6 +93,17 @@ void __init kasan_init_tags(void) > >> } > >> } > >> > >> +#ifdef CONFIG_KASAN_EXTRA_INFO > >> +static void save_extra_info(struct kasan_stack_ring_entry *entry) > >> +{ > >> + u32 cpu = raw_smp_processor_id(); > >> + u64 ts_nsec = local_clock(); > >> + > >> + WRITE_ONCE(entry->cpu, cpu); > >> + WRITE_ONCE(entry->ts_nsec, ts_nsec); > >> +} > >> +#endif /* CONFIG_KASAN_EXTRA_INFO */ > >> + > >> static void save_stack_info(struct kmem_cache *cache, void *object, > >> gfp_t gfp_flags, bool is_free) > >> { > >> @@ -124,6 +136,9 @@ static void save_stack_info(struct kmem_cache *cache, void *object, > >> WRITE_ONCE(entry->pid, current->pid); > >> WRITE_ONCE(entry->stack, stack); > >> WRITE_ONCE(entry->is_free, is_free); > >> +#ifdef CONFIG_KASAN_EXTRA_INFO > >> + save_extra_info(entry); > >> +#endif /* CONFIG_KASAN_EXTRA_INFO */ > >> > >> /* > >> * Paired with smp_load_acquire() in kasan_complete_mode_report_info(). > >> -- > >> 2.39.2 > >> >