On 2024-12-03, Petr Mladek <pmladek@xxxxxxxx> wrote: > On Sun 2024-12-01 12:40:13, Geert Uytterhoeven wrote: >> Isn't this kernel log message bookkeeping becoming a bit too excessive? >> E.g. on a small system: >> >> printk: log buffer data + meta data: 65536 + 204800 = 270336 bytes >> >> Why is the meta data that big (> 3*log buffer)? >> >> #define PRB_AVGBITS 5 /* 32 character average length */ >> >> unsigned int descs_count = log_buf_len >> PRB_AVGBITS; >> meta_data_size = descs_count * (sizeof(struct prb_desc) + >> sizeof(struct printk_info)); >> >> struct prb_data_blk_lpos { >> unsigned long begin; >> unsigned long next; >> }; >> >> struct prb_desc { >> atomic_long_t state_var; >> struct prb_data_blk_lpos text_blk_lpos; >> }; >> >> i.e. 12 bytes on 32-bit, 24 bytes on 64-bit. > > I am afraid that we could not do much about the size of this part. > All the variables are important parts of the lockless machinery. The descriptor array is one source of wasted space. It ensures there are enough descriptors so that the text_ring can be fully maximized for an average record text length of 32. However, looking at /sys/kernel/debug/printk/index/vmlinux and running some basic tests, it seems the average text length is >=45 (usually around 55). That means at least 30% of the descriptor space is not in use, which is roughtly 5% of the total memory used by all ringbuffer components. For example, for CONFIG_LOG_BUF_SHIFT=13, the amount of wasted desc_ring space is about 1.8KiB. (The total memory usage of the ringbuffer is 36KiB.) However, if we bump the expected average size to 64, there will not be enough descriptors, leading to wasted text_ring buffer space. (The expected size must be a power of 2 due to the ID wrapping algorithm.) >> #define PRINTK_INFO_SUBSYSTEM_LEN 16 >> #define PRINTK_INFO_DEVICE_LEN 48 >> >> struct dev_printk_info { >> char subsystem[PRINTK_INFO_SUBSYSTEM_LEN]; >> char device[PRINTK_INFO_DEVICE_LEN]; >> }; > > This is probably the lowest hanging fruit. Yes, the info array is also a source of wasted space. This was known at the time when we discussed [0] introducing this array. Many records do not even use the dev_printk_info fields. (A running system seems to use them more often than a booting system.) However, the 64 bytes is currently quite large. During some testing I typically saw 70% of the dev_printk_info text space for valid descriptors unused. I typically saw dev_printk_info averages of: - 5 bytes for SUBSYSTEM - 12 bytes for DEVICE However, this problem is compounded by the descriptor array issue I mentioned at the beginning. The reason is that there is a 1:1 relationship between descriptors and dev_printk_info structs. So if 30% of the descriptors are invalid, then that adds an additional waste of 30% totally unused dev_printk_info structs. For example, for CONFIG_LOG_BUF_SHIFT=13, that represents a total of 13KiB wasted space in the info array (36% of the total memory usage). > It should be possible to write these dev_printk-specific metadata into > the data buffer a more efficient way and only for records created by > dev_printk(). > > It would require adding "dict_len" into "struct printk_info" > and reserving space for both "text_len" and "dict_len". > > We bundled it into the metadata because these are metadata. > We wanted to keep the design as clean as possible. We focused > mainly on the stability and maintainability of the code. > And it was really challenging to get it all working. I think keeping it bundled in the meta data is correct. But if those text arrays could be allocated from a text ring, then the space could be used efficiently. I am not recommending that we add the dict_ring back. It was considerably more complex. But perhaps we could use the text_ring for these allocations as well. It may even have the benefit that the "average text size" becomes >=64, which would also help with the first wasted item I mentioned. >> struct printk_info { >> u64 seq; /* sequence number */ > > I do not recal the details. But I think that we need to > explicitely store the 64-bit "seq" number in the metadata > because of the lockless algoritm. It helps to solve > problems with wrapping of the counter in > "atomic_long_t state_var". Yes. I could not figure out a way to safely update a @log_first_seq to represent the first sequence available in the ringbuffer. (The complexity involves both writers and readers seeing appropriate sequence values.) If that could be done somehow, that would save another 2KiB (for CONFIG_LOG_BUF_SHIFT=13). In summary... I think the only quick fix we could do immediately is reduce PRINTK_INFO_DEVICE_LEN. On my various test machines, I never encountered anything above 25. Perhaps reducing it from 48 to 32? That would immediately reclaim 11% (4KiB for CONFIG_LOG_BUF_SHIFT=13) and it would not require any changes to the various crash/dump tools. In the longer term we may want to consider using the text ring for additional device/subsystem string allocation. This would only require changes to crash/dump tools that provide the device/subsystem information. In addition, if low-memory single CPU systems are am important target, we might be better off implementing a CONFIG_PRINTK_RINGBUFFER_TINY. Implementing a lockless ringbuffer for a single CPU is trivial (in comparison) and would not require all the management abstractions. If it used the same printk_ringbuffer API it could be a simple drop-in replacement. John Ogness [0] https://lore.kernel.org/lkml/20200904124530.GB20558@alley