Re: [RFC PATCH 0/4] make jbd2 debug switch per device

brookxu <brookxu.cn@xxxxxxxxx> · Mon, 25 Jan 2021 21:59:09 +0800

Thanks for your reply.

Jan Kara wrote on 2021/1/25 20:41:
> On Fri 22-01-21 14:43:18, Chunguang Xu wrote:
>> On a multi-disk machine, because jbd2 debugging switch is global, this
>> confuses the logs of multiple disks. It is not easy to distinguish the
>> logs of each disk and the amount of generated logs is very large. Or a
>> separate debugging switch for each disk would be better, so that you
>> can easily distinguish the logs of a certain disk. 
>>
>> We can enable jbd2 debugging of a device in the following ways:
>> echo X > /proc/fs/jbd2/sdX/jbd2_debug
>>
>> But there is a small disadvantage here. Because the debugging switch is
>> placed in the journal_t object, the log before the object is initialized
>> will be lost. However, usually this will not have much impact on
>> debugging.
> 
> OK, I didn't look at the series yet but I'm wondering: How are you using
> jbd2 debugging? I mean obviously it isn't meant for production use but
> rather for debugging JBD2 bugs so I'm kind of wondering in which case too
> many messages matter.
We run stress tests on machines in our test environment and use scripts to
capture journal-related logs for analysis. The machine in question has 12
disks, and each disk runs different jobs. Our test kernel also adds some
additional function-related logs. When we raise the log level, a large share
of the output has nothing to do with the disk under observation; those
messages are generated by system agents or coordination tasks. This makes the
logs difficult to analyze.

> And if the problem is that there's a problem with distinguishing messages
> from multiple filesystems, then it would be perhaps more useful to add
> journal identification to each message similarly as we do it with ext4
> messages (likely by using journal->j_dev) - which is very simple to do
> after your patches 3 and 4.
Our test kernel does this. Because it changes the log format, I was not sure
whether it would break anything, so I left that part out of this series. Even
with the device information added, when there are many disks and the log level
is high there are still a lot of irrelevant messages, and filtering them
consumes a lot of CPU. A per-device switch makes all of this simpler.
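
To illustrate the point about filtering cost, here is a rough userspace sketch
of a per-device check. It is not the code from the series: struct
journal_sketch, journal_debug() and the "JBD2(dev): " prefix are hypothetical
names chosen for this example only. The idea it shows is that a message for a
device whose switch is off is dropped before any formatting happens, so nothing
has to be filtered out afterwards.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for the kernel's journal_t. */
struct journal_sketch {
	const char *devname;      /* device name, e.g. "sdb" */
	unsigned int debug_level; /* per-device switch; 0 means silent */
};

/*
 * Format a debug message into buf only if this journal's own level
 * allows it. Returns the number of characters stored in buf, or 0 when
 * the message is suppressed.
 */
static int journal_debug(const struct journal_sketch *j, unsigned int level,
			 char *buf, size_t len, const char *fmt, ...)
{
	va_list ap;
	int n, m;

	assert(j != NULL && buf != NULL && len > 0);
	buf[0] = '\0';

	/* Early exit: no formatting cost for devices that are switched off. */
	if (level > j->debug_level)
		return 0;

	/* Assumed prefix format, used here only to tag the device. */
	n = snprintf(buf, len, "JBD2(%s): ", j->devname);
	if (n < 0 || (size_t)n >= len) {
		buf[0] = '\0';
		return 0;
	}

	va_start(ap, fmt);
	m = vsnprintf(buf + n, len - (size_t)n, fmt, ap);
	va_end(ap);
	if (m < 0) {
		buf[0] = '\0';
		return 0;
	}

	/* On truncation, report what actually fits in the buffer. */
	if ((size_t)m >= len - (size_t)n)
		return (int)(len - 1);
	return n + m;
}
```

With one journal_sketch per disk, setting debug_level to 0 on eleven of them
and to a non-zero value on the twelfth leaves only the messages of the disk
being observed.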
> 
> 								Honza
>