Re: [RFC] jbd2: add new "stats" proc file

Xiaoguang Wang <xiaoguang.wang@xxxxxxxxxxxxxxxxx> · Wed, 5 Jun 2019 15:05:52 +0800

Hi,

On Mon, Jun 03, 2019 at 08:42:38PM +0800, Xiaoguang Wang wrote:
/proc/fs/jbd2/${device}/info only shows statistics averaged over
jbd2's whole life cycle; it cannot show jbd2 statistics for a
specified time interval, and this capability is sometimes very useful
for troubleshooting. For example, we cannot see how rs_locked and
rs_flushing grow over a specified time interval, yet these two metrics
can explain some of an application's behaviour.

We actually had something like this, but we removed it in commit
bf6993276f7: "jbd2: Use tracepoints for history file".  The idea was
that you can get the same information using the jbd2_run_stats tracepoint:

# echo jbd2_run_stats > /sys/kernel/debug/tracing/set_event
# cat /sys/kernel/debug/tracing/trace_pipe

... which will produce output like this:

       jbd2/vdg-8-293   [000] ...2   122.822487: jbd2_run_stats: dev 254,96 tid 4403 wait 0 request_delay 0 running 4 locked 0 flushing 0 logging 7 handle_count 98 blocks 3 blocks_logged 4
       jbd2/vdg-8-293   [000] ...2   122.833101: jbd2_run_stats: dev 254,96 tid 4404 wait 0 request_delay 0 running 14 locked 0 flushing 0 logging 4 handle_count 198 blocks 1 blocks_logged 2
       jbd2/vdg-8-293   [000] ...2   122.839325: jbd2_run_stats: dev 254,96 tid 4405 wait

With eBPF, we should be able to do something even more user friendly.
Yes, I'm learning it :)
The reason for this patch is that we'd like to build a web-based monitoring
system that shows how jbd2's status changes over time. Then, for example, if
buffered writes report high latency while jbd2's rs_locked and rs_flushing also
report high values, we can draw a connection between the buffered writes and jbd2.
Previously we planned to have this monitoring system parse a jbd2 status file
provided by the kernel, which would be the simplest approach. But OK, we can try eBPF.
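
As a rough illustration of the alternative to a status file, here is a
minimal sketch of how a monitor could consume the tracepoint output shown
earlier in this thread. It assumes the "key value" field layout of the
sample jbd2_run_stats lines above and nothing more; the function names
are hypothetical, and this is not the eBPF approach, only a plain parser
for trace_pipe text.

```python
import re

# Timestamp, event name, then the "key value" payload. The layout is taken
# from the sample output quoted in this thread, not from the kernel headers.
EVENT_RE = re.compile(r"\s(?P<ts>\d+\.\d+):\s+jbd2_run_stats:\s+(?P<body>.*)$")


def parse_run_stats(line):
    """Return a dict of fields for one jbd2_run_stats line, or None."""
    m = EVENT_RE.search(line)
    if m is None:
        return None
    tokens = m.group("body").split()
    if len(tokens) % 2:
        # Truncated line: drop the dangling key that has no value.
        tokens = tokens[:-1]
    fields = {"timestamp": float(m.group("ts"))}
    for key, value in zip(tokens[0::2], tokens[1::2]):
        # "dev" is printed as "major,minor"; every other field is an integer.
        fields[key] = value if key == "dev" else int(value)
    return fields


def follow_run_stats(path="/sys/kernel/debug/tracing/trace_pipe"):
    """Yield parsed events as they arrive (needs root and the event enabled)."""
    with open(path) as pipe:
        for line in pipe:
            stats = parse_run_stats(line)
            if stats is not None:
                yield stats
```

A monitor could then iterate over follow_run_stats() and export the
"locked" and "flushing" values per transaction, which gives per-interval
data rather than the lifetime averages in the info file.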

BTW, if you are looking to try to optimize jbd2, a good thing to do is
to take a look at jbd2_handle_stats, filtered on ones where the
interval is larger than some cut-off.  Ideally, the time between a
handle getting started and stopped should be as small as possible,
because if a transaction is trying to close, an open handle will get
in the way of that, and other CPUs will be stuck waiting for the handle
to complete.  This means that pre-reading blocks before starting a
handle, etc., is a really good idea.  And monitoring jbd2_handle_stats
is a good way to find potential spots to optimize in ext4.
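
The filtering described above could be sketched roughly as follows. This
is only an illustration: it assumes jbd2_handle_stats prints its payload
as "key value" pairs that include an "interval" field, in the same style
as the jbd2_run_stats lines quoted earlier. The helper names and the
cut-off are hypothetical, and the cut-off is compared in whatever unit
the tracepoint reports.

```python
def parse_event(line, event):
    """Split '<event>: key value key value ...' into a dict of strings.

    Returns None when the line does not contain the given event.
    """
    _, found, body = line.partition(event + ":")
    if not found:
        return None
    tokens = body.split()
    tokens = tokens[: len(tokens) // 2 * 2]  # ignore a dangling key
    return dict(zip(tokens[0::2], tokens[1::2]))


def slow_handles(lines, cutoff):
    """Yield jbd2_handle_stats events whose interval is larger than cutoff."""
    for line in lines:
        fields = parse_event(line, "jbd2_handle_stats")
        if fields is None or "interval" not in fields:
            continue
        if int(fields["interval"]) > cutoff:
            yield fields
```

Feeding this the lines read from trace_pipe, with jbd2_handle_stats
enabled in set_event, leaves only the handles that stayed open longer
than the chosen cut-off, which are the candidates worth examining.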
Thanks for your detailed explanation and suggestions.

Regards,
Xiaoguang Wang

      	      	      		      	 	  - Ted