Hi,
On 2024/11/01 4:33, John Stoffel wrote:
"Christian" == Christian Theune <ct@xxxxxxxxxxxxxxx> writes:
Hi,
the system has been running under stress for a while on 6.11.5 with the debugging in place. I have two observations so far:
1. The bitmap_counts are sometimes low and sometimes very high and intermingled like this:
Oct 31 20:41:27 barbrady09 kernel: __add_stripe_bio: md127: start ff2721bf1db20000(29009381448+8) 7
Oct 31 20:41:27 barbrady09 kernel: __add_stripe_bio: md127: start ff2721bf9d6fbf80(29009382168+8) 5
Oct 31 20:41:27 barbrady09 kernel: __add_stripe_bio: md127: start ff2721beec896f20(29009381928+8) 4294967242
For this 'sh', can you grep the whole log for "ff2721beec896f20" and
show the results? It looks like bitmap_startwrite and endwrite are not
balanced for this 'sh', and this might be a real problem.
You can also do the same for some other 'sh'.
Thanks,
Kuai
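A minimal sketch of the per-'sh' check suggested above, for anyone who would rather tally than read raw grep output. It assumes the debug lines keep the exact format shown in this thread ("<function>: md127: start|end <sh>(<sector>+<len>) <count>"); the names LINE_RE, lines_for_stripe and balance_by_stripe are made up for illustration and are not part of any kernel tool.

```python
import re
from collections import Counter

# Matches the debug lines quoted in this thread, e.g.
# "... kernel: __add_stripe_bio: md127: start ff2721beec896f20(29009381928+8) 4294967242"
# The format is an assumption taken from the pasted log, not from the patch itself.
LINE_RE = re.compile(
    r"(?P<func>\w+): md\d+: (?P<op>start|end) "
    r"(?P<sh>[0-9a-f]+)\((?P<sector>\d+)\+(?P<len>\d+)\) (?P<count>\d+)"
)

def lines_for_stripe(lines, sh):
    """Equivalent of grepping the log for one 'sh' pointer."""
    return [line for line in lines if sh in line]

def balance_by_stripe(lines):
    """Return {sh: number of 'start' lines minus number of 'end' lines}."""
    starts, ends = Counter(), Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # not one of the debug lines
        (starts if m["op"] == "start" else ends)[m["sh"]] += 1
    return {sh: starts[sh] - ends[sh] for sh in starts.keys() | ends.keys()}

# Three lines copied from the log in this mail.
sample = [
    "Oct 31 20:41:27 barbrady09 kernel: __add_stripe_bio: md127: start ff2721beec896f20(29009381928+8) 4294967242",
    "Oct 31 20:41:27 barbrady09 kernel: handle_stripe_clean_event: md127: end ff2721c108f26f20(29009374480+8) 3",
    "Oct 31 20:41:27 barbrady09 kernel: handle_stripe_clean_event: md127: end ff2721c108f26f20(29009374480+8) 2",
]

for sh, diff in sorted(balance_by_stripe(sample).items()):
    print(sh, diff)
```

A non-zero difference over a short excerpt only means the excerpt cut a stripe's lifetime in half; it is only meaningful when run over the whole log, as requested above.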
Oct 31 20:41:27 barbrady09 kernel: handle_stripe_clean_event: md127: end ff2721c108f26f20(29009374480+8) 3
Oct 31 20:41:27 barbrady09 kernel: __add_stripe_bio: md127: start ff2721bfb083df40(29009381456+8) 7
Oct 31 20:41:27 barbrady09 kernel: __add_stripe_bio: md127: start ff2721bfc92a2fa0(29009381936+8) 5
Oct 31 20:41:27 barbrady09 kernel: handle_stripe_clean_event: md127: end ff2721c108f26f20(29009374480+8) 2
Oct 31 20:41:27 barbrady09 kernel: __add_stripe_bio: md127: start ff2721c074f8df40(29009381464+8) 7
Oct 31 20:41:27 barbrady09 kernel: __add_stripe_bio: md127: start ff2721bfa3b2df40(29009381944+8) 5
Oct 31 20:41:27 barbrady09 kernel: handle_stripe_clean_event: md127: end ff2721c108f26f20(29009374480+8) 1
Oct 31 20:41:27 barbrady09 kernel: __add_stripe_bio: md127: start ff2721beec219fc0(29009381472+8) 4294967268
Oct 31 20:41:27 barbrady09 kernel: handle_stripe_clean_event: md127: end ff2721c108f26f20(29009374480+8) 0
Oct 31 20:41:27 barbrady09 kernel: handle_stripe_clean_event: md127: end ff2721beec030000(29009374488+8) 4294967247
Oct 31 20:41:27 barbrady09 kernel: handle_stripe_clean_event: md127: end ff2721beec030000(29009374488+8) 4294967246
Oct 31 20:41:27 barbrady09 kernel: handle_stripe_clean_event: md127: end ff2721beec030000(29009374488+8) 4294967245
Oct 31 20:41:27 barbrady09 kernel: handle_stripe_clean_event: md127: end ff2721beec030000(29009374488+8) 4294967244
Oct 31 20:41:27 barbrady09 kernel: handle_stripe_clean_event: md127: end ff2721beec030000(29009374488+8) 4294967243
Oct 31 20:41:27 barbrady09 kernel: handle_stripe_clean_event: md127: end ff2721beec030000(29009374488+8) 4294967242
Oct 31 20:41:27 barbrady09 kernel: handle_stripe_clean_event: md127: end ff2721beec030000(29009374488+8) 4294967241
Oct 31 20:41:27 barbrady09 kernel: handle_stripe_clean_event: md127: end ff2721bf21496f20(29009374496+8) 6
Oct 31 20:41:27 barbrady09 kernel: handle_stripe_clean_event: md127: end ff2721bf21496f20(29009374496+8) 5
Oct 31 20:41:27 barbrady09 kernel: handle_stripe_clean_event: md127: end ff2721bf21496f20(29009374496+8) 4
Oct 31 20:41:27 barbrady09 kernel: handle_stripe_clean_event: md127: end ff2721bf21496f20(29009374496+8) 3
Oct 31 20:41:27 barbrady09 kernel: handle_stripe_clean_event: md127: end ff2721bf21496f20(29009374496+8) 2
Oct 31 20:41:27 barbrady09 kernel: handle_stripe_clean_event: md127: end ff2721bf21496f20(29009374496+8) 1
Oct 31 20:41:27 barbrady09 kernel: handle_stripe_clean_event: md127: end ff2721bf21496f20(29009374496+8) 0
Oct 31 20:41:27 barbrady09 kernel: handle_stripe_clean_event: md127: end ff2721c1aa216f20(29009374504+8) 6
Oct 31 20:41:27 barbrady09 kernel: handle_stripe_clean_event: md127: end ff2721c1aa216f20(29009374504+8) 5
Oct 31 20:41:27 barbrady09 kernel: handle_stripe_clean_event: md127: end ff2721c1aa216f20(29009374504+8) 4
Oct 31 20:41:27 barbrady09 kernel: handle_stripe_clean_event: md127: end ff2721c1aa216f20(29009374504+8) 3
Oct 31 20:41:27 barbrady09 kernel: handle_stripe_clean_event: md127: end ff2721c1aa216f20(29009374504+8) 2
Is the high number an indicator of something weird?
Is this number wrapping around and not being detected? Maybe a
signed/unsigned issue? Total wild ass guess on my part...
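To make the wraparound guess concrete: if the printed value is a 32-bit counter shown as unsigned (an assumption, since the debug patch itself is not quoted here), then the large numbers are just small negative numbers. A rough sketch, with as_signed32 being a hypothetical helper name:

```python
def as_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as two's-complement signed."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value

# Counts copied from the log lines quoted in this thread.
for raw in (7, 4294967242, 4294967268, 4294967247):
    print(raw, "->", as_signed32(raw))
```

Read this way, 4294967242 would be -54 and 4294967268 would be -28, which would fit a counter that has been decremented more often than incremented.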