Re: kernel 2.6.25-rc7 highly unstable on high load

Denys Fedoryshchenko wrote:
Here is the output from dmesg with the patch you supplied.

Kup /config # rtstat -i60 -c60
entries| in_hit| in_slow_tot| in_slow_mc| in_no_route| in_brd| in_martian_dst| in_martian_src| out_hit| out_slow_tot| out_slow_mc| gc_total| gc_ignored| gc_goal_miss| gc_dst_overflow| in_hlist_search| out_hlist_search|
 103266|  69986|       11574|          0|          54|      0|              0|              0|    1982|          634|           0|    10984|      10980|            0|               0|          191808|             5254|
 124787|  45125|        6019|          0|          28|      0|              0|              0|     807|          230|           0|     6277|       6274|            0|               0|          128922|             2518|
 120270|  45588|        6288|          0|          30|      0|              0|              0|     883|          214|           0|     6532|       6529|            0|               0|          125651|             2743|
 122253|  46522|        6582|          0|          27|      0|              0|              0|     897|          213|           0|     6822|       6819|            0|               0|          124927|             2761|

[  102.534363] dst_total: 120397 delayed: 12 work_perf: 0 expires: 27999 elapsed: 1 us
[  130.530240] dst_total: 124277 delayed: 12 work_perf: 0 expires: 32998 elapsed: 2 us
[  163.523240] dst_total: 110006 delayed: 12 work_perf: 0 expires: 39000 elapsed: 1 us
[  202.519402] dst_total: 130453 delayed: 12 work_perf: 0 expires: 45998 elapsed: 1 us
[  248.511220] dst_total: 110637 delayed: 12 work_perf: 0 expires: 52600 elapsed: 2 us
[  301.102445] dst_total: 129366 delayed: 12 work_perf: 0 expires: 60696 elapsed: 6 us

After a while:

Kup /config # rtstat -i300 -c60
entries| in_hit| in_slow_tot| in_slow_mc| in_no_route| in_brd| in_martian_dst| in_martian_src| out_hit| out_slow_tot| out_slow_mc| gc_total| gc_ignored| gc_goal_miss| gc_dst_overflow| in_hlist_search| out_hlist_search|
 129138|  64166|        9864|          0|          46|      0|              0|              0|    1334|          358|           0|    10012|      10008|            0|               0|          173006|             3985|
 130125|  43971|       10829|          0|          70|      0|              0|              0|     907|          195|           0|    11093|      11090|            0|               0|          100601|             2848|
 144857|  32041|       22233|          0|         234|      0|              0|              0|     738|          261|           0|    22727|      22724|            2|               0|           42769|              811|

[  687.740779] dst_total: 129365 delayed: 12 work_perf: 0 expires: 104999 elapsed: 1 us
[  792.736043] dst_total: 127247 delayed: 101589 work_perf: 0 expires: 600 elapsed: 11274 us
[  793.347127] dst_total: 122440 delayed: 101589 work_perf: 0 expires: 1600 elapsed: 11150 us
[  794.957419] dst_total: 123780 delayed: 101589 work_perf: 0 expires: 3100 elapsed: 10896 us
[  798.068614] dst_total: 121992 delayed: 101589 work_perf: 0 expires: 5655 elapsed: 11561 us
[  803.734450] dst_total: 120552 delayed: 101589 work_perf: 0 expires: 7988 elapsed: 12254 us
[  811.733227] dst_total: 121105 delayed: 101589 work_perf: 0 expires: 10988 elapsed: 12209 us
[  822.733196] dst_total: 120142 delayed: 101589 work_perf: 0 expires: 13987 elapsed: 12039 us
[  836.730637] dst_total: 120823 delayed: 101589 work_perf: 0 expires: 17987 elapsed: 12448 us
[  854.726913] dst_total: 127478 delayed: 101589 work_perf: 0 expires: 22988 elapsed: 11435 us
[  877.724827] dst_total: 127497 delayed: 101589 work_perf: 0 expires: 27987 elapsed: 12833 us
[  905.719818] dst_total: 126693 delayed: 101589 work_perf: 0 expires: 32987 elapsed: 12068 us
[  938.714595] dst_total: 130740 delayed: 101589 work_perf: 0 expires: 38988 elapsed: 11970 us
[  977.707568] dst_total: 135449 delayed: 101589 work_perf: 0 expires: 45989 elapsed: 11747 us
[ 1023.701554] dst_total: 141181 delayed: 101589 work_perf: 0 expires: 52988 elapsed: 12653 us
[ 1076.694012] dst_total: 147895 delayed: 101588 work_perf: 1 expires: 59987 elapsed: 12217 us
[ 1136.684805] dst_total: 154798 delayed: 101588 work_perf: 0 expires: 67987 elapsed: 12026 us
[ 1204.673424] dst_total: 163376 delayed: 101588 work_perf: 0 expires: 76988 elapsed: 11151 us

And after 30 more minutes:

Kup /config # rtstat -i300 -c60
entries| in_hit| in_slow_tot| in_slow_mc| in_no_route| in_brd| in_martian_dst| in_martian_src| out_hit| out_slow_tot| out_slow_mc| gc_total| gc_ignored| gc_goal_miss| gc_dst_overflow| in_hlist_search| out_hlist_search|
 194841| 176962|       77964|          0|         684|      1|              0|              0|    3939|         1143|           0|    79531|      79516|            7|               0|          401258|             9455|
 223224|  28262|       26499|          0|         236|      0|              0|              0|     661|          216|           0|    26951|      26947|            3|               0|           93059|             1946|
 248425|  28257|       26784|          0|         226|      0|              0|              0|     678|          202|           0|    27211|      27208|            3|               0|          115593|             2412|
 272823|  28943|       26728|          0|         246|      0|              0|              0|     695|          200|           0|    27174|      27171|            3|               0|          135256|             2942|
 295087|  28835|       26701|          0|         252|      0|              0|              0|     687|          194|           0|    27146|      27143|            3|               0|          154269|             3312|


[ 2043.547879] dst_total: 242922 delayed: 101588 work_perf: 0 expires: 119987 elapsed: 12519 us
[ 2163.529997] dst_total: 252951 delayed: 101588 work_perf: 0 expires: 119987 elapsed: 13930 us
[ 2283.512215] dst_total: 262678 delayed: 101588 work_perf: 0 expires: 119986 elapsed: 14330 us
[ 2403.508447] dst_total: 271930 delayed: 101588 work_perf: 0 expires: 119972 elapsed: 15230 us
[ 2523.476844] dst_total: 288046 delayed: 101588 work_perf: 0 expires: 119985 elapsed: 15018 us
[ 2643.456996] dst_total: 296475 delayed: 101588 work_perf: 0 expires: 119987 elapsed: 13654 us
[ 2763.438267] dst_total: 305033 delayed: 101588 work_perf: 0 expires: 119987 elapsed: 13137 us


Very interesting... you can see that 101588 dst entries are *delayed* in dst_garbage, but apparently never freed.

Something is wrong, but the count seems stable. It must be triggered by some kind of event, admin-driven or otherwise... possibly the "ip route flush cache" that is scheduled around 600 seconds of machine uptime, and then again secret_interval seconds later...

Typically, when an "ip route flush cache" is done (manually, or triggered by the secret_interval timer), entries with refcount > 0 are put into dst_garbage.

Then, when traffic occurs on the flows involved, the IP stack should decrement the refcount so that the next dst_garbage round can free the deleted entries. Normal TCP connections do this correctly. On your machine, nothing is being released.

Possibly idle TCP sessions?

Maybe a netfilter problem?

Please tell us more about your machine ;)




--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
