On Fri, Jun 13, 2008 at 04:53:31PM -0500, Weathers, Norman R. wrote:
> > > The big one seems to be the __alloc_skb. (This is with 16 threads,
> > > and it says that we are using up somewhere between 12 and 14 GB of
> > > memory; about 2 to 3 GB of that is disk cache). If I were to put
> > > any more threads out there, the server would become almost
> > > unresponsive (it was bad enough as it was).
> > >
> > > At the same time, I also noticed this:
> > >
> > > skbuff_fclone_cache: 1842524 __alloc_skb+0x50/0x170
> > >
> > > Don't know for sure if that is meaningful or not....
> >
> > OK, so, starting at net/core/skbuff.c, this means that this memory
> > was allocated by __alloc_skb() calls with something nonzero in the
> > third ("fclone") argument. The only such caller is
> > alloc_skb_fclone(). Callers of alloc_skb_fclone() include:
> >
> > 	sk_stream_alloc_skb:
> > 		do_tcp_sendpages
> > 		tcp_sendmsg
> > 		tcp_fragment
> > 		tso_fragment
>
> Interesting you should mention the tso... We recently went through and
> turned on TSO on all of our systems, trying it out to see if it helped
> with performance... This could be something to do with that. I can try
> disabling the tso on all of the servers and see if that helps with the
> memory. Actually, I think I will, and I will monitor the situation. I
> think it might help some, but I still think there may be something
> else going on in a deep corner...

I'll plead total ignorance about TSO, and it sounds like a long
shot--but sure, it'd be worth trying, thanks.

> > 	tcp_mtu_probe
> > 	tcp_send_fin
> > 	tcp_connect
> > 	buf_acquire:
> > 		lots of callers in tipc code (whatever that is).
> >
> > So unless you're using tipc, or you have something in userspace
> > going haywire (perhaps netstat would help rule that out?), then I
> > suppose there's something wrong with knfsd's tcp code. Which makes
> > sense, I guess.
>
> Not sure what tipc is either....
>
> > I'd think this sort of allocation would be limited by the number of
> > sockets times the size of the send and receive buffers.
> > svc_xprt.c:svc_check_conn_limits() claims to be limiting the number
> > of sockets to (nrthreads+3)*20. (You aren't hitting the "too many
> > open connections" printk there, are you?) The total buffer size
> > should be bounded by something like 4 megs.
> >
> > --b.
>
> Yes, we are getting a continuous stream of the too many open
> connections scrolling across our logs.

That's interesting! So we should probably look more closely at the
svc_check_conn_limits() behavior. I wonder whether some pathological
behavior is triggered in the case where you're constantly over the
limit it's trying to enforce. (Remind me how many active clients you
have?)

> No problems. I feel good if I exercised some deep corner of the code
> and found something that needed flushing out; that's what the
> experience is all about, isn't it?

Yep!

--b.
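
P.S.: As a back-of-the-envelope check of that (nrthreads+3)*20 cap,
here is a minimal standalone C sketch of the arithmetic being
discussed. This models only the limit calculation, not the actual
net/sunrpc/svc_xprt.c code, and conn_limit() is a made-up name:

	#include <stdio.h>

	/* Model of the cap svc_check_conn_limits() is described as
	 * enforcing above: once the number of temporary (TCP) sockets
	 * exceeds (nrthreads + 3) * 20, the server logs "too many open
	 * connections" and starts closing connections. */
	static int conn_limit(int nrthreads)
	{
		return (nrthreads + 3) * 20;
	}

	int main(void)
	{
		int threads;

		for (threads = 8; threads <= 128; threads *= 2)
			printf("%3d nfsd threads -> cap of %4d connections\n",
			       threads, conn_limit(threads));
		return 0;
	}

At the 16 threads mentioned above, that works out to (16 + 3) * 20 =
380 sockets; if the number of connected clients stays above that, the
server would presumably be evicting and re-accepting connections
continuously, which would fit the constant stream of log messages.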