> -----Original Message-----
> From: J. Bruce Fields [mailto:bfields@xxxxxxxxxxxx]
> Sent: Tuesday, June 10, 2008 12:16 PM
> To: Weathers, Norman R.
> Cc: linux-nfs@xxxxxxxxxxxxxxx
> Subject: Re: Problems with large number of clients and reads
>
> On Tue, Jun 10, 2008 at 09:30:18AM -0500, Weathers, Norman R. wrote:
> > Unfortunately, I cannot stop the clients (middle of long running
> > jobs).  I might be able to test this soon.  If I have the number of
> > threads high, yes I can reduce the number of threads and it appears to
> > lower some of the memory, but even with as little as three threads,
> > the memory usage climbs very high, just not as high as if there are
> > say 8 threads.  When the memory usage climbs high, it can cause the
> > box to not respond over the network (ssh, rsh), and even be very
> > sluggish when I am connected over our serial console to the server(s).
> > This same scenario has been happening with kernels that I have tried
> > from 2.6.22.x on to the 2.6.25 series.  The 2.6.25 series is
> > interesting in that I can push the same load from a box with the
> > 2.6.25 kernel and not have a load over .3 (with 3 threads), but with
> > the 2.6.22.x kernel, I have a load of over 3 when I hit the same
> > conditions.
>
> OK, I think what we want to do is turn on CONFIG_DEBUG_SLAB_LEAK.  I've
> never used it before, but it looks like it will report which functions
> are allocating from each slab cache, which may be exactly what we need
> to know.  So:
>
>     1. Install a kernel with both CONFIG_DEBUG_SLAB ("Debug slab
>        memory allocations") and CONFIG_DEBUG_SLAB_LEAK ("Memory leak
>        debugging") turned on.  They're both under the "kernel hacking"
>        section of the kernel config.  (If you have a file
>        /proc/slab_allocators, then you already have these turned on and
>        you can skip this step.)
>
>     2. Do whatever you need to do to reproduce the problem.
>
>     3. Get a copy of /proc/slabinfo and /proc/slab_allocators.
>
> Then we can take a look at that and see if it sheds any light.

I have taken several snapshots of /proc/slab_allocators and
/proc/slabinfo as requested.  There is a lot of data in them, and I
didn't think anyone wanted to go cross-eyed reading it in an email, so
I have put the files up on a website:

http://shashi-weathers.net/linux/cluster/NFS/

The order of data collection is:

    slab_allocators_bad1.txt and corresponding slabinfo
    slab_allocators_after_bad1.txt and corresponding slabinfo
    slab_allocators_16_threads.txt and corresponding slabinfo
    slab_allocators_16_threads_1.txt and corresponding slabinfo
    slab_allocators_32_threads.txt and corresponding slabinfo
    slab_allocators_really_bad.txt and corresponding slabinfo

You will have to forgive my ignorance at this point, but while looking
through the slabinfo and slab_allocators files I noticed that size-4096
does not show up in slab_allocators.  I hope that is by design.  You
can see size-4096 growing into the gigabytes in the slabinfo files.

>
> I think that debugging will hurt the server performance, so you won't
> want to keep it turned on all the time.
>
> >
> > Also, this is all with the SLAB cache option.  SLUB crashes every time
> > I use it under heavy load.
>
> Have you reported the SLUB bugs to lkml?

No, I haven't yet.  I didn't know for sure whether I was doing something
wrong or whether SLUB itself was the problem.  Since the failures I have
gone back to using SLAB anyway, but I probably should report them.

>
> --b.
>

Norman Weathers
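
P.S. For anyone who would rather not read the slabinfo snapshots by
hand, here is a minimal Python sketch of how two of them could be
compared.  It assumes the usual slabinfo 2.x column order (name,
active_objs, num_objs, objsize, ...) and estimates each cache's
footprint as num_objs * objsize, which ignores per-slab overhead and
any debug padding, so treat the numbers as approximate.  The function
names parse_slabinfo and growth are my own, not part of any existing
tool.

```python
def parse_slabinfo(text):
    """Map each cache name to its approximate size in bytes.

    Assumes slabinfo 2.x columns: name, active_objs, num_objs, objsize, ...
    The estimate is num_objs * objsize and ignores per-slab overhead.
    """
    usage = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the version banner and the "# name ..." header.
        if len(fields) < 4 or fields[0] in ("slabinfo", "#"):
            continue
        try:
            num_objs = int(fields[2])
            objsize = int(fields[3])
        except ValueError:
            continue  # not a data row in the assumed format
        usage[fields[0]] = num_objs * objsize
    return usage


def growth(before_text, after_text):
    """Return (cache, bytes_grown) pairs, largest growth first."""
    before = parse_slabinfo(before_text)
    after = parse_slabinfo(after_text)
    deltas = [(name, size - before.get(name, 0)) for name, size in after.items()]
    deltas.sort(key=lambda item: item[1], reverse=True)
    return deltas
```

Feeding it the contents of two snapshots taken in order, for example
the slabinfo files that go with bad1 and really_bad, should put the
fastest-growing cache at the top of the list.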