On Thu, 10 Aug 2006 11:51:34 -0700
Badari Pulavarty <pbadari@xxxxxxxxxx> wrote:

> Andrew Morton wrote:
> > Also, JBD is presently feeding into submit_bh() buffer_heads which span two
> > machine pages, and some device drivers spit the dummy.  It'd be better to
> > fix that once, rather than twice..
> >
> Andrew,
>
> I looked at this few days ago. I am not sure how we end up having
> multiple pages (especially,
> why we end up having buffers with bh_size > pagesize) ? Do you know why ?
>

It's one or both of the jbd_kmalloc(bh->b_size) calls in
fs/jbd/transaction.c.  Here we're allocating data to attach to a bh which
later gets fed into submit_bh().

Problem is, with CONFIG_DEBUG_SLAB=y, the data which kmalloc() returns can
be offset by 4 bytes due to redzoning.  Example: if the fs is using a 1k
blocksize and we have a 4k pagesize, the data coming back from kmalloc may
have an address of 0xnnnnxc04, so the data which we later feed into
submit_bh() will span two pages.

A simple fix would be to replace kmalloc() with a call to alloc_page().
We'd need to work out how much memory that will worst-case-waste.  If "not
much" then OK.  If "quite a lot in the worst case" then we'd need something
more elaborate.

I'd suggest that ext3 implement ext3-private slab caches of size 1024,
2048, 4096 and perhaps 8192.  Those caches should be kmem_cache_create()d
on-demand at mount-time.  They should be created with appropriate slab
options to defeat the redzoning.

The transaction.c code should use the appropriate slab (based on b_size)
rather than using kmalloc().

The up-to-four private slab caches should be destroyed on ext3 rmmod.
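
To make the arithmetic concrete, here is a small userspace C sketch (not
kernel code) of the two calculations discussed: whether a buffer at a given
address crosses a page boundary, and how much memory is left unused if each
buffer gets whole pages to itself.  The helper names spans_two_pages() and
page_waste() are made up for illustration, and the addresses in the usage
note are arbitrary examples standing in for the 0x...c04 case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a kernel function: returns true if the byte
 * range [addr, addr + size) touches more than one page.
 * page_size must be a power of two.
 */
static bool spans_two_pages(uintptr_t addr, size_t size, size_t page_size)
{
    assert(size > 0);
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    uintptr_t mask = ~(uintptr_t)(page_size - 1);
    uintptr_t first_page = addr & mask;              /* page of first byte */
    uintptr_t last_page = (addr + size - 1) & mask;  /* page of last byte */

    return first_page != last_page;
}

/*
 * Hypothetical helper: bytes left unused when a buffer of b_size bytes
 * is backed by whole pages instead of a sub-page allocation.
 */
static size_t page_waste(size_t b_size, size_t page_size)
{
    size_t pages = (b_size + page_size - 1) / page_size; /* round up */

    return pages * page_size - b_size;
}
```

With a 4k page, spans_two_pages(0xc04, 1024, 4096) is true, because the last
byte lands at offset 0x1003, while the same 1k buffer at the unshifted
offset 0xc00 ends at 0xfff and stays inside one page.  page_waste(1024, 4096)
gives 3072, i.e. three quarters of each page would go unused for a 1k
blocksize if every buffer were given its own page.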