On 07/31/2012 06:52 PM, James Bottomley wrote:
> On Tue, 2012-07-31 at 09:31 -0500, Christoph Lameter wrote:
>> On Tue, 31 Jul 2012, Glauber Costa wrote:
>>
>>> On 07/31/2012 06:17 PM, Christoph Lameter wrote:
>>>> On Tue, 31 Jul 2012, Glauber Costa wrote:
>>>>
>>>>> On 07/31/2012 06:09 PM, Christoph Lameter wrote:
>>>>>> That is understood. Typically these objects were page sized though and
>>>>>> various assumptions (pretty dangerous ones as you are finding out) are
>>>>>> made regarding object reuse. The fallback of SLUB for higher order allocs
>>>>>> to the page allocator avoids these problems for higher order pages.
>>>>> omg...
>>>>
>>>> I would be very thankful if you would go through the tree and check for
>>>> any remaining use cases like that. Would take care of your problem.
>>>
>>> I would be happy to do it. Do you have any example of any user that
>>> behaved like this in the past, so I can search for something similar?
>>>
>>> This can potentially take many forms, and auditing every kfree out there
>>> is not humanly possible. The best I can do is to search for known
>>> patterns here...
>>
>> The basic problem is that someone will take the address of an object that
>> is allocated via slab and then access the page struct to increase the page
>> count.
>>
>> So you would see
>>
>> page = virt_to_page(<slab_object>);
>>
>> get_page(page);
>>
>>
>> The main culprit in the past has been the DMA code in the SCSI layer.
>> I think it was the first 512 byte control block for the device that was the
>> main issue. There was a discussion between Hugh Dickins and me when SLUB
>> was first released about this issue and it resulted in some changes so
>> that certain fields in the page struct were not touched by SLUB since they
>> were needed for I/O.
>
> Hey, don't try to pin this on me. We don't use get_page() at all on the
> ordinary DMA route. There are four get_page() calls in the whole of
> drivers/scsi. One is in the sg.c fault path, which looks genuine. The
> other three are in fcoe and iSCSI ... what they're trying to do is to
> ensure that the page hangs around until the device sees the data in a
> network tx path.
>
> I can't see why any of these pages would come from kmalloc() or any
> other slab object since they should all be user pages.
>

I've audited all users of get_page() in the drivers/ directory for
patterns like this. In general, they kmalloc something like a table of
entries, and then get_page() the entries. The entries are either user
pages, pages allocated by the page allocator, or physical addresses
reached through their pfn (in 2 cases, from the vga ones).

I also took a look at some other instances where virt_to_page occurs
together with kmalloc, and they all seem to fall into the same category.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>
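
For anyone following the thread who wants to see why the
virt_to_page() + get_page() pattern quoted above is risky on a slab
object, here is a small userspace toy model. It is only a sketch and not
kernel code: struct toy_page, toy_slab_alloc(), toy_slab_free(),
toy_get_page() and toy_put_page() are made-up stand-ins, and the 512 byte
object size just echoes the control block mentioned earlier. The
assumption being illustrated is the "object reuse" hazard: the page
refcount and the slab's per-object bookkeeping are independent, so
pinning the page does not stop the allocator from handing the same
object out again.

```c
/* Userspace toy model only; every name here is hypothetical. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define OBJ_SIZE      512   /* echoes the 512 byte control block */
#define OBJS_PER_PAGE 8

struct toy_page {
    int refcount;                         /* what get_page()/put_page() touch */
    unsigned char in_use[OBJS_PER_PAGE];  /* slab-style per-object bookkeeping */
    unsigned char mem[OBJS_PER_PAGE * OBJ_SIZE];
};

static void toy_page_init(struct toy_page *p)
{
    memset(p, 0, sizeof(*p));
    p->refcount = 1;                      /* reference held by the "slab" */
}

/* Hand out the first free object, loosely like a slab cache would. */
static void *toy_slab_alloc(struct toy_page *p)
{
    for (int i = 0; i < OBJS_PER_PAGE; i++) {
        if (!p->in_use[i]) {
            p->in_use[i] = 1;
            return p->mem + (size_t)i * OBJ_SIZE;
        }
    }
    return NULL;
}

/* Free only clears the per-object flag; it never looks at refcount. */
static void toy_slab_free(struct toy_page *p, void *obj)
{
    size_t idx = (size_t)((unsigned char *)obj - p->mem) / OBJ_SIZE;

    assert(idx < OBJS_PER_PAGE && p->in_use[idx]);
    p->in_use[idx] = 0;
}

static void toy_get_page(struct toy_page *p) { p->refcount++; }
static void toy_put_page(struct toy_page *p) { p->refcount--; }

/*
 * Returns 1 if the "pinned" object was reused for unrelated data while
 * the extra page reference was still held.
 */
int toy_pin_does_not_protect_object(void)
{
    static struct toy_page page;
    char *ctl, *other;
    int reused;

    toy_page_init(&page);

    ctl = toy_slab_alloc(&page);
    strcpy(ctl, "control block");

    toy_get_page(&page);                  /* caller believes ctl is pinned */
    toy_slab_free(&page, ctl);            /* object goes back to the slab */

    other = toy_slab_alloc(&page);        /* same slot is handed out again */
    strcpy(other, "unrelated data");

    reused = (other == ctl) &&
             page.refcount == 2 &&        /* the pin is still in place... */
             strcmp(ctl, "unrelated data") == 0; /* ...but contents changed */

    toy_put_page(&page);
    return reused;
}
```

In this model the extra reference is still held when the slot is
reallocated, which is the kind of mismatch the audit was looking for.
The cases I found take their references on user pages or page-allocator
pages instead, where the refcount is the thing that controls lifetime.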