Re: [PATCH v3] dma-debug: New interfaces to debug dma mapping errors

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 2012-10-04 at 13:38 -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Oct 03, 2012 at 08:55:59AM -0600, Shuah Khan wrote:
> > A recent dma mapping error analysis effort showed that a large percentage
> > of dma_map_single() and dma_map_page() returns are not checked for mapping
> > errors.
> > 
> > Reference:
> > http://linuxdriverproject.org/mediawiki/index.php/DMA_Mapping_Error_Analysis
> > 
> > Adding support for tracking dma mapping and unmapping errors to help assess
> > the following:
> > 
> > When do dma mapping errors get detected?
> > How often do these errors occur?
> > Why don't we see failures related to missing dma mapping error checks?
> > Are they silent failures?
> > 
> > Patch v3: Addresses review and design comments from Patch v2.
> > Patch v2: Addressed design issues from Patch v1.
> > 
> > Enhance dma-debug infrastructure to track dma mapping, and unmapping errors.
> > 
> > map_errors: (system wide counter)
> >   Total number of dma mapping errors returned by the dma mapping interfaces,
> >   in response to mapping requests from all devices in the system.
> > map_errors_not_checked: (system wide counter)
> >   Total number of dma mapping errors devices failed to check before using
> >   the returned address.
> > unmap_errors: (system wide counter)
> >   Total number of times devices tried to unmap or free an invalid dma
> >   address.
> > map_err_type: (new field added to dma_debug_entry structure)
> >   New field to maintain dma mapping error check status. This error type
> >   is applicable to the dma map page and dma map single entries tracked by
> >   dma-debug api. This status indicates whether or not a good mapping is
> >   checked by the device before its use. dma_map_single() and dma_map_page()
> >   could fail to create a mapping in some cases, and drivers are expected to
> >   call dma_mapping_error() to check for errors. Please note that this is not
> >   counter.
> > 
> > Enhancements to dma-debug api are made to add new debugfs interfaces to
> > report total dma errors, dma errors that are not checked, and unmap errors
> > for the entire system. Please note that these are system wide counters for
> > all devices in the system.
> > 
> > The following new dma-debug interface is added:
> > 
> > debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
> > 	Sets dma map error checked status for the dma map entry if one is
> > 	found. Decrements the system wide dma_map_errors_not_checked counter
> > 	that is incremented by debug_dma_map_page() when it checks for
> > 	mapping error before adding it to the dma debug entry table.
> > 
> > New dma-debug internal interface:
> > check_mapping_error(struct device *dev, dma_addr_t dma_addr)
> > Calling dma_mapping_error() from dma-debug api will result in dma mapping
> > check when it shouldn't. This internal routine checks dma mapping error
> > without any debug checks. 
> > 
> > The following existing dma-debug api are changed to support this feature:
> > debug_dma_map_page()
> > 	Increments dma_map_errors and dma_map_errors_not_checked errors totals
> > 	for the system, dma-debug api keeps track of, when dma_addr is invalid.
> > 	This routine now calls internal check_mapping_error() interface to
> > 	avoid doing dma mapping debug checks from dma-debug internal mapping
> > 	error checks.
> > check_unmap()
> > 	This is an existing internal routines that checks for various mapping
> > 	errors. Changed to increment system wide dma_unmap_errors, when a
> > 	device requests an invalid address to be unmapped. This routine now
> > 	calls internal check_mapping_error() interface to avoid doing dma
> > 	mapping debug checks from dma-debug internal mapping error checks.
> > 
> > Changed arch/x86/include/asm/dma-mapping.h to call debug_dma_mapping_error()
> > to validate these new interfaces on x86_64. Other architectures will be
> > changed in a subsequent patch.
> > 
> > Tested: Intel iommu and swiotlb (iommu=soft) on x86-64 with
> >         CONFIG_DMA_API_DEBUG enabled and disabled.
> > 
> > Signed-off-by: Shuah Khan <shuah.khan@xxxxxx>
> > ---
> >  Documentation/DMA-API.txt          |   13 ++++
> >  arch/x86/include/asm/dma-mapping.h |    1 +
> >  include/linux/dma-debug.h          |    7 ++
> >  lib/dma-debug.c                    |  128 ++++++++++++++++++++++++++++++++----
> >  4 files changed, 136 insertions(+), 13 deletions(-)
> > 
> > diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt
> > index 66bd97a..886aaf9 100644
> > --- a/Documentation/DMA-API.txt
> > +++ b/Documentation/DMA-API.txt
> > @@ -638,6 +638,19 @@ this directory the following files can currently be found:
> >  	dma-api/error_count	This file is read-only and shows the total
> >  				numbers of errors found.
> >  
> > +	dma-api/map_errors	This file is read-only and shows the total
> > +				number of dma mapping errors detected.
> > +
> > +	dma-api/map_errors_not_checked
> > +				This file is read-only and shows the total
> > +				number of dma mapping errors, device drivers
> > +				failed to check prior to using the returned
> > +				address.
> > +
> > +	dma-api/unmap_errors
> > +				This file is read-only and shows the total
> > +				number of invalid dma unmapping attempts.
> > +
> >  	dma-api/num_errors	The number in this file shows how many
> >  				warnings will be printed to the kernel log
> >  				before it stops. This number is initialized to
> > diff --git a/arch/x86/include/asm/dma-mapping.h b/arch/x86/include/asm/dma-mapping.h
> > index f7b4c79..808dae6 100644
> > --- a/arch/x86/include/asm/dma-mapping.h
> > +++ b/arch/x86/include/asm/dma-mapping.h
> > @@ -47,6 +47,7 @@ static inline struct dma_map_ops *get_dma_ops(struct device *dev)
> >  static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
> >  {
> >  	struct dma_map_ops *ops = get_dma_ops(dev);
> > +	debug_dma_mapping_error(dev, dma_addr);
> >  	if (ops->mapping_error)
> >  		return ops->mapping_error(dev, dma_addr);
> >  
> > diff --git a/include/linux/dma-debug.h b/include/linux/dma-debug.h
> > index 171ad8a..fc0e34c 100644
> > --- a/include/linux/dma-debug.h
> > +++ b/include/linux/dma-debug.h
> > @@ -39,6 +39,8 @@ extern void debug_dma_map_page(struct device *dev, struct page *page,
> >  			       int direction, dma_addr_t dma_addr,
> >  			       bool map_single);
> >  
> > +extern void debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr);
> > +
> >  extern void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
> >  				 size_t size, int direction, bool map_single);
> >  
> > @@ -105,6 +107,11 @@ static inline void debug_dma_map_page(struct device *dev, struct page *page,
> >  {
> >  }
> >  
> > +static inline void debug_dma_mapping_error(struct device *dev,
> > +					  dma_addr_t dma_addr)
> > +{
> > +}
> > +
> >  static inline void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
> >  					size_t size, int direction,
> >  					bool map_single)
> > diff --git a/lib/dma-debug.c b/lib/dma-debug.c
> > index 66ce414..1708a10 100644
> > --- a/lib/dma-debug.c
> > +++ b/lib/dma-debug.c
> > @@ -45,18 +45,25 @@ enum {
> >  	dma_debug_coherent,
> >  };
> >  
> > +enum map_err_types {
> > +	MAP_ERR_CHECK_NOT_APPLICABLE,
> > +	MAP_ERR_NOT_CHECKED,
> > +	MAP_ERR_CHECKED,
> > +};
> > +
> >  #define DMA_DEBUG_STACKTRACE_ENTRIES 5
> >  
> >  struct dma_debug_entry {
> > -	struct list_head list;
> > -	struct device    *dev;
> > -	int              type;
> > -	phys_addr_t      paddr;
> > -	u64              dev_addr;
> > -	u64              size;
> > -	int              direction;
> > -	int		 sg_call_ents;
> > -	int		 sg_mapped_ents;
> > +	struct list_head    list;
> > +	struct device       *dev;
> > +	int                 type;
> > +	phys_addr_t         paddr;
> > +	u64                 dev_addr;
> > +	u64                 size;
> > +	int                 direction;
> > +	int		    sg_call_ents;
> > +	int		    sg_mapped_ents;
> > +	enum map_err_types  map_err_type;
> 
> Something gone wild with your tabs/space. It should be only
> one extra field added.
I tabbed the existing fields over for readability. Guess I should have
avoided the churn and left it alone. Will fix the tabs and send v4.

> 
> Otherwise:
> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>

Is it ok to add your reviewed by tag when I send v4?

-- Shuah
> 
> >  #ifdef CONFIG_STACKTRACE
> >  	struct		 stack_trace stacktrace;
> >  	unsigned long	 st_entries[DMA_DEBUG_STACKTRACE_ENTRIES];
> > @@ -83,6 +90,11 @@ static u32 global_disable __read_mostly;
> >  /* Global error count */
> >  static u32 error_count;
> >  
> > +/* dma mapping error counts */
> > +static u32 map_errors;
> > +static u32 map_errors_not_checked;
> > +static u32 unmap_errors;
> > +
> >  /* Global error show enable*/
> >  static u32 show_all_errors __read_mostly;
> >  /* Number of errors to show */
> > @@ -104,6 +116,9 @@ static struct dentry *show_num_errors_dent  __read_mostly;
> >  static struct dentry *num_free_entries_dent __read_mostly;
> >  static struct dentry *min_free_entries_dent __read_mostly;
> >  static struct dentry *filter_dent           __read_mostly;
> > +static struct dentry *map_errors_dent   __read_mostly;
> > +static struct dentry *map_errors_not_checked_dent   __read_mostly;
> > +static struct dentry *unmap_errors_dent __read_mostly;
> >  
> >  /* per-driver filter related state */
> >  
> > @@ -114,6 +129,12 @@ static struct device_driver *current_driver                    __read_mostly;
> >  
> >  static DEFINE_RWLOCK(driver_name_lock);
> >  
> > +static const char *const maperr2str[] = {
> > +	[MAP_ERR_CHECK_NOT_APPLICABLE] = "dma map error check not applicable",
> > +	[MAP_ERR_NOT_CHECKED] = "dma map error not checked",
> > +	[MAP_ERR_CHECKED] = "dma map error checked",
> > +};
> > +
> >  static const char *type2name[4] = { "single", "page",
> >  				    "scather-gather", "coherent" };
> >  
> > @@ -381,11 +402,12 @@ void debug_dma_dump_mappings(struct device *dev)
> >  		list_for_each_entry(entry, &bucket->list, list) {
> >  			if (!dev || dev == entry->dev) {
> >  				dev_info(entry->dev,
> > -					 "%s idx %d P=%Lx D=%Lx L=%Lx %s\n",
> > +					 "%s idx %d P=%Lx D=%Lx L=%Lx %s %s\n",
> >  					 type2name[entry->type], idx,
> >  					 (unsigned long long)entry->paddr,
> >  					 entry->dev_addr, entry->size,
> > -					 dir2name[entry->direction]);
> > +					 dir2name[entry->direction],
> > +					 maperr2str[entry->map_err_type]);
> >  			}
> >  		}
> >  
> > @@ -695,6 +717,28 @@ static int dma_debug_fs_init(void)
> >  	if (!filter_dent)
> >  		goto out_err;
> >  
> > +	map_errors_dent = debugfs_create_u32("map_errors", 0444,
> > +			dma_debug_dent,
> > +			&map_errors);
> > +
> > +	if (!map_errors_dent)
> > +		goto out_err;
> > +
> > +	map_errors_not_checked_dent = debugfs_create_u32(
> > +			"map_errors_not_checked",
> > +			0444,
> > +			dma_debug_dent,
> > +			&map_errors_not_checked);
> > +
> > +	if (!map_errors_not_checked_dent)
> > +		goto out_err;
> > +
> > +	unmap_errors_dent = debugfs_create_u32("unmap_errors", 0444,
> > +			dma_debug_dent,
> > +			&unmap_errors);
> > +	if (!unmap_errors_dent)
> > +		goto out_err;
> > +
> >  	return 0;
> >  
> >  out_err:
> > @@ -843,13 +887,29 @@ static __init int dma_debug_entries_cmdline(char *str)
> >  __setup("dma_debug=", dma_debug_cmdline);
> >  __setup("dma_debug_entries=", dma_debug_entries_cmdline);
> >  
> > +/* Calling dma_mapping_error() from dma-debug api will result in calling
> > +   debug_dma_mapping_error() - need internal mapping error routine to
> > +   avoid debug checks */
> > +#ifndef DMA_ERROR_CODE
> > +#define DMA_ERROR_CODE 0
> > +#endif
> > +static inline int check_mapping_error(struct device *dev, dma_addr_t dma_addr)
> > +{
> > +	const struct dma_map_ops *ops = get_dma_ops(dev);
> > +	if (ops->mapping_error)
> > +		return ops->mapping_error(dev, dma_addr);
> > +
> > +	return (dma_addr == DMA_ERROR_CODE);
> > +}
> > +
> >  static void check_unmap(struct dma_debug_entry *ref)
> >  {
> >  	struct dma_debug_entry *entry;
> >  	struct hash_bucket *bucket;
> >  	unsigned long flags;
> >  
> > -	if (dma_mapping_error(ref->dev, ref->dev_addr)) {
> > +	if (unlikely(check_mapping_error(ref->dev, ref->dev_addr))) {
> > +		unmap_errors += 1;
> >  		err_printk(ref->dev, NULL, "DMA-API: device driver tries "
> >  			   "to free an invalid DMA memory address\n");
> >  		return;
> > @@ -915,6 +975,15 @@ static void check_unmap(struct dma_debug_entry *ref)
> >  			   dir2name[ref->direction]);
> >  	}
> >  
> > +	if (entry->map_err_type == MAP_ERR_NOT_CHECKED) {
> > +		err_printk(ref->dev, entry,
> > +			   "DMA-API: device driver failed to check map error"
> > +			   "[device address=0x%016llx] [size=%llu bytes] "
> > +			   "[mapped as %s]",
> > +			   ref->dev_addr, ref->size,
> > +			   type2name[entry->type]);
> > +	}
> > +
> >  	hash_bucket_del(entry);
> >  	dma_entry_free(entry);
> >  
> > @@ -1022,8 +1091,11 @@ void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
> >  	if (unlikely(global_disable))
> >  		return;
> >  
> > -	if (unlikely(dma_mapping_error(dev, dma_addr)))
> > +	if (unlikely(check_mapping_error(dev, dma_addr))) {
> > +		map_errors += 1;
> > +		map_errors_not_checked += 1;
> >  		return;
> > +	}
> >  
> >  	entry = dma_entry_alloc();
> >  	if (!entry)
> > @@ -1035,6 +1107,7 @@ void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
> >  	entry->dev_addr  = dma_addr;
> >  	entry->size      = size;
> >  	entry->direction = direction;
> > +	entry->map_err_type = MAP_ERR_NOT_CHECKED;
> >  
> >  	if (map_single)
> >  		entry->type = dma_debug_single;
> > @@ -1050,6 +1123,35 @@ void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
> >  }
> >  EXPORT_SYMBOL(debug_dma_map_page);
> >  
> > +void debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
> > +{
> > +	struct dma_debug_entry ref;
> > +	struct dma_debug_entry *entry;
> > +	struct hash_bucket *bucket;
> > +	unsigned long flags;
> > +
> > +	if (unlikely(global_disable))
> > +		return;
> > +
> > +	ref.dev = dev;
> > +	ref.dev_addr = dma_addr;
> > +	bucket = get_hash_bucket(&ref, &flags);
> > +	entry = bucket_find_exact(bucket, &ref);
> > +
> > +	if (!entry) {
> > +		/* very likley dma-api didn't call debug_dma_map_page() or
> > +		   debug_dma_map_page() detected mapping error */
> > +		if (map_errors_not_checked)
> > +			map_errors_not_checked -= 1;
> > +		goto out;
> > +	}
> > +
> > +	entry->map_err_type = MAP_ERR_CHECKED;
> > +out:
> > +	put_hash_bucket(bucket, &flags);
> > +}
> > +EXPORT_SYMBOL(debug_dma_mapping_error);
> > +
> >  void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
> >  			  size_t size, int direction, bool map_single)
> >  {
> > -- 
> > 1.7.9.5
> > 
> > 
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> > 


--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]     [Linux Resources]

  Powered by Linux