Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount

Thomas Gleixner <tglx@xxxxxxxxxxxxx> · Thu, 27 Jun 2013 01:06:15 +0200 (CEST)

On Wed, 26 Jun 2013, Waiman Long wrote:

It would have been nice if you'd had cc'ed people who spent a massive
amount of time working on the spinlock code. Did that now.

> +#ifndef CONFIG_SMP
> +#define __LOCK_WITH_REFCOUNT(lock, count, u_name)	\
> +	unsigned int count;				\
> +	spinlock_t   lock
> +
> +#elif defined(__SPINLOCK_HAS_REFCOUNT)
> +#define __LOCK_WITH_REFCOUNT(lock, count, u_name)	\
> +	union u_name {					\
> +		u64		__lock_count;		\
> +		spinlock_t	lock;			\
> +		struct {				\
> +			arch_spinlock_t __raw_lock;	\
> +			unsigned int	count;		\
> +		};					\
> +	}
> +
> +#else /* __SPINLOCK_HAS_REFCOUNT */
> +#define __LOCK_WITH_REFCOUNT(lock, count, u_name)	\
> +	union u_name {					\
> +		u64		__lock_count;		\
> +		struct {				\
> +			unsigned int	count;		\
> +			spinlock_t	lock;		\
> +		};					\
> +	}
> +
> +#endif /* __SPINLOCK_HAS_REFCOUNT */

This is a complete design disaster. You are violating every single
rule of proper layering. The differentiation of spinlock, raw_spinlock
and arch_spinlock is there for a reason and you just take the current
implementation and map your desired function to it. If we take this,
then we fundamentally ruled out a change to the mappings of spinlock,
raw_spinlock and arch_spinlock. This layering is on purpose and it
took a lot of effort to make it as clean as it is now. Your proposed
change undoes that and completely wreckages preempt-rt.

What's even worse is that you go and fiddle with the basic data
structures:

> diff --git a/include/linux/spinlock_types.h b/include/linux/spinlock_types.h
> index 73548eb..cc107ad 100644
> --- a/include/linux/spinlock_types.h
> +++ b/include/linux/spinlock_types.h
> @@ -17,8 +17,27 @@
>  
>  #include <linux/lockdep.h>
>  
> +/*
> + * The presence of either one of the CONFIG_DEBUG_SPINLOCK or
> + * CONFIG_DEBUG_LOCK_ALLOC configuration parameter will force the
> + * spinlock_t structure to be 8-byte aligned.
> + *
> + * To support the spinlock/reference count combo data type for 64-bit SMP
> + * environment with spinlock debugging turned on, the reference count has
> + * to be integrated into the spinlock_t data structure in this special case.
> + * The spinlock_t data type will be 8 bytes larger if CONFIG_GENERIC_LOCKBREAK
> + * is also defined.
> + */
> +#if defined(CONFIG_64BIT) && (defined(CONFIG_DEBUG_SPINLOCK) ||\
> +			      defined(CONFIG_DEBUG_LOCK_ALLOC))
> +#define	__SPINLOCK_HAS_REFCOUNT	1
> +#endif
> +
>  typedef struct raw_spinlock {
>  	arch_spinlock_t raw_lock;
> +#ifdef __SPINLOCK_HAS_REFCOUNT
> +	unsigned int count;
> +#endif
>  #ifdef CONFIG_GENERIC_LOCKBREAK
>  	unsigned int break_lock;
>  #endif

You need to do that, because you blindly bolt your spinlock extension
into the existing code without taking the time to think about the
design implications.

Do not misunderstand me, I'm impressed by the improvement numbers, but
we need to find a way how this can be done sanely. You really have to
understand that there is a world aside of x86 and big machines. 

Let's have a look at your UP implementation. It's completely broken:

> +/*
> + * Just add the value as the spinlock is a no-op

So what protects that from being preempted? Nothing AFAICT.

What's even worse is that you break all architectures, which have an
arch_spinlock type != sizeof(unsigned int), e.g. SPARC

So what you really want is a new data structure, e.g. something like
struct spinlock_refcount() and a proper set of new accessor functions
instead of an adhoc interface which is designed solely for the needs
of dcache improvements.

The first step to achieve your goal is to consolidate all the
identical copies of arch_spinlock_t and talk to the maintainers of the
architectures which have a different (i.e. !32bit) notion of the arch
specific spinlock implementation, whether it's possible to move over
to a common data struct. Once you have achieved this, you can build
your spinlock + refcount construct upon that.

Thanks,

	tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html