Re: [PATCH] Patch to integrate RapidDisk and RapidCache RAM Drive / Caching modules into stating subsystem.

Dan Carpenter <dan.carpenter@xxxxxxxxxx> · Mon, 1 Jun 2015 12:18:20 +0300

On Sun, May 31, 2015 at 02:43:16PM -0500, Petros Koutoupis wrote:
> Attached is a patch for two modules: RapidDisk & RapidCache. RapidDisk is a Linux
> RAM drive module which allows the user to dynamically create, remove, and resize
> RAM-based block devices. RapidDisk is designed to work with both volatile and
> non-volatile memory. In the case of volatile memory, memory is allocated only when
> needed. The RapidCache module in turn utilizes a RapidDisk volume as a FIFO
> Write-Through caching node to a slower block device.
> 
> Signed-off-by: Petros Koutoupis <petros@xxxxxxxxxxxxxxxxxxx>

This feels like it could be merged without going through staging.  It
looks like ok code to me.

> ---
>  Kconfig                           |    2 
>  Makefile                          |    1 
>  rapiddisk/Documentation/rxdsk.txt |   74 ++
>  rapiddisk/Kconfig                 |   10 
>  rapiddisk/Makefile                |    2 
>  rapiddisk/rxcache.c               | 1181 ++++++++++++++++++++++++++++++++++++++
>  rapiddisk/rxcommon.h              |   22 
>  rapiddisk/rxdsk.c                 |  799 +++++++++++++++++++++++++
>  8 files changed, 2091 insertions(+)
> 
> diff -uNpr linux-next.orig/drivers/staging/Kconfig linux-next/drivers/staging/Kconfig
> --- linux-next.orig/drivers/staging/Kconfig     2015-05-30 13:37:03.929726967 -0500
> +++ linux-next/drivers/staging/Kconfig  2015-05-31 13:36:08.673757913 -0500
> @@ -112,4 +112,6 @@ source "drivers/staging/fsl-mc/Kconfig"
>  
>  source "drivers/staging/wilc1000/Kconfig"
>  
> +source "drivers/staging/rapiddisk/Kconfig"
> +
>  endif # STAGING
> diff -uNpr linux-next.orig/drivers/staging/Makefile linux-next/drivers/staging/Makefile
> --- linux-next.orig/drivers/staging/Makefile    2015-05-30 13:37:03.921726968 -0500
> +++ linux-next/drivers/staging/Makefile 2015-05-31 13:36:08.753757911 -0500
> @@ -48,3 +48,4 @@ obj-$(CONFIG_COMMON_CLK_XLNX_CLKWZRD) +=
>  obj-$(CONFIG_FB_TFT)           += fbtft/
>  obj-$(CONFIG_FSL_MC_BUS)       += fsl-mc/
>  obj-$(CONFIG_WILC1000)         += wilc1000/
> +obj-$(CONFIG_RXDSK)            += rapiddisk/
> diff -uNpr linux-next.orig/drivers/staging/rapiddisk/Documentation/rxdsk.txt linux-next/drivers/staging/rapiddisk/Documentation/rxdsk.txt
> --- linux-next.orig/drivers/staging/rapiddisk/Documentation/rxdsk.txt   1969-12-31 18:00:00.000000000 -0600
> +++ linux-next/drivers/staging/rapiddisk/Documentation/rxdsk.txt        2015-05-31 13:38:29.893753897 -0500
> @@ -0,0 +1,74 @@
> +RapidDisk (rxdsk) RAM disk and RapidCache (rxcache) Linux modules
> +
> +== Description ==
> +
> +RapidDisk or rxdsk was designed to be used in high performing environments
> +and has been designed with simplicity in mind. Utilizing a user land binary,
> +the system administrator is capable of dynamically adding new RAM based
> +block devices of varying sizes, removing existing ones to even listing all
> +existing RAM block devices. The rxdsk module has been designed to allocate
> +from the system's memory pages and is capable of addressing memory to
> +support Gigabytes if not a Terabyte of available Random Access Memory.
> +
> +RapidCache or rxcache is designed to leverage the high speed performing
> +technologies of the RapidDisk RAM disk and utilizing the Device Mapper
> +framework, map an rxdsk volume to act as a block device's Write/Read-through
> +cache. This can significantly boost the performance of a local or remote
> +disk device.
> +
> +
> +== Module Parameters ==
> +
> +RapidDisk
> +---------
> +max_rxcnt: Total RAM Disk devices available for use. (Default = 128 = MAX) (int)
> +max_sectors: Maximum sectors (in KB) for the request queue. (Default = 127) (int)
> +nr_requests: Number of requests at a given time for the request queue. (Default = 128) (int)
> +
> +RapidCache
> +---------
> +None.
> +
> +
> +== Usage ==
> +
> +RapidDisk
> +---------
> +It is advised to utilize the userland utility, rxadm, but this is essentially what is
> +written to the /proc/rxctl proc file to manage rxdsk volumes:
> +
> +Attach a new rxdsk volume by typing (size in sectors):
> +    # echo "rxdsk attach 0 8192" > /proc/rxctl
> +
> +Attach a non-volatile rxdsk volume by typing (starting / ending addresses
> +in decimal format):
> +    # echo "rxdsk attach-nv 0 1234 56789" > /proc/rxctl
> +
> +Detach an existing rxdsk volume by typing:
> +    # echo "rxdsk detach 0" > /proc/rxctl
> +
> +Resize an existing rxdsk volume by typing (size in sectors):
> +    # echo "rxdsk resize 0 65536" > /proc/rxctl
> +
> +
> +Note - the integer after the "rxdsk <command>" is the RapidDisk volume number. "0"
> +would signify "rxd0."
> +
> +RapidCache
> +---------
> +It is advised to utilize the userland utility, rxadm, but this is essentially what is
> +sent to the dmsetup command:
> +
> +Map an existing rxdsk volume to a block device by typing:
> +    # echo 0 4194303 rxcache /dev/sdb /dev/rxd0 0 8 196608|dmsetup create rxc0
> +
> +Parameter 1: Start block of source volume (in sectors).
> +Parameter 2: End block of source volume (in sectors).
> +Parameter 4: Source volume.
> +Parameter 5: Cache volume.
> +Parameter 7: Caching block size.
> +Parameter 8: Cache size (in sectors).
> +
> +Unmap an rxdsk volume:
> +
> +    # dmsetup remove rxc0
> diff -uNpr linux-next.orig/drivers/staging/rapiddisk/Kconfig linux-next/drivers/staging/rapiddisk/Kconfig
> --- linux-next.orig/drivers/staging/rapiddisk/Kconfig   1969-12-31 18:00:00.000000000 -0600
> +++ linux-next/drivers/staging/rapiddisk/Kconfig        2015-05-31 13:36:08.753757911 -0500
> @@ -0,0 +1,10 @@
> +config RXDSK
> +       tristate "RapidDisk Enhanced Linux RAM disk solution"
> +       depends on BLOCK && BLK_DEV_DM
> +       default N
> +       help
> +         Creates virtual block devices called /dev/rxd(X = 0, 1, ...).
> +        Supports both volatile and non-volatile memory. And can map
> +        rxdsk volumes as caching nodes to slower drives.
> +
> +         Project home: http://www.rapiddisk.org/
> diff -uNpr linux-next.orig/drivers/staging/rapiddisk/Makefile linux-next/drivers/staging/rapiddisk/Makefile
> --- linux-next.orig/drivers/staging/rapiddisk/Makefile  1969-12-31 18:00:00.000000000 -0600
> +++ linux-next/drivers/staging/rapiddisk/Makefile       2015-05-31 13:36:08.753757911 -0500
> @@ -0,0 +1,2 @@
> +obj-$(CONFIG_RXDSK) += rxdsk.o
> +obj-$(CONFIG_RXDSK) += rxcache.o
> diff -uNpr linux-next.orig/drivers/staging/rapiddisk/rxcache.c linux-next/drivers/staging/rapiddisk/rxcache.c
> --- linux-next.orig/drivers/staging/rapiddisk/rxcache.c 1969-12-31 18:00:00.000000000 -0600
> +++ linux-next/drivers/staging/rapiddisk/rxcache.c      2015-05-31 13:36:08.753757911 -0500
> @@ -0,0 +1,1181 @@
> +/*******************************************************
> + ** Copyright (c) 2011-2014 Petros Koutoupis
> + ** All rights reserved.
> + **
> + ** filename: rxcache.c
> + ** description: Device mapper target for block-level disk
> + **      write-through and read-ahead caching. This module
> + **      is forked from Flashcache-wt:
> + **      Copyright 2010 Facebook, Inc.
> + **      Author: Mohan Srinivasan (mohan@xxxxxxxxxxxx)
> + **
> + ** created: 3Dec11, petros@xxxxxxxxxxxxxxxxxxx
> + **
> + ** This file is licensed under GPLv2.
> + **
> + ******************************************************/
> +
> +#include <asm/atomic.h>
> +#include <linux/module.h>
> +#include <linux/init.h>
> +#include <linux/list.h>
> +#include <linux/blkdev.h>
> +#include <linux/bio.h>
> +#include <linux/vmalloc.h>
> +#include <linux/slab.h>
> +#include <linux/hash.h>
> +#include <linux/spinlock.h>
> +#include <linux/workqueue.h>
> +#include <linux/pagemap.h>
> +#include <linux/random.h>
> +#include <linux/version.h>
> +#include <linux/seq_file.h>
> +#include <linux/hardirq.h>
> +#include <asm/kmap_types.h>
> +#include <linux/dm-io.h>
> +#include <linux/device-mapper.h>
> +#include <linux/bio.h>
> +#include "rxcommon.h"
> +
> +/* Like ASSERT() but always compiled in */

Just rename VERIFY() to ASSERT().
> +
> +#define VERIFY(x) do { \
> +       if (unlikely(!(x))) { \
> +               dump_stack(); \
> +               panic("VERIFY: assertion (%s) failed at %s (%d)\n", \
> +               #x, __FILE__, __LINE__); \
> +       } \
> +} while (0)
> +
> +#define DM_MSG_PREFIX  "rxc"
> +#define RxPREFIX       "rxc"
> +#define DRIVER_DESC    "RapidCache (rxc) DM target is a write-through caching target with RapidDisk volumes."

Ugh.  Too many levels of abstraction.  These are nice if your hobby is
grepping but otherwise they are pointless.

The debug printks could be cleaned up or deleted but that's like 10
minutes of work.

> +
> +#define READCACHE      1
> +#define WRITECACHE     2
> +#define READSOURCE     3
> +#define WRITESOURCE    4
> +#define READCACHE_DONE 5
> +
> +/* Default cache parameters */
> +#define DEFAULT_CACHE_ASSOC    512
> +#define DEFAULT_BLOCK_SIZE     8       /* 4 KB */
> +#define CONSECUTIVE_BLOCKS     512
> +
> +/* States of a cache block */
> +#define INVALID                0
> +#define VALID          1       /* Valid */
> +#define INPROG         2       /* IO (cache fill) is in progress */
> +#define CACHEREADINPROG        3       /* cache read in progress, don't recycle */
> +#define INPROG_INVALID 4       /* Write invalidated during a refill */
> +
> +#define DEV_PATHLEN    128
> +
> +#ifndef DM_MAPIO_SUBMITTED
> +#define DM_MAPIO_SUBMITTED     0
> +#endif
> +
> +#define WT_MIN_JOBS 1024
> +
> +/* Cache context */
> +struct cache_c {

struct cache_context.

> +       struct dm_target *tgt;
> +
> +       struct dm_dev *disk_dev;
> +       struct dm_dev *cache_dev;
> +
> +       spinlock_t cache_spin_lock;
> +       struct cacheblock *cache;
> +       u_int8_t *cache_state;
> +       u_int32_t *set_lru_next;
> +
> +       struct dm_io_client *io_client;
> +       sector_t size;
> +       unsigned int assoc;
> +       unsigned int block_size;
> +       unsigned int block_shift;
> +       unsigned int block_mask;
> +       unsigned int consecutive_shift;
> +
> +       wait_queue_head_t destroyq;
> +       atomic_t nr_jobs;
> +
> +       /* Stats */
> +       unsigned long reads;
> +       unsigned long writes;
> +       unsigned long cache_hits;
> +       unsigned long replace;
> +       unsigned long wr_invalidates;
> +       unsigned long rd_invalidates;
> +       unsigned long cached_blocks;
> +       unsigned long cache_wr_replace;
> +       unsigned long uncached_reads;
> +       unsigned long uncached_writes;
> +       unsigned long cache_reads, cache_writes;
> +       unsigned long disk_reads, disk_writes;
> +
> +       char cache_devname[DEV_PATHLEN];
> +       char disk_devname[DEV_PATHLEN];
> +};
> +
> +/* Cache block metadata structure */
> +struct cacheblock {

struct cache_block

> +       sector_t dbn;
> +};
> +
> +/* Structure for a kcached job */
> +struct kcached_job {
> +       struct list_head list;
> +       struct cache_c *dmc;
> +       struct bio *bio;
> +       struct dm_io_region disk;
> +       struct dm_io_region cache;
> +       int index;
> +       int rw;
> +       int error;
> +};
> +
> +u_int64_t size_hist[33];'

sector zero isn't used?  That's really strange.

> +
> +static struct workqueue_struct *_kcached_wq;
> +static struct work_struct _kcached_work;
> +static struct kmem_cache *_job_cache;
> +static mempool_t *_job_pool;
> +
> +static DEFINE_SPINLOCK(_job_lock);
> +
> +static LIST_HEAD(_complete_jobs);
> +static LIST_HEAD(_io_jobs);

Don't name everything with a leading underscrore.  Especially since it
is local to the file.

> +
> +
> +/*
> + * Function Declarations
> + **/

I have avoided pointing out the obvious comments, but this one crosses
a line.  Remove obvious comments.

> +static void cache_read_miss(struct cache_c *, struct bio *, int);
> +static void cache_write(struct cache_c *, struct bio *);
> +static int cache_invalidate_blocks(struct cache_c *, struct bio *);
> +static void rxc_uncached_io_callback(unsigned long, void *);
> +static void rxc_start_uncached_io(struct cache_c *, struct bio *);
> +
> +
> +/*
> + * Functions Definitions
> + **/
> +int dm_io_async_bvec(unsigned int num_regions, struct dm_io_region *where,
> +               int rw, struct bio *bio, io_notify_fn fn, void *context)
> +{
> +       struct kcached_job *job = (struct kcached_job *)context;
> +       struct cache_c *dmc = job->dmc;
> +       struct dm_io_request iorq;
> +
> +       iorq.bi_rw = rw;
> +       iorq.mem.type = DM_IO_BIO;
> +       iorq.mem.ptr.bio = bio;
> +       iorq.notify.fn = fn;
> +       iorq.notify.context = context;
> +       iorq.client = dmc->io_client;
> +
> +       return dm_io(&iorq, num_regions, where, NULL);
> +}
> +
> +static int jobs_init(void)
> +{
> +       _job_cache = kmem_cache_create("kcached-jobs-wt",
> +               sizeof(struct kcached_job), __alignof__(struct kcached_job),
> +               0, NULL);
> +       if (!_job_cache)
> +               return -ENOMEM;
> +
> +       _job_pool = mempool_create(WT_MIN_JOBS, mempool_alloc_slab,
> +               mempool_free_slab, _job_cache);
> +       if (!_job_pool) {
> +               kmem_cache_destroy(_job_cache);
> +               return -ENOMEM;
> +       }
> +
> +       return 0;
> +}
> +
> +static void jobs_exit(void)
> +{
> +       BUG_ON(!list_empty(&_complete_jobs));
> +       BUG_ON(!list_empty(&_io_jobs));
> +
> +       mempool_destroy(_job_pool);
> +       kmem_cache_destroy(_job_cache);
> +       _job_pool = NULL;
> +       _job_cache = NULL;
> +}
> +
> +/* Functions to push and pop a job onto the head of a given job list. */
> +static inline struct kcached_job *pop(struct list_head *jobs)
> +{
> +       struct kcached_job *job = NULL;
> +       unsigned long flags;
> +
> +       spin_lock_irqsave(&_job_lock, flags);
> +       if (!list_empty(jobs)) {
> +               job = list_entry(jobs->next, struct kcached_job, list);
> +               list_del(&job->list);
> +       }
> +       spin_unlock_irqrestore(&_job_lock, flags);
> +       return job;
> +}
> +
> +static inline void push(struct list_head *jobs, struct kcached_job *job)
> +{
> +       unsigned long flags;
> +
> +       spin_lock_irqsave(&_job_lock, flags);
> +       list_add_tail(&job->list, jobs);
> +       spin_unlock_irqrestore(&_job_lock, flags);
> +}
> +
> +void rxc_io_callback(unsigned long error, void *context)
> +{
> +       struct kcached_job *job = (struct kcached_job *) context;
> +       struct cache_c *dmc = job->dmc;
> +       struct bio *bio;
> +       int invalid = 0;
> +
> +       VERIFY(job != NULL);
> +       bio = job->bio;
> +       VERIFY(bio != NULL);
> +       DPRINTK("%s: %s: %s %llu(%llu->%llu,%llu)", RxPREFIX, __func__,
> +                       (job->rw == READ ? "READ" : "WRITE"),
> +                       bio->bi_iter.bi_sector, job->disk.sector,
> +                       job->cache.sector, job->disk.count);
> +       if (error)
> +               DMERR("%s: io error %ld", __func__, error);
> +       if (job->rw == READSOURCE || job->rw == WRITESOURCE) {
> +               spin_lock_bh(&dmc->cache_spin_lock);
> +               if (dmc->cache_state[job->index] != INPROG) {
> +                       VERIFY(dmc->cache_state[job->index] == INPROG_INVALID);
> +                       invalid++;
> +               }
> +               spin_unlock_bh(&dmc->cache_spin_lock);
> +               if (error || invalid) {
> +                       if (invalid)
> +                               DMERR("%s: cache fill invalidation, sector %lu, size %u",
> +                                       __func__,
> +                                       (unsigned long)bio->bi_iter.bi_sector,
> +                                       bio->bi_iter.bi_size);
> +                               bio_endio(bio, error);
> +                               spin_lock_bh(&dmc->cache_spin_lock);
> +                               dmc->cache_state[job->index] = INVALID;
> +                               spin_unlock_bh(&dmc->cache_spin_lock);
> +                               goto out;
> +               } else {
> +                       /* Kick off the write to the cache */
> +                       job->rw = WRITECACHE;
> +                       push(&_io_jobs, job);
> +                       queue_work(_kcached_wq, &_kcached_work);
> +                       return;
> +               }
> +       } else if (job->rw == READCACHE) {
> +               spin_lock_bh(&dmc->cache_spin_lock);
> +               VERIFY(dmc->cache_state[job->index] == INPROG_INVALID ||
> +                       dmc->cache_state[job->index] ==  CACHEREADINPROG);
> +               if (dmc->cache_state[job->index] == INPROG_INVALID)
> +                       invalid++;
> +               spin_unlock_bh(&dmc->cache_spin_lock);
> +               if (!invalid && !error) {
> +                       /* Complete the current IO successfully */
> +                       bio_endio(bio, 0);
> +                       spin_lock_bh(&dmc->cache_spin_lock);
> +                       dmc->cache_state[job->index] = VALID;
> +                       spin_unlock_bh(&dmc->cache_spin_lock);
> +                       goto out;
> +               }
> +               /* error || invalid || bounce back to source device */
> +               job->rw = READCACHE_DONE;
> +               push(&_complete_jobs, job);
> +               queue_work(_kcached_wq, &_kcached_work);
> +               return;
> +       } else {
> +               VERIFY(job->rw == WRITECACHE);
> +               bio_endio(bio, 0);
> +               spin_lock_bh(&dmc->cache_spin_lock);
> +               VERIFY((dmc->cache_state[job->index] == INPROG) ||
> +                       (dmc->cache_state[job->index] == INPROG_INVALID));
> +               if (error || dmc->cache_state[job->index] == INPROG_INVALID) {
> +                       dmc->cache_state[job->index] = INVALID;
> +               } else {
> +                       dmc->cache_state[job->index] = VALID;
> +                       dmc->cached_blocks++;
> +               }
> +               spin_unlock_bh(&dmc->cache_spin_lock);
> +               DPRINTK("%s: Cache Fill: Block %llu, index = %d: Cache state = %d",
> +                       RxPREFIX, dmc->cache[job->index].dbn, job->index,
> +                       dmc->cache_state[job->index]);
> +       }
> +out:
> +       mempool_free(job, _job_pool);
> +       if (atomic_dec_and_test(&dmc->nr_jobs))
> +               wake_up(&dmc->destroyq);
> +}
> +EXPORT_SYMBOL(rxc_io_callback);
> +
> +static int do_io(struct kcached_job *job)
> +{
> +       int r = 0;
> +       struct cache_c *dmc = job->dmc;
> +       struct bio *bio = job->bio;
> +
> +       VERIFY(job->rw == WRITECACHE);
> +       /* Write to cache device */
> +       dmc->cache_writes++;
> +       r = dm_io_async_bvec(1, &job->cache, WRITE, bio,
> +               rxc_io_callback, job);
> +       VERIFY(r == 0); /* dm_io_async_bvec() must always return 0 */
> +       return r;
> +}
> +
> +int rxc_do_complete(struct kcached_job *job)
> +{
> +       struct bio *bio = job->bio;
> +       struct cache_c *dmc = job->dmc;
> +
> +       VERIFY(job->rw == READCACHE_DONE);
> +       DPRINTK("%s: %s: %llu", RxPREFIX, __func__, bio->bi_iter.bi_sector);
> +       /* error || block invalidated while reading from cache */
> +       spin_lock_bh(&dmc->cache_spin_lock);
> +       dmc->cache_state[job->index] = INVALID;
> +       spin_unlock_bh(&dmc->cache_spin_lock);
> +       mempool_free(job, _job_pool);
> +       if (atomic_dec_and_test(&dmc->nr_jobs))
> +               wake_up(&dmc->destroyq);
> +       /* Kick this IO back to the source bdev */
> +       rxc_start_uncached_io(dmc, bio);
> +       return 0;
> +}
> +EXPORT_SYMBOL(rxc_do_complete);
> +
> +static void process_jobs(struct list_head *jobs,
> +       int (*fn)(struct kcached_job *))
> +{
> +       struct kcached_job *job;
> +
> +       while ((job = pop(jobs)))
> +               (void)fn(job);
> +}
> +
> +static void do_work(struct work_struct *work)
> +{
> +       process_jobs(&_complete_jobs, rxc_do_complete);
> +       process_jobs(&_io_jobs, do_io);
> +}
> +
> +static int kcached_init(struct cache_c *dmc)
> +{
> +       init_waitqueue_head(&dmc->destroyq);
> +       atomic_set(&dmc->nr_jobs, 0);
> +       return 0;
> +}
> +
> +void kcached_client_destroy(struct cache_c *dmc)
> +{
> +       /* Wait for completion of all jobs submitted by this client. */
> +       wait_event(dmc->destroyq, !atomic_read(&dmc->nr_jobs));
> +}
> +
> +/* Map a block from the source device to a block in the cache device. */
> +static unsigned long hash_block(struct cache_c *dmc, sector_t dbn)
> +{
> +       unsigned long set_number, value, tmpval;
> +
> +       value = (unsigned long)
> +       (dbn >> (dmc->block_shift + dmc->consecutive_shift));

Confusing indenting.  The cast is pointless.

> +       tmpval = value;

tmpval is not needed?

> +       set_number = do_div(tmpval, (dmc->size >> dmc->consecutive_shift));
> +       DPRINTK("%s: Hash: %llu(%lu)->%lu", RxPREFIX, dbn, value, set_number);
> +       return set_number;
> +}
> +
> +static int find_valid_dbn(struct cache_c *dmc, sector_t dbn,
> +       int start_index, int *index)
> +{
> +       int i;
> +       int end_index = start_index + dmc->assoc;
> +
> +       for (i = start_index ; i < end_index ; i++) {
> +               if (dbn == dmc->cache[i].dbn && (dmc->cache_state[i] == VALID ||
> +                       dmc->cache_state[i] == CACHEREADINPROG ||
> +                       dmc->cache_state[i] == INPROG)) {

A lot of your indenting is messed up.  This should be:

		if (dbn == dmc->cache[i].dbn &&
		    (dmc->cache_state[i] == VALID ||
		     dmc->cache_state[i] == CACHEREADINPROG ||
		     dmc->cache_state[i] == INPROG)) {

Or you could reverse the first chunk:

	if (dbn != dmc->cache[i].dbn)
		continue;
	if (dmc->cache_state[i] == VALID ||
	    dmc->cache_state[i] == CACHEREADINPROG ||

> +                       *index = i;
> +                       return dmc->cache_state[i];
> +               }
> +       }
> +       return -1;

Find ever reference to -1 and replace it with something meaningful.

> +}
> +
> +static void find_invalid_dbn(struct cache_c *dmc, int start_index, int *index)
> +{
> +       int i;
> +       int end_index = start_index + dmc->assoc;
> +
> +       /* Find INVALID slot that we can reuse */
> +       for (i = start_index ; i < end_index ; i++) {
> +               if (dmc->cache_state[i] == INVALID) {
> +                       *index = i;
> +                       return;
> +               }
> +       }

This function should return success or failure.

> +}
> +
> +static void find_reclaim_dbn(struct cache_c *dmc, int start_index, int *index)
> +{
> +       int i;
> +       int end_index = start_index + dmc->assoc;
> +       int set = start_index / dmc->assoc;
> +       int slots_searched = 0;
> +
> +       i = dmc->set_lru_next[set];
> +       while (slots_searched < dmc->assoc) {
> +               VERIFY(i >= start_index);
> +               VERIFY(i < end_index);
> +               if (dmc->cache_state[i] == VALID) {
> +                       *index = i;
> +                       break;
> +               }
> +               slots_searched++;
> +               i++;
> +               if (i == end_index)
> +                       i = start_index;
> +       }
> +       i++;
> +       if (i == end_index)
> +               i = start_index;
> +       dmc->set_lru_next[set] = i;
> +}
> +
> +/* dbn is the starting sector, io_size is the number of sectors. */
> +static int cache_lookup(struct cache_c *dmc, struct bio *bio, int *index)
> +{
> +       sector_t dbn = bio->bi_iter.bi_sector;
> +#if defined RxC_DEBUG
> +       int io_size = to_sector(bio->bi_iter.bi_size);
> +#endif

Get rid of this ifdef.  The DPRINTK() calls are ugly.  It's hard to see
the actual code in the forest of DPRINTK calls.  Let's fix the crashing
bugs then delete the debug code.

> +       unsigned long set_number = hash_block(dmc, dbn);
> +       int invalid = -1, oldest_clean = -1;
> +       int start_index;
> +       int ret;
> +
> +       start_index = dmc->assoc * set_number;
> +       DPRINTK("%s: Cache read lookup : dbn %llu(%lu), set = %d",
> +                       RxPREFIX, dbn, io_size, set_number);
> +       ret = find_valid_dbn(dmc, dbn, start_index, index);
> +       if (ret == VALID || ret == INPROG || ret == CACHEREADINPROG) {
> +               DPRINTK("%s: Cache read lookup: Block %llu(%lu): ret %d VALID/INPROG index %d",
> +                       RxPREFIX, dbn, io_size, ret, *index);
> +               /* We found the exact range of blocks we are looking for */
> +               return ret;
> +       }
> +       DPRINTK("%s: Cache read lookup: Block %llu(%lu):%d INVALID",
> +               RxPREFIX, dbn, io_size, ret);
> +       VERIFY(ret == -1);
> +       find_invalid_dbn(dmc, start_index, &invalid);
> +       if (invalid == -1) {
> +               /* Search for oldest valid entry */
> +               find_reclaim_dbn(dmc, start_index, &oldest_clean);
> +       }
> +       /* Cache miss : We can't choose an entry marked INPROG,
> +        * but choose the oldest INVALID or the oldest VALID entry. */
> +       *index = start_index + dmc->assoc;
> +       if (invalid != -1) {
> +               DPRINTK("%s: Cache read lookup MISS (INVALID): dbn %llu(%lu), set = %d, index = %d, start_index = %d",
> +                       RxPREFIX, dbn, io_size, set_number,
> +                       invalid, start_index);
> +               *index = invalid;
> +       } else if (oldest_clean != -1) {
> +               DPRINTK("%s: Cache read lookup MISS (VALID): dbn %llu(%lu), set = %d, index = %d, start_index = %d",
> +                       RxPREFIX, dbn, io_size, set_number,
> +                       oldest_clean, start_index);
> +               *index = oldest_clean;
> +       } else {
> +               DPRINTK("%s: Cache read lookup MISS (NOROOM): dbn %llu(%lu), set = %d",
> +                       RxPREFIX, dbn, io_size, set_number);
> +       }
> +       if (*index < (start_index + dmc->assoc))
> +               return INVALID;
> +       else
> +               return -1;
> +}
> +
> +static struct kcached_job *new_kcached_job(struct cache_c *dmc,
> +       struct bio *bio, int index)
> +{
> +       struct kcached_job *job;
> +
> +       job = mempool_alloc(_job_pool, GFP_NOIO);
> +       if (job == NULL)
> +               return NULL;
> +       job->disk.bdev = dmc->disk_dev->bdev;
> +       job->disk.sector = bio->bi_iter.bi_sector;
> +       if (index != -1)
> +               job->disk.count = dmc->block_size;
> +       else
> +               job->disk.count = to_sector(bio->bi_iter.bi_size);
> +       job->cache.bdev = dmc->cache_dev->bdev;
> +       if (index != -1) {
> +               job->cache.sector = index << dmc->block_shift;
> +               job->cache.count = dmc->block_size;
> +       }
> +       job->dmc = dmc;
> +       job->bio = bio;
> +       job->index = index;
> +       job->error = 0;
> +       return job;
> +}
> +
> +static void cache_read_miss(struct cache_c *dmc, struct bio *bio, int index)
> +{
> +       struct kcached_job *job;
> +
> +       DPRINTK("%s: Cache Read Miss sector %llu %u bytes, index %d)",
> +               RxPREFIX, bio->bi_iter.bi_sector, bio->bi_iter.bi_size, index);
> +
> +       job = new_kcached_job(dmc, bio, index);
> +       if (unlikely(job == NULL)) {

Don't add likely/unlikely unless you have benchmarked it.  It's just
makes the code messy for no reason if there isn't a speed reason.

> +               DMERR("%s: Cannot allocate job\n", __func__);
> +               spin_lock_bh(&dmc->cache_spin_lock);
> +               dmc->cache_state[index] = INVALID;
> +               spin_unlock_bh(&dmc->cache_spin_lock);
> +               bio_endio(bio, -EIO);
> +       } else {
> +               job->rw = READSOURCE; /* Fetch data from the source device */
> +               DPRINTK("%s: Queue job for %llu", RxPREFIX,
> +                       bio->bi_iter.bi_sector);
> +               atomic_inc(&dmc->nr_jobs);
> +               dmc->disk_reads++;
> +               dm_io_async_bvec(1, &job->disk, READ,
> +                       bio, rxc_io_callback, job);
> +       }
> +}
> +
> +static void cache_read(struct cache_c *dmc, struct bio *bio)
> +{
> +       int index;
> +       int res;
> +
> +       DPRINTK("%s: Got a %s for %llu  %u bytes)", RxPREFIX,
> +               (bio_rw(bio) == READ ? "READ":"READA"),
> +               bio->bi_iter.bi_sector, bio->bi_iter.bi_size);
> +
> +       spin_lock_bh(&dmc->cache_spin_lock);
> +       res = cache_lookup(dmc, bio, &index);
> +       /* Cache Hit */
> +       if ((res == VALID) &&
> +               (dmc->cache[index].dbn == bio->bi_iter.bi_sector)) {

Bad indenting.  Extra parens.  This isn't LISP.  I think a couple years
ago there was an = vs == introduced this way.  (I may have
misremembered).

> +               struct kcached_job *job;
> +
> +               dmc->cache_state[index] = CACHEREADINPROG;
> +               dmc->cache_hits++;
> +               spin_unlock_bh(&dmc->cache_spin_lock);
> +               DPRINTK("%s: Cache read: Block %llu(%lu), index = %d:%s",
> +                       RxPREFIX, bio->bi_iter.bi_sector, bio->bi_iter.bi_size,
> +                       index, "CACHE HIT");
> +               job = new_kcached_job(dmc, bio, index);
> +               if (unlikely(job == NULL)) {
> +                       /* Can't allocate job, bounce back error */
> +                       DMERR("cache_read(_hit): Cannot allocate job\n");
> +                       spin_lock_bh(&dmc->cache_spin_lock);
> +                       dmc->cache_state[index] = VALID;
> +                       spin_unlock_bh(&dmc->cache_spin_lock);
> +                       bio_endio(bio, -EIO);
> +               } else {
> +                       job->rw = READCACHE; /* Fetch data from source device */
> +                       DPRINTK("%s: Queue job for %llu", RxPREFIX,
> +                               bio->bi_iter.bi_sector);
> +                       atomic_inc(&dmc->nr_jobs);
> +                       dmc->cache_reads++;
> +                       dm_io_async_bvec(1, &job->cache, READ, bio,
> +                       rxc_io_callback, job);
> +               }
> +               return;
> +       }
> +       /* In all cases except for a cache hit (and VALID), test for potential
> +        * invalidations that we need to do. */

Use checkpatch.pl --strict on this code and fix all the warnings.

> +       if (cache_invalidate_blocks(dmc, bio) > 0) {
> +               /* A non zero return indicates an inprog invalidation */
> +               spin_unlock_bh(&dmc->cache_spin_lock);
> +               /* Start uncached IO */

These obvious comments are really distracting.

> +               rxc_start_uncached_io(dmc, bio);
> +               return;
> +       }
> +       if (res == -1 || res >= INPROG) {
> +               /* We either didn't find a cache slot in the set we were
> +                * looking at or the block we are trying to read is being
> +                * refilled into cache. */
> +               spin_unlock_bh(&dmc->cache_spin_lock);
> +               DPRINTK("%s: Cache read: Block %llu(%lu):%s", RxPREFIX,
> +                       bio->bi_iter.bi_sector, bio->bi_iter.bi_size,
> +                       "CACHE MISS & NO ROOM");
> +               /* Start uncached IO */
> +               rxc_start_uncached_io(dmc, bio);
> +               return;
> +       }
> +       /* (res == INVALID) Cache Miss And we found cache blocks to replace
> +        * Claim the cache blocks before giving up the spinlock */
> +       if (dmc->cache_state[index] == VALID) {
> +               dmc->cached_blocks--;
> +               dmc->replace++;
> +       }
> +       dmc->cache_state[index] = INPROG;
> +       dmc->cache[index].dbn = bio->bi_iter.bi_sector;
> +       spin_unlock_bh(&dmc->cache_spin_lock);
> +
> +       DPRINTK("%s: Cache read: Block %llu(%lu), index = %d:%s", RxPREFIX,
> +               bio->bi_iter.bi_sector, bio->bi_iter.bi_size, index,
> +               "CACHE MISS & REPLACE");
> +       cache_read_miss(dmc, bio, index);
> +}
> +
> +static int cache_invalidate_block_set(struct cache_c *dmc, int set,
> +       sector_t io_start, sector_t io_end, int rw, int *inprog_inval)
> +{
> +       int start_index, end_index, i;
> +       int invalidations = 0;
> +
> +       start_index = dmc->assoc * set;
> +       end_index = start_index + dmc->assoc;
> +       for (i = start_index ; i < end_index ; i++) {
> +               sector_t start_dbn = dmc->cache[i].dbn;
> +               sector_t end_dbn = start_dbn + dmc->block_size;
> +
> +               if (dmc->cache_state[i] == INVALID ||
> +                       dmc->cache_state[i] == INPROG_INVALID)
> +                       continue;
> +               if ((io_start >= start_dbn && io_start < end_dbn) ||
> +                       (io_end >= start_dbn && io_end < end_dbn)) {
> +                       /* We have a match */
> +                       if (rw == WRITE)
> +                               dmc->wr_invalidates++;
> +                       else
> +                               dmc->rd_invalidates++;
> +                       invalidations++;
> +                       if (dmc->cache_state[i] == VALID) {
> +                               dmc->cached_blocks--;
> +                               dmc->cache_state[i] = INVALID;
> +                               DPRINTK("%s: Cache invalidate: Block %llu VALID",
> +                                       RxPREFIX, start_dbn);
> +                       } else if (dmc->cache_state[i] >= INPROG) {
> +                               (*inprog_inval)++;
> +                               dmc->cache_state[i] = INPROG_INVALID;
> +                               DMERR("%s: sector %lu, size %lu, rw %d",
> +                                       __func__, (unsigned long)io_start,
> +                                       (unsigned long)io_end - (unsigned long)io_start, rw);
> +                               DPRINTK("%s: Cache invalidate: Block %llu INPROG",
> +                                       RxPREFIX, start_dbn);
> +                       }
> +               }
> +       }
> +       return invalidations;
> +}
> +
> +static int cache_invalidate_blocks(struct cache_c *dmc, struct bio *bio)
> +{
> +       sector_t io_start = bio->bi_iter.bi_sector;
> +       sector_t io_end = bio->bi_iter.bi_sector + (to_sector(bio->bi_iter.bi_size) - 1);
> +       int start_set, end_set;
> +       int inprog_inval_start = 0, inprog_inval_end = 0;
> +
> +       start_set = hash_block(dmc, io_start);
> +       end_set = hash_block(dmc, io_end);
> +       (void)cache_invalidate_block_set(dmc, start_set, io_start, io_end,
> +               bio_data_dir(bio), &inprog_inval_start);

Don't put the (void) here.  It's just noise.  It not done consistently
see two lines below.

> +       if (start_set != end_set)
> +               cache_invalidate_block_set(dmc, end_set, io_start, io_end,
> +                       bio_data_dir(bio),  &inprog_inval_end);
> +       return (inprog_inval_start + inprog_inval_end);
> +}
> +
> +static void cache_write(struct cache_c *dmc, struct bio *bio)
> +{
> +       int index;
> +       int res;
> +       struct kcached_job *job;
> +
> +       spin_lock_bh(&dmc->cache_spin_lock);
> +       if (cache_invalidate_blocks(dmc, bio) > 0) {
> +               /* A non zero return indicates an inprog invalidation */
> +               spin_unlock_bh(&dmc->cache_spin_lock);
> +               /* Start uncached IO */
> +               rxc_start_uncached_io(dmc, bio);
> +               return;
> +       }
> +       res = cache_lookup(dmc, bio, &index);
> +       VERIFY(res == -1 || res == INVALID);
> +       if (res == -1) {
> +               spin_unlock_bh(&dmc->cache_spin_lock);
> +               /* Start uncached IO */
> +               rxc_start_uncached_io(dmc, bio);
> +               return;
> +       }
> +       if (dmc->cache_state[index] == VALID) {
> +               dmc->cached_blocks--;
> +               dmc->cache_wr_replace++;
> +       }
> +       dmc->cache_state[index] = INPROG;
> +       dmc->cache[index].dbn = bio->bi_iter.bi_sector;
> +       spin_unlock_bh(&dmc->cache_spin_lock);
> +       job = new_kcached_job(dmc, bio, index);
> +       if (unlikely(job == NULL)) {
> +               DMERR("%s: Cannot allocate job\n", __func__);
> +               spin_lock_bh(&dmc->cache_spin_lock);
> +               dmc->cache_state[index] = INVALID;
> +               spin_unlock_bh(&dmc->cache_spin_lock);
> +               bio_endio(bio, -EIO);
> +               return;
> +       }
> +       job->rw = WRITESOURCE; /* Write data to the source device */
> +       DPRINTK("%s: Queue job for %llu", RxPREFIX, bio->bi_iter.bi_sector);
> +       atomic_inc(&job->dmc->nr_jobs);
> +       dmc->disk_writes++;
> +       dm_io_async_bvec(1, &job->disk, WRITE, bio,
> +               rxc_io_callback, job);
> +       return;
> +}
> +
> +#define bio_barrier(bio)       ((bio)->bi_rw & REQ_FLUSH)
> +
> +/* Decide the mapping and perform necessary cache operations
> + * for a bio request. */
> +int rxc_map(struct dm_target *ti, struct bio *bio)
> +{
> +       struct cache_c *dmc = (struct cache_c *) ti->private;
> +       int sectors = to_sector(bio->bi_iter.bi_size);
> +
> +       if (sectors <= 32)
> +               size_hist[sectors]++;
> +
> +       DPRINTK("%s: Got a %s for %llu %u bytes)", RxPREFIX,
> +               bio_rw(bio) == WRITE ? "WRITE" : (bio_rw(bio) == READ ?
> +               "READ":"READA"), bio->bi_iter.bi_sector, bio->bi_iter.bi_size);
> +
> +       if (bio_barrier(bio))
> +               return -EOPNOTSUPP;
> +
> +       VERIFY(to_sector(bio->bi_iter.bi_size) <= dmc->block_size);
> +
> +       if (bio_data_dir(bio) == READ)
> +               dmc->reads++;
> +       else
> +               dmc->writes++;
> +
> +       if (to_sector(bio->bi_iter.bi_size) != dmc->block_size) {
> +               spin_lock_bh(&dmc->cache_spin_lock);
> +               (void)cache_invalidate_blocks(dmc, bio);
> +               spin_unlock_bh(&dmc->cache_spin_lock);
> +               /* Start uncached IO */
> +               rxc_start_uncached_io(dmc, bio);
> +       } else {
> +               if (bio_data_dir(bio) == READ)
> +                       cache_read(dmc, bio);
> +               else
> +                       cache_write(dmc, bio);
> +       }
> +       return DM_MAPIO_SUBMITTED;
> +}
> +EXPORT_SYMBOL(rxc_map);
> +
> +static void rxc_uncached_io_callback(unsigned long error, void *context)
> +{
> +       struct kcached_job *job = (struct kcached_job *) context;
> +       struct cache_c *dmc = job->dmc;
> +
> +       spin_lock_bh(&dmc->cache_spin_lock);
> +       if (bio_data_dir(job->bio) == READ)
> +               dmc->uncached_reads++;
> +       else
> +               dmc->uncached_writes++;
> +       (void)cache_invalidate_blocks(dmc, job->bio);
> +       spin_unlock_bh(&dmc->cache_spin_lock);
> +       bio_endio(job->bio, error);
> +       mempool_free(job, _job_pool);
> +       if (atomic_dec_and_test(&dmc->nr_jobs))
> +               wake_up(&dmc->destroyq);
> +}
> +
> +static void rxc_start_uncached_io(struct cache_c *dmc, struct bio *bio)
> +{
> +       int is_write = (bio_data_dir(bio) == WRITE);
> +       struct kcached_job *job;
> +
> +       job = new_kcached_job(dmc, bio, -1);
> +       if (unlikely(job == NULL)) {
> +               bio_endio(bio, -EIO);
> +               return;
> +       }
> +       atomic_inc(&dmc->nr_jobs);
> +       if (bio_data_dir(job->bio) == READ)
> +               dmc->disk_reads++;
> +       else
> +               dmc->disk_writes++;
> +       dm_io_async_bvec(1, &job->disk, ((is_write) ? WRITE : READ),
> +               bio, rxc_uncached_io_callback, job);
> +}
> +
> +static inline int rxc_get_dev(struct dm_target *ti, char *pth,
> +       struct dm_dev **dmd, char *dmc_dname, sector_t tilen)
> +{
> +       int rc;
> +
> +       rc = dm_get_device(ti, pth, dm_table_get_mode(ti->table), dmd);
> +       if (!rc)
> +               strncpy(dmc_dname, pth, DEV_PATHLEN);
> +       return rc;
> +}
> +
> +/* Construct a cache mapping.
> + *  arg[0]: path to source device
> + *  arg[1]: path to cache device
> + *  arg[2]: cache block size (in sectors)
> + *  arg[3]: cache size (in blocks)
> + *  arg[4]: cache associativity */
> +static int cache_ctr(struct dm_target *ti, unsigned int argc, char **argv)
> +{
> +       struct cache_c *dmc;
> +       unsigned int consecutive_blocks;
> +       sector_t i, order, tmpsize;
> +       sector_t data_size, dev_size;
> +       int r = -EINVAL;
> +
> +       if (argc < 2) {
> +               ti->error = "rxc: Need at least 2 arguments";
> +               goto bad;

Ugh.  This function is ugly.  "goto bad;" is useless.  What does it do?

Oh...  It returns.  Oh, we are using numbered gotos as if we were
programming in some primitive GW-BASIC language that didn't allow named
labels.

		return -EINVAL;

> +       }
> +
> +       dmc = kzalloc(sizeof(*dmc), GFP_KERNEL);
> +       if (dmc == NULL) {
> +               ti->error = "rxc: Failed to allocate cache context";
> +               r = ENOMEM;
> +               goto bad;

It's buggy.  We should return -ENOMEM; instead of posivitive.

> +       }
> +
> +       dmc->tgt = ti;
> +
> +       if (rxc_get_dev(ti, argv[0], &dmc->disk_dev, dmc->disk_devname,
> +               ti->len)) {

Hopefull checkpatch.pl --strict will complain here.

	if (rxc_get_dev(ti, argv[0], &dmc->disk_dev, dmc->disk_devname,
			ti->len)) {

> +               ti->error = "rxc: Disk device lookup failed";
> +               goto bad1;

		goto free_dmc;

> +       }
> +       if (strncmp(argv[1], "/dev/rxd", 8) != 0) {
> +               pr_err("%s: %s is not a valid cache device for rxcache.",
> +                       RxPREFIX, argv[1]);
> +               ti->error = "rxc: Invalid cache device. Not an rxdsk volume.";
> +               goto bad2;

		goto put_disk_dev;

> +       }
> +       if (rxc_get_dev(ti, argv[1], &dmc->cache_dev, dmc->cache_devname, 0)) {
> +               ti->error = "rxc: Cache device lookup failed";
> +               goto bad2;
> +       }
> +
> +       dmc->io_client = dm_io_client_create();
> +       if (IS_ERR(dmc->io_client)) {
> +               r = PTR_ERR(dmc->io_client);
> +               ti->error = "Failed to create io client\n";
> +               goto bad2;

This is buggy.  It doesn't free cache_dev.  It's not obviously, because
the name is based on GW-BASIC style.

		goto put_cache_dev.

> +       }
> +
> +       r = kcached_init(dmc);
> +       if (r) {
> +               ti->error = "Failed to initialize kcached";
> +               goto bad3;
> +       }
> +       dmc->assoc = DEFAULT_CACHE_ASSOC;
> +
> +       if (argc >= 3) {
> +               if (sscanf(argv[2], "%u", &dmc->block_size) != 1) {
> +                       ti->error = "rxc: Invalid block size";
> +                       r = -EINVAL;
> +                       goto bad4;
> +               }
> +               if (!dmc->block_size ||
> +                       (dmc->block_size & (dmc->block_size - 1))) {
> +                       ti->error = "rxc: Invalid block size";
> +                       r = -EINVAL;
> +                       goto bad4;
> +               }
> +       } else
> +               dmc->block_size = DEFAULT_BLOCK_SIZE;
> +
> +       dmc->block_shift = ffs(dmc->block_size) - 1;
> +       dmc->block_mask = dmc->block_size - 1;
> +
> +       /* dmc->size is specified in sectors here, and converted
> +        * to blocks below */
> +       if (argc >= 4) {
> +               if (sscanf(argv[3], "%lu", (unsigned long *)&dmc->size) != 1) {
> +                       ti->error = "rxc: Invalid cache size";
> +                       r = -EINVAL;
> +                       goto bad4;
> +               }
> +       } else {
> +               dmc->size = to_sector(dmc->cache_dev->bdev->bd_inode->i_size);
> +       }
> +
> +       if (argc >= 5) {
> +               if (sscanf(argv[4], "%u", &dmc->assoc) != 1) {
> +                       ti->error = "rxc: Invalid cache associativity";
> +                       r = -EINVAL;
> +                       goto bad4;
> +               }
> +               if (!dmc->assoc || (dmc->assoc & (dmc->assoc - 1)) ||
> +                       dmc->size < dmc->assoc) {
> +                       ti->error = "rxc: Invalid cache associativity";
> +                       r = -EINVAL;
> +                       goto bad4;
> +               }
> +       } else
> +               dmc->assoc = DEFAULT_CACHE_ASSOC;
> +
> +       /* Convert size (in sectors) to blocks. Then round size (in blocks now)
> +        * down to a multiple of associativity */
> +       do_div(dmc->size, dmc->block_size);
> +       tmpsize = dmc->size;
> +       do_div(tmpsize, dmc->assoc);
> +       dmc->size = tmpsize * dmc->assoc;
> +
> +       dev_size = to_sector(dmc->cache_dev->bdev->bd_inode->i_size);
> +       data_size = dmc->size * dmc->block_size;
> +       if (data_size > dev_size) {
> +               DMERR("Requested cache size exeeds the cache device's capacity (%lu>%lu)",
> +                       (unsigned long)data_size, (unsigned long)dev_size);
> +               ti->error = "rxc: Invalid cache size";
> +               r = -EINVAL;
> +               goto bad4;
> +       }
> +
> +       consecutive_blocks = dmc->assoc;
> +       dmc->consecutive_shift = ffs(consecutive_blocks) - 1;
> +
> +       order = dmc->size * sizeof(struct cacheblock);
> +#if defined(__x86_64__)
> +       DMINFO("Allocate %luKB (%luB per) mem for %lu-entry cache" \
> +               "(capacity:%luMB, associativity:%u, block size:%u sectors(%uKB))",
> +               (unsigned long)order >> 10, sizeof(struct cacheblock),
> +               (unsigned long)dmc->size,
> +               (unsigned long)data_size >> (20-SECTOR_SHIFT),
> +               dmc->assoc, dmc->block_size,
> +               dmc->block_size >> (10-SECTOR_SHIFT));
> +#else
> +       DMINFO("Allocate %luKB (%dB per) mem for %lu-entry cache" \
> +               "(capacity:%luMB, associativity:%u, block size:%u sectors(%uKB))",
> +               (unsigned long)order >> 10, sizeof(struct cacheblock),
> +               (unsigned long)dmc->size,
> +               (unsigned long)data_size >> (20-SECTOR_SHIFT), dmc->assoc,
> +               dmc->block_size, dmc->block_size >> (10-SECTOR_SHIFT));

This assumes that Intel is the only company in the world.

> +#endif
> +       dmc->cache = (struct cacheblock *)vmalloc(order);

No need for this cast.

> +       if (!dmc->cache) {
> +               ti->error = "Unable to allocate memory";
> +               r = -ENOMEM;
> +               goto bad4;
> +       }
> +       dmc->cache_state = (u_int8_t *)vmalloc(dmc->size);
> +       if (!dmc->cache_state) {
> +               ti->error = "Unable to allocate memory";
> +               r = -ENOMEM;
> +               vfree((void *)dmc->cache);

No need for this case.  Also this doesn't smell right at all.  Why
aren't we doing the vfree() after the goto?  Also we don't vfree if
dm_set_target_max_io_len() fails.

> +               goto bad4;
> +       }
> +
> +       order = (dmc->size >> dmc->consecutive_shift) * sizeof(u_int32_t);
> +       dmc->set_lru_next = (u_int32_t *)vmalloc(order);
> +       if (!dmc->set_lru_next) {
> +               ti->error = "Unable to allocate memory";
> +               r = -ENOMEM;
> +               vfree((void *)dmc->cache);
> +               vfree((void *)dmc->cache_state);
> +               goto bad4;
> +       }
> +
> +       /* Initialize the cache structs */

Obvious.

> +       for (i = 0; i < dmc->size ; i++) {
> +               dmc->cache[i].dbn = 0;
> +               dmc->cache_state[i] = INVALID;
> +       }
> +
> +       /* Initialize the point where LRU sweeps begin for each set */
> +       for (i = 0 ; i < (dmc->size >> dmc->consecutive_shift) ; i++)
> +               dmc->set_lru_next[i] = i * dmc->assoc;
> +
> +       spin_lock_init(&dmc->cache_spin_lock);
> +
> +       dmc->reads = 0;
> +       dmc->writes = 0;
> +       dmc->cache_hits = 0;
> +       dmc->replace = 0;
> +       dmc->wr_invalidates = 0;
> +       dmc->rd_invalidates = 0;
> +       dmc->cached_blocks = 0;
> +       dmc->cache_wr_replace = 0;
> +
> +       r = dm_set_target_max_io_len(ti, dmc->block_size);
> +       if (r)
> +               goto bad4;
> +       ti->private = dmc;
> +       return 0;
> +bad4:
> +       kcached_client_destroy(dmc);
> +bad3:
> +       dm_io_client_destroy(dmc->io_client);
> +       dm_put_device(ti, dmc->cache_dev);
> +bad2:
> +       dm_put_device(ti, dmc->disk_dev);
> +bad1:
> +       kfree(dmc);
> +bad:
> +       return r;
> +}
> +
> +/* Destroy the cache mapping. */
> +static void cache_dtr(struct dm_target *ti)
> +{
> +       struct cache_c *dmc = (struct cache_c *) ti->private;
> +
> +       kcached_client_destroy(dmc);
> +
> +       if (dmc->reads + dmc->writes > 0) {
> +               DMINFO("stats:\n\treads(%lu), writes(%lu)\n",
> +                       dmc->reads, dmc->writes);
> +               DMINFO("\tcache hits(%lu),replacement(%lu), write replacement(%lu)\n" \
> +                       "\tread invalidates(%lu), write invalidates(%lu)\n",
> +                       dmc->cache_hits, dmc->replace, dmc->cache_wr_replace,
> +                       dmc->rd_invalidates, dmc->wr_invalidates);
> +               DMINFO("conf:\n\tcapacity(%luM), associativity(%u), block size(%uK)\n" \
> +                       "\ttotal blocks(%lu), cached blocks(%lu)\n",
> +                       (unsigned long)dmc->size*dmc->block_size>>11, dmc->assoc,
> +                       dmc->block_size>>(10-SECTOR_SHIFT),
> +                       (unsigned long)dmc->size, dmc->cached_blocks);
> +       }
> +
> +       dm_io_client_destroy(dmc->io_client);
> +       vfree((void *)dmc->cache);
> +       vfree((void *)dmc->cache_state);
> +       vfree((void *)dmc->set_lru_next);

Search your code for casts and remove the pointless ones.  vfree() takes
a void pointer.

> +
> +       dm_put_device(ti, dmc->disk_dev);
> +       dm_put_device(ti, dmc->cache_dev);
> +       kfree(dmc);
> +}
> +
> +static void rxc_status_info(struct cache_c *dmc, status_type_t type,
> +       char *result, unsigned int maxlen)
> +{
> +       int sz = 0; /* DMEMIT */

Ugly.  Fix DMEMIT.

> +
> +       DMEMIT("stats:\n\treads(%lu), writes(%lu)\n", dmc->reads,
> +               dmc->writes);
> +
> +       DMEMIT("\tcache hits(%lu) replacement(%lu), write replacement(%lu)\n" \
> +               "\tread invalidates(%lu), write invalidates(%lu)\n" \
> +               "\tuncached reads(%lu), uncached writes(%lu)\n" \
> +               "\tdisk reads(%lu), disk writes(%lu)\n" \
> +               "\tcache reads(%lu), cache writes(%lu)\n",
> +               dmc->cache_hits, dmc->replace, dmc->cache_wr_replace,
> +               dmc->rd_invalidates, dmc->wr_invalidates,
> +               dmc->uncached_reads, dmc->uncached_writes,
> +               dmc->disk_reads, dmc->disk_writes,
> +               dmc->cache_reads, dmc->cache_writes);
> +}
> +
> +static void rxc_status_table(struct cache_c *dmc, status_type_t type,
> +       char *result, unsigned int maxlen)
> +{
> +       int i;
> +       int sz = 0; /* DMEMIT */
> +
> +       DMEMIT("conf:\n\trxd dev (%s), disk dev (%s) mode (%s)\n" \
> +               "\tcapacity(%luM), associativity(%u), block size(%uK)\n" \
> +               "\ttotal blocks(%lu), cached blocks(%lu)\n",
> +               dmc->cache_devname, dmc->disk_devname, "WRITETHROUGH",
> +               (unsigned long)dmc->size*dmc->block_size>>11, dmc->assoc,
> +               dmc->block_size>>(10-SECTOR_SHIFT),
> +               (unsigned long)dmc->size, dmc->cached_blocks);
> +       DMEMIT(" Size Hist: ");
> +       for (i = 1 ; i <= 32 ; i++) {
> +               if (size_hist[i] > 0)
> +                       DMEMIT("%d:%llu ", i*512, size_hist[i]);
> +       }
> +}
> +
> +/* Report cache status:
> + *  Output cache stats upon request of device status;
> + *  Output cache configuration upon request of table status. */
> +static void cache_status(struct dm_target *ti, status_type_t type,
> +       unsigned status_flags,
> +       char *result, unsigned int maxlen)
> +{
> +       struct cache_c *dmc = (struct cache_c *) ti->private;
> +
> +       DPRINTK("%s: debug: entering %s\n", RxPREFIX, __func__);
> +
> +       switch (type) {
> +       case STATUSTYPE_INFO:
> +               rxc_status_info(dmc, type, result, maxlen);
> +               break;
> +       case STATUSTYPE_TABLE:
> +               rxc_status_table(dmc, type, result, maxlen);
> +               break;
> +       }
> +}
> +
> +static struct target_type cache_target = {
> +       .name           = "rxcache",
> +       .version        = {3, 0, 0},
> +       .module         = THIS_MODULE,
> +       .ctr            = cache_ctr,
> +       .dtr            = cache_dtr,
> +       .map            = rxc_map,
> +       .status         = cache_status,
> +};
> +
> +int __init rxc_init(void)
> +{
> +       int r;

r is not a common name for return values.  Use "int ret".

> +
> +       DPRINTK("%s: debug: entering %s\n", RxPREFIX, __func__);
> +
> +       r = jobs_init();
> +       if (r)
> +               return r;
> +
> +       _kcached_wq = create_singlethread_workqueue("kcached");
> +       if (!_kcached_wq) {
> +               DMERR("failed to start kcached");

Most kernel functions handle their own error messages.  No need for this
one, I bet.

> +               return -ENOMEM;
> +       }
> +
> +       INIT_WORK(&_kcached_work, do_work);
> +
> +       for (r = 0 ; r < 33 ; r++)
> +               size_hist[r] = 0;

This shouldn't reuse "r".  It should be "int i".  I stands for iterator.
Actually just mem_zero().

> +       r = dm_register_target(&cache_target);
> +       if (r < 0) {
> +               DMERR("cache: register failed %d", r);

Delete.

> +               return r;
> +       }
> +       pr_info("%s: Module successfully loaded.\n", RxPREFIX);

Delete.

> +       return r;

	return 0;

> +}
> +
> +void rxc_exit(void)
> +{
> +       DPRINTK("%s: debug: entering %s\n", RxPREFIX, __func__);

Delete.  Use ftrace for debugging.

> +       dm_unregister_target(&cache_target);
> +       jobs_exit();
> +       destroy_workqueue(_kcached_wq);
> +       pr_info("%s: Module successfully unloaded.\n", RxPREFIX);

Delete.

> +}
> +
> +
> +module_init(rxc_init);
> +module_exit(rxc_exit);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR(DRIVER_AUTHOR);
> +MODULE_DESCRIPTION(DRIVER_DESC);
> +MODULE_VERSION(VERSION_STR);
> +MODULE_INFO(Copyright, COPYRIGHT);
> diff -uNpr linux-next.orig/drivers/staging/rapiddisk/rxcommon.h linux-next/drivers/staging/rapiddisk/rxcommon.h
> --- linux-next.orig/drivers/staging/rapiddisk/rxcommon.h        1969-12-31 18:00:00.000000000 -0600
> +++ linux-next/drivers/staging/rapiddisk/rxcommon.h     2015-05-31 13:36:08.753757911 -0500
> @@ -0,0 +1,22 @@
> +/*******************************************************
> + ** Copyright (c) 2011-2015 Petros Koutoupis
> + ** All rights reserved.
> + **
> + ** filename: rxcommon.h
> + ** created: 12Feb12, petros@xxxxxxxxxxxxxxxxxxx
> + **
> + ** This file is licensed under GPLv2.
> + **
> + ******************************************************/
> +
> +#include <linux/device.h>
> +
> +#define VERSION_STR    "3.0"
> +#define COPYRIGHT      "Copyright 2014 - 2015 Petros Koutoupis"
> +#define DRIVER_AUTHOR  "Petros Koutoupis <petros@xxxxxxxxxxxxxxxxxxx>"

Don't use defines.

> +
> +#if defined RxDEBUG
> +#define DPRINTK(s, arg...) printk(s "\n", ##arg)
> +#else
> +#define DPRINTK(s, arg...)
> +#endif
> diff -uNpr linux-next.orig/drivers/staging/rapiddisk/rxdsk.c linux-next/drivers/staging/rapiddisk/rxdsk.c
> --- linux-next.orig/drivers/staging/rapiddisk/rxdsk.c   1969-12-31 18:00:00.000000000 -0600
> +++ linux-next/drivers/staging/rapiddisk/rxdsk.c        2015-05-31 13:36:08.753757911 -0500
> @@ -0,0 +1,799 @@
> +/*******************************************************
> + ** Copyright (c) 2011-2015 Petros Koutoupis
> + ** All rights reserved.
> + **
> + ** filename: rdsk.c
> + ** description: RapidDisk is an enhanced Linux RAM disk
> + **      module to dynamically create, remove, and resize
> + **      RAM drives. RapidDisk supports both volatile
> + **     and non-volatile memory.
> + ** created: 1Jun11, petros@xxxxxxxxxxxxxxxxxxx
> + **
> + ** This file is licensed under GPLv2.
> + **
> + ******************************************************/
> +
> +#include <linux/init.h>
> +#include <linux/module.h>
> +#include <linux/moduleparam.h>
> +#include <linux/version.h>
> +#include <linux/blkdev.h>
> +#include <linux/bio.h>
> +#include <linux/fs.h>
> +#include <linux/hdreg.h>
> +#include <linux/proc_fs.h>
> +#include <linux/errno.h>
> +#include <linux/radix-tree.h>
> +#include <asm/io.h>
> +#include "rxcommon.h"
> +
> +#define RxPREFIX       "rxd"
> +#define DRIVER_DESC    "RapidDisk (rxdsk) is an enhanced RAM disk block device driver."
> +#define DEVICE_NAME    "rxd"
> +#define PROC_NODE      "rxctl"
> +#define BYTES_PER_SECTOR       512
> +#define MAX_RxDISKS            128
> +#define DEFAULT_MAX_SECTS      127
> +#define DEFAULT_REQUESTS       128
> +
> +#define FREE_BATCH             16
> +#define SECTOR_SHIFT           9
> +#define PAGE_SECTORS_SHIFT     (PAGE_SHIFT - SECTOR_SHIFT)
> +#define PAGE_SECTORS           (1 << PAGE_SECTORS_SHIFT)
> +
> +#define NON_VOLATILE_MEMORY    0
> +#define VOLATILE_MEMORY                1
> +
> +/* ioctls */
> +#define INVALID_CDQUERY_IOCTL  0x5331
> +#define RXD_GET_STATS          0x0529
> +
> +
> +static DEFINE_MUTEX(rxproc_mutex);
> +static DEFINE_MUTEX(rxioctl_mutex);
> +
> +struct rdsk_device {
> +       int num;
> +       bool volatile_memory;
> +       struct request_queue *rdsk_queue;
> +       struct gendisk *rdsk_disk;
> +       struct list_head rdsk_list;
> +       unsigned long long max_blk_alloc;
> +       unsigned long long start_addr;
> +       unsigned long long end_addr;
> +       unsigned long long size;
> +       unsigned long error_cnt;
> +       spinlock_t rdsk_lock;
> +       struct radix_tree_root rdsk_pages;
> +};
> +
> +int rdsk_ma_no = 0, rxcnt = 0;
> +static int max_rxcnt = MAX_RxDISKS;
> +static int max_sectors = DEFAULT_MAX_SECTS, nr_requests = DEFAULT_REQUESTS;
> +struct proc_dir_entry *rx_proc = NULL;
> +static LIST_HEAD(rdsk_devices);
> +
> +/*
> + * Supported insmod params
> + **/
> +module_param(max_rxcnt, int, S_IRUGO);
> +MODULE_PARM_DESC(max_rxcnt, " Total RAM Disk devices available for use. (Default = 128)");
> +module_param(max_sectors, int, S_IRUGO);
> +MODULE_PARM_DESC(max_sectors, " Maximum sectors (in KB) for the request queue. (Default = 127)");
> +module_param(nr_requests, int, S_IRUGO);
> +MODULE_PARM_DESC(nr_requests, " Number of requests at a given time for the request queue. (Default = 128)");
> +
> +/*
> + * Function Declarations
> + **/
> +static ssize_t read_proc(struct file *, char __user *, size_t, loff_t *);
> +static ssize_t write_proc(struct file *, const char __user *, size_t, loff_t *);
> +static int rdsk_do_bvec(struct rdsk_device *, struct page *, unsigned int, unsigned int, int, sector_t);
> +static int rdsk_ioctl(struct block_device *, fmode_t, unsigned int, unsigned long);
> +static void rdsk_make_request(struct request_queue *, struct bio *);
> +static int attach_device(int, int); /* disk num, disk size */
> +static int attach_nv_device(int, unsigned long long, unsigned long long); /* disk num, start / end addr */
> +static int detach_device(int);   /* disk num */
> +static int resize_device(int, int); /* disk num, disk size */
> +
> +
> +/*
> + * Functions Definitions
> + **/
> +static ssize_t write_proc(struct file *file, const char __user *buffer,
> +       size_t count, loff_t *data)
> +{
> +       int num, size, err = (int)count;
> +       char *ptr, *buf;
> +       unsigned long long start_addr, end_addr;
> +
> +       DPRINTK("%s: debug: entering %s\n", RxPREFIX, __func__);

Delete.

> +
> +       mutex_lock(&rxproc_mutex);
> +
> +       if (!buffer || count > PAGE_SIZE)
> +               return -EINVAL;

Returning under lock.

> +
> +       buf = (char *)__get_free_page(GFP_KERNEL);

Do the allocation outside the lock.

> +       if (!buf)
> +               return -ENOMEM;
> +
> +       if (copy_from_user(buf, buffer, count))
> +               return -EFAULT;

memory leak.

> +
> +       if (!strncmp("rxdsk attach ", buffer, 13)) {
> +               ptr = buf + 13;
> +               num = simple_strtoul(ptr, &ptr, 0);
> +               size = simple_strtoul(ptr + 1, &ptr, 0);
> +
> +               if (attach_device(num, size) != 0) {
> +                       pr_info("%s: Unable to attach rxd%d\n", RxPREFIX, num);
> +                       err = -EINVAL;

	goto unlock;

Anyway, you're not allowed to add /proc files.  Add it to sysfs or
debugfs.

Hopefully, I've given you enough to work on.  I will review the rest
after you resend.  If I complained once, it means find all similar code
and fix it as well.

regards,
dan carpenter

_______________________________________________
devel mailing list
devel@xxxxxxxxxxxxxxxxxxxxxx
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel