Re: [PATCH] Patch to integrate RapidDisk and RapidCache RAM Drive / Caching modules into stating subsystem.

Petros Koutoupis <petros@xxxxxxxxxxxxxxxxxxx> · Mon, 01 Jun 2015 08:39:24 -0500

Dan,

You have indeed given me a good amount of homework. Anyway, I will address everything
according to your comments and suggestions and resubmit to the "real" part of the kernel
and not staging. Thank you very much.

Regards,

Petros

On Mon, 2015-06-01 at 12:18 +0300, Dan Carpenter wrote:
> On Sun, May 31, 2015 at 02:43:16PM -0500, Petros Koutoupis wrote:
> > Attached is a patch for two modules: RapidDisk & RapidCache. RapidDisk is a Linux
> > RAM drive module which allows the user to dynamically create, remove, and resize
> > RAM-based block devices. RapidDisk is designed to work with both volatile and
> > non-volatile memory. In the case of volatile memory, memory is allocated only when
> > needed. The RapidCache module in turn utilizes a RapidDisk volume as a FIFO
> > Write-Through caching node to a slower block device.
> > 
> > Signed-off-by: Petros Koutoupis <petros@xxxxxxxxxxxxxxxxxxx>
> 
> This feels like it could be merged without going through staging.  It
> looks like ok code to me.
> 
> 
> 
> > ---
> >  Kconfig                           |    2 
> >  Makefile                          |    1 
> >  rapiddisk/Documentation/rxdsk.txt |   74 ++
> >  rapiddisk/Kconfig                 |   10 
> >  rapiddisk/Makefile                |    2 
> >  rapiddisk/rxcache.c               | 1181 ++++++++++++++++++++++++++++++++++++++
> >  rapiddisk/rxcommon.h              |   22 
> >  rapiddisk/rxdsk.c                 |  799 +++++++++++++++++++++++++
> >  8 files changed, 2091 insertions(+)
> > 
> > diff -uNpr linux-next.orig/drivers/staging/Kconfig linux-next/drivers/staging/Kconfig
> > --- linux-next.orig/drivers/staging/Kconfig     2015-05-30 13:37:03.929726967 -0500
> > +++ linux-next/drivers/staging/Kconfig  2015-05-31 13:36:08.673757913 -0500
> > @@ -112,4 +112,6 @@ source "drivers/staging/fsl-mc/Kconfig"
> >  
> >  source "drivers/staging/wilc1000/Kconfig"
> >  
> > +source "drivers/staging/rapiddisk/Kconfig"
> > +
> >  endif # STAGING
> > diff -uNpr linux-next.orig/drivers/staging/Makefile linux-next/drivers/staging/Makefile
> > --- linux-next.orig/drivers/staging/Makefile    2015-05-30 13:37:03.921726968 -0500
> > +++ linux-next/drivers/staging/Makefile 2015-05-31 13:36:08.753757911 -0500
> > @@ -48,3 +48,4 @@ obj-$(CONFIG_COMMON_CLK_XLNX_CLKWZRD) +=
> >  obj-$(CONFIG_FB_TFT)           += fbtft/
> >  obj-$(CONFIG_FSL_MC_BUS)       += fsl-mc/
> >  obj-$(CONFIG_WILC1000)         += wilc1000/
> > +obj-$(CONFIG_RXDSK)            += rapiddisk/
> > diff -uNpr linux-next.orig/drivers/staging/rapiddisk/Documentation/rxdsk.txt linux-next/drivers/staging/rapiddisk/Documentation/rxdsk.txt
> > --- linux-next.orig/drivers/staging/rapiddisk/Documentation/rxdsk.txt   1969-12-31 18:00:00.000000000 -0600
> > +++ linux-next/drivers/staging/rapiddisk/Documentation/rxdsk.txt        2015-05-31 13:38:29.893753897 -0500
> > @@ -0,0 +1,74 @@
> > +RapidDisk (rxdsk) RAM disk and RapidCache (rxcache) Linux modules
> > +
> > +== Description ==
> > +
> > +RapidDisk or rxdsk was designed to be used in high performing environments
> > +and has been designed with simplicity in mind. Utilizing a user land binary,
> > +the system administrator is capable of dynamically adding new RAM based
> > +block devices of varying sizes, removing existing ones to even listing all
> > +existing RAM block devices. The rxdsk module has been designed to allocate
> > +from the system's memory pages and is capable of addressing memory to
> > +support Gigabytes if not a Terabyte of available Random Access Memory.
> > +
> > +RapidCache or rxcache is designed to leverage the high speed performing
> > +technologies of the RapidDisk RAM disk and utilizing the Device Mapper
> > +framework, map an rxdsk volume to act as a block device's Write/Read-through
> > +cache. This can significantly boost the performance of a local or remote
> > +disk device.
> > +
> > +
> > +== Module Parameters ==
> > +
> > +RapidDisk
> > +---------
> > +max_rxcnt: Total RAM Disk devices available for use. (Default = 128 = MAX) (int)
> > +max_sectors: Maximum sectors (in KB) for the request queue. (Default = 127) (int)
> > +nr_requests: Number of requests at a given time for the request queue. (Default = 128) (int)
> > +
> > +RapidCache
> > +---------
> > +None.
> > +
> > +
> > +== Usage ==
> > +
> > +RapidDisk
> > +---------
> > +It is advised to utilize the userland utility, rxadm, but this is essentially what is
> > +written to the /proc/rxctl proc file to manage rxdsk volumes:
> > +
> > +Attach a new rxdsk volume by typing (size in sectors):
> > +    # echo "rxdsk attach 0 8192" > /proc/rxctl
> > +
> > +Attach a non-volatile rxdsk volume by typing (starting / ending addresses
> > +in decimal format):
> > +    # echo "rxdsk attach-nv 0 1234 56789" > /proc/rxctl
> > +
> > +Detach an existing rxdsk volume by typing:
> > +    # echo "rxdsk detach 0" > /proc/rxctl
> > +
> > +Resize an existing rxdsk volume by typing (size in sectors):
> > +    # echo "rxdsk resize 0 65536" > /proc/rxctl
> > +
> > +
> > +Note - the integer after the "rxdsk <command>" is the RapidDisk volume number. "0"
> > +would signify "rxd0."
> > +
> > +RapidCache
> > +---------
> > +It is advised to utilize the userland utility, rxadm, but this is essentially what is
> > +sent to the dmsetup command:
> > +
> > +Map an existing rxdsk volume to a block device by typing:
> > +    # echo 0 4194303 rxcache /dev/sdb /dev/rxd0 0 8 196608|dmsetup create rxc0
> > +
> > +Parameter 1: Start block of source volume (in sectors).
> > +Parameter 2: End block of source volume (in sectors).
> > +Parameter 4: Source volume.
> > +Parameter 5: Cache volume.
> > +Parameter 7: Caching block size.
> > +Parameter 8: Cache size (in sectors).
> > +
> > +Unmap an rxdsk volume:
> > +
> > +    # dmsetup remove rxc0
> > diff -uNpr linux-next.orig/drivers/staging/rapiddisk/Kconfig linux-next/drivers/staging/rapiddisk/Kconfig
> > --- linux-next.orig/drivers/staging/rapiddisk/Kconfig   1969-12-31 18:00:00.000000000 -0600
> > +++ linux-next/drivers/staging/rapiddisk/Kconfig        2015-05-31 13:36:08.753757911 -0500
> > @@ -0,0 +1,10 @@
> > +config RXDSK
> > +       tristate "RapidDisk Enhanced Linux RAM disk solution"
> > +       depends on BLOCK && BLK_DEV_DM
> > +       default N
> > +       help
> > +         Creates virtual block devices called /dev/rxd(X = 0, 1, ...).
> > +        Supports both volatile and non-volatile memory. And can map
> > +        rxdsk volumes as caching nodes to slower drives.
> > +
> > +         Project home: http://www.rapiddisk.org/
> > diff -uNpr linux-next.orig/drivers/staging/rapiddisk/Makefile linux-next/drivers/staging/rapiddisk/Makefile
> > --- linux-next.orig/drivers/staging/rapiddisk/Makefile  1969-12-31 18:00:00.000000000 -0600
> > +++ linux-next/drivers/staging/rapiddisk/Makefile       2015-05-31 13:36:08.753757911 -0500
> > @@ -0,0 +1,2 @@
> > +obj-$(CONFIG_RXDSK) += rxdsk.o
> > +obj-$(CONFIG_RXDSK) += rxcache.o
> > diff -uNpr linux-next.orig/drivers/staging/rapiddisk/rxcache.c linux-next/drivers/staging/rapiddisk/rxcache.c
> > --- linux-next.orig/drivers/staging/rapiddisk/rxcache.c 1969-12-31 18:00:00.000000000 -0600
> > +++ linux-next/drivers/staging/rapiddisk/rxcache.c      2015-05-31 13:36:08.753757911 -0500
> > @@ -0,0 +1,1181 @@
> > +/*******************************************************
> > + ** Copyright (c) 2011-2014 Petros Koutoupis
> > + ** All rights reserved.
> > + **
> > + ** filename: rxcache.c
> > + ** description: Device mapper target for block-level disk
> > + **      write-through and read-ahead caching. This module
> > + **      is forked from Flashcache-wt:
> > + **      Copyright 2010 Facebook, Inc.
> > + **      Author: Mohan Srinivasan (mohan@xxxxxxxxxxxx)
> > + **
> > + ** created: 3Dec11, petros@xxxxxxxxxxxxxxxxxxx
> > + **
> > + ** This file is licensed under GPLv2.
> > + **
> > + ******************************************************/
> > +
> > +#include <asm/atomic.h>
> > +#include <linux/module.h>
> > +#include <linux/init.h>
> > +#include <linux/list.h>
> > +#include <linux/blkdev.h>
> > +#include <linux/bio.h>
> > +#include <linux/vmalloc.h>
> > +#include <linux/slab.h>
> > +#include <linux/hash.h>
> > +#include <linux/spinlock.h>
> > +#include <linux/workqueue.h>
> > +#include <linux/pagemap.h>
> > +#include <linux/random.h>
> > +#include <linux/version.h>
> > +#include <linux/seq_file.h>
> > +#include <linux/hardirq.h>
> > +#include <asm/kmap_types.h>
> > +#include <linux/dm-io.h>
> > +#include <linux/device-mapper.h>
> > +#include <linux/bio.h>
> > +#include "rxcommon.h"
> > +
> > +/* Like ASSERT() but always compiled in */
> 
> Just rename VERIFY() to ASSERT().
> > +
> > +#define VERIFY(x) do { \
> > +       if (unlikely(!(x))) { \
> > +               dump_stack(); \
> > +               panic("VERIFY: assertion (%s) failed at %s (%d)\n", \
> > +               #x, __FILE__, __LINE__); \
> > +       } \
> > +} while (0)
> > +
> > +#define DM_MSG_PREFIX  "rxc"
> > +#define RxPREFIX       "rxc"
> > +#define DRIVER_DESC    "RapidCache (rxc) DM target is a write-through caching target with RapidDisk volumes."
> 
> 
> Ugh.  Too many levels of abstraction.  These are nice if your hobby is
> grepping but otherwise they are pointless.
> 
> The debug printks could be cleaned up or deleted but that's like 10
> minutes of work.
> 
> > +
> > +#define READCACHE      1
> > +#define WRITECACHE     2
> > +#define READSOURCE     3
> > +#define WRITESOURCE    4
> > +#define READCACHE_DONE 5
> > +
> > +/* Default cache parameters */
> > +#define DEFAULT_CACHE_ASSOC    512
> > +#define DEFAULT_BLOCK_SIZE     8       /* 4 KB */
> > +#define CONSECUTIVE_BLOCKS     512
> > +
> > +/* States of a cache block */
> > +#define INVALID                0
> > +#define VALID          1       /* Valid */
> > +#define INPROG         2       /* IO (cache fill) is in progress */
> > +#define CACHEREADINPROG        3       /* cache read in progress, don't recycle */
> > +#define INPROG_INVALID 4       /* Write invalidated during a refill */
> > +
> > +#define DEV_PATHLEN    128
> > +
> > +#ifndef DM_MAPIO_SUBMITTED
> > +#define DM_MAPIO_SUBMITTED     0
> > +#endif
> > +
> > +#define WT_MIN_JOBS 1024
> > +
> > +/* Cache context */
> > +struct cache_c {
> 
> struct cache_context.
> 
> > +       struct dm_target *tgt;
> > +
> > +       struct dm_dev *disk_dev;
> > +       struct dm_dev *cache_dev;
> > +
> > +       spinlock_t cache_spin_lock;
> > +       struct cacheblock *cache;
> > +       u_int8_t *cache_state;
> > +       u_int32_t *set_lru_next;
> > +
> > +       struct dm_io_client *io_client;
> > +       sector_t size;
> > +       unsigned int assoc;
> > +       unsigned int block_size;
> > +       unsigned int block_shift;
> > +       unsigned int block_mask;
> > +       unsigned int consecutive_shift;
> > +
> > +       wait_queue_head_t destroyq;
> > +       atomic_t nr_jobs;
> > +
> > +       /* Stats */
> > +       unsigned long reads;
> > +       unsigned long writes;
> > +       unsigned long cache_hits;
> > +       unsigned long replace;
> > +       unsigned long wr_invalidates;
> > +       unsigned long rd_invalidates;
> > +       unsigned long cached_blocks;
> > +       unsigned long cache_wr_replace;
> > +       unsigned long uncached_reads;
> > +       unsigned long uncached_writes;
> > +       unsigned long cache_reads, cache_writes;
> > +       unsigned long disk_reads, disk_writes;
> > +
> > +       char cache_devname[DEV_PATHLEN];
> > +       char disk_devname[DEV_PATHLEN];
> > +};
> > +
> > +/* Cache block metadata structure */
> > +struct cacheblock {
> 
> struct cache_block
> 
> > +       sector_t dbn;
> > +};
> > +
> > +/* Structure for a kcached job */
> > +struct kcached_job {
> > +       struct list_head list;
> > +       struct cache_c *dmc;
> > +       struct bio *bio;
> > +       struct dm_io_region disk;
> > +       struct dm_io_region cache;
> > +       int index;
> > +       int rw;
> > +       int error;
> > +};
> > +
> > +u_int64_t size_hist[33];'
> 
> sector zero isn't used?  That's really strange.
> 
> > +
> > +static struct workqueue_struct *_kcached_wq;
> > +static struct work_struct _kcached_work;
> > +static struct kmem_cache *_job_cache;
> > +static mempool_t *_job_pool;
> > +
> > +static DEFINE_SPINLOCK(_job_lock);
> > +
> > +static LIST_HEAD(_complete_jobs);
> > +static LIST_HEAD(_io_jobs);
> 
> Don't name everything with a leading underscrore.  Especially since it
> is local to the file.
> 
> > +
> > +
> > +/*
> > + * Function Declarations
> > + **/
> 
> I have avoided pointing out the obvious comments, but this one crosses
> a line.  Remove obvious comments.
> 
> > +static void cache_read_miss(struct cache_c *, struct bio *, int);
> > +static void cache_write(struct cache_c *, struct bio *);
> > +static int cache_invalidate_blocks(struct cache_c *, struct bio *);
> > +static void rxc_uncached_io_callback(unsigned long, void *);
> > +static void rxc_start_uncached_io(struct cache_c *, struct bio *);
> > +
> > +
> > +/*
> > + * Functions Definitions
> > + **/
> > +int dm_io_async_bvec(unsigned int num_regions, struct dm_io_region *where,
> > +               int rw, struct bio *bio, io_notify_fn fn, void *context)
> > +{
> > +       struct kcached_job *job = (struct kcached_job *)context;
> > +       struct cache_c *dmc = job->dmc;
> > +       struct dm_io_request iorq;
> > +
> > +       iorq.bi_rw = rw;
> > +       iorq.mem.type = DM_IO_BIO;
> > +       iorq.mem.ptr.bio = bio;
> > +       iorq.notify.fn = fn;
> > +       iorq.notify.context = context;
> > +       iorq.client = dmc->io_client;
> > +
> > +       return dm_io(&iorq, num_regions, where, NULL);
> > +}
> > +
> > +static int jobs_init(void)
> > +{
> > +       _job_cache = kmem_cache_create("kcached-jobs-wt",
> > +               sizeof(struct kcached_job), __alignof__(struct kcached_job),
> > +               0, NULL);
> > +       if (!_job_cache)
> > +               return -ENOMEM;
> > +
> > +       _job_pool = mempool_create(WT_MIN_JOBS, mempool_alloc_slab,
> > +               mempool_free_slab, _job_cache);
> > +       if (!_job_pool) {
> > +               kmem_cache_destroy(_job_cache);
> > +               return -ENOMEM;
> > +       }
> > +
> > +       return 0;
> > +}
> > +
> > +static void jobs_exit(void)
> > +{
> > +       BUG_ON(!list_empty(&_complete_jobs));
> > +       BUG_ON(!list_empty(&_io_jobs));
> > +
> > +       mempool_destroy(_job_pool);
> > +       kmem_cache_destroy(_job_cache);
> > +       _job_pool = NULL;
> > +       _job_cache = NULL;
> > +}
> > +
> > +/* Functions to push and pop a job onto the head of a given job list. */
> > +static inline struct kcached_job *pop(struct list_head *jobs)
> > +{
> > +       struct kcached_job *job = NULL;
> > +       unsigned long flags;
> > +
> > +       spin_lock_irqsave(&_job_lock, flags);
> > +       if (!list_empty(jobs)) {
> > +               job = list_entry(jobs->next, struct kcached_job, list);
> > +               list_del(&job->list);
> > +       }
> > +       spin_unlock_irqrestore(&_job_lock, flags);
> > +       return job;
> > +}
> > +
> > +static inline void push(struct list_head *jobs, struct kcached_job *job)
> > +{
> > +       unsigned long flags;
> > +
> > +       spin_lock_irqsave(&_job_lock, flags);
> > +       list_add_tail(&job->list, jobs);
> > +       spin_unlock_irqrestore(&_job_lock, flags);
> > +}
> > +
> > +void rxc_io_callback(unsigned long error, void *context)
> > +{
> > +       struct kcached_job *job = (struct kcached_job *) context;
> > +       struct cache_c *dmc = job->dmc;
> > +       struct bio *bio;
> > +       int invalid = 0;
> > +
> > +       VERIFY(job != NULL);
> > +       bio = job->bio;
> > +       VERIFY(bio != NULL);
> > +       DPRINTK("%s: %s: %s %llu(%llu->%llu,%llu)", RxPREFIX, __func__,
> > +                       (job->rw == READ ? "READ" : "WRITE"),
> > +                       bio->bi_iter.bi_sector, job->disk.sector,
> > +                       job->cache.sector, job->disk.count);
> > +       if (error)
> > +               DMERR("%s: io error %ld", __func__, error);
> > +       if (job->rw == READSOURCE || job->rw == WRITESOURCE) {
> > +               spin_lock_bh(&dmc->cache_spin_lock);
> > +               if (dmc->cache_state[job->index] != INPROG) {
> > +                       VERIFY(dmc->cache_state[job->index] == INPROG_INVALID);
> > +                       invalid++;
> > +               }
> > +               spin_unlock_bh(&dmc->cache_spin_lock);
> > +               if (error || invalid) {
> > +                       if (invalid)
> > +                               DMERR("%s: cache fill invalidation, sector %lu, size %u",
> > +                                       __func__,
> > +                                       (unsigned long)bio->bi_iter.bi_sector,
> > +                                       bio->bi_iter.bi_size);
> > +                               bio_endio(bio, error);
> > +                               spin_lock_bh(&dmc->cache_spin_lock);
> > +                               dmc->cache_state[job->index] = INVALID;
> > +                               spin_unlock_bh(&dmc->cache_spin_lock);
> > +                               goto out;
> > +               } else {
> > +                       /* Kick off the write to the cache */
> > +                       job->rw = WRITECACHE;
> > +                       push(&_io_jobs, job);
> > +                       queue_work(_kcached_wq, &_kcached_work);
> > +                       return;
> > +               }
> > +       } else if (job->rw == READCACHE) {
> > +               spin_lock_bh(&dmc->cache_spin_lock);
> > +               VERIFY(dmc->cache_state[job->index] == INPROG_INVALID ||
> > +                       dmc->cache_state[job->index] ==  CACHEREADINPROG);
> > +               if (dmc->cache_state[job->index] == INPROG_INVALID)
> > +                       invalid++;
> > +               spin_unlock_bh(&dmc->cache_spin_lock);
> > +               if (!invalid && !error) {
> > +                       /* Complete the current IO successfully */
> > +                       bio_endio(bio, 0);
> > +                       spin_lock_bh(&dmc->cache_spin_lock);
> > +                       dmc->cache_state[job->index] = VALID;
> > +                       spin_unlock_bh(&dmc->cache_spin_lock);
> > +                       goto out;
> > +               }
> > +               /* error || invalid || bounce back to source device */
> > +               job->rw = READCACHE_DONE;
> > +               push(&_complete_jobs, job);
> > +               queue_work(_kcached_wq, &_kcached_work);
> > +               return;
> > +       } else {
> > +               VERIFY(job->rw == WRITECACHE);
> > +               bio_endio(bio, 0);
> > +               spin_lock_bh(&dmc->cache_spin_lock);
> > +               VERIFY((dmc->cache_state[job->index] == INPROG) ||
> > +                       (dmc->cache_state[job->index] == INPROG_INVALID));
> > +               if (error || dmc->cache_state[job->index] == INPROG_INVALID) {
> > +                       dmc->cache_state[job->index] = INVALID;
> > +               } else {
> > +                       dmc->cache_state[job->index] = VALID;
> > +                       dmc->cached_blocks++;
> > +               }
> > +               spin_unlock_bh(&dmc->cache_spin_lock);
> > +               DPRINTK("%s: Cache Fill: Block %llu, index = %d: Cache state = %d",
> > +                       RxPREFIX, dmc->cache[job->index].dbn, job->index,
> > +                       dmc->cache_state[job->index]);
> > +       }
> > +out:
> > +       mempool_free(job, _job_pool);
> > +       if (atomic_dec_and_test(&dmc->nr_jobs))
> > +               wake_up(&dmc->destroyq);
> > +}
> > +EXPORT_SYMBOL(rxc_io_callback);
> > +
> > +static int do_io(struct kcached_job *job)
> > +{
> > +       int r = 0;
> > +       struct cache_c *dmc = job->dmc;
> > +       struct bio *bio = job->bio;
> > +
> > +       VERIFY(job->rw == WRITECACHE);
> > +       /* Write to cache device */
> > +       dmc->cache_writes++;
> > +       r = dm_io_async_bvec(1, &job->cache, WRITE, bio,
> > +               rxc_io_callback, job);
> > +       VERIFY(r == 0); /* dm_io_async_bvec() must always return 0 */
> > +       return r;
> > +}
> > +
> > +int rxc_do_complete(struct kcached_job *job)
> > +{
> > +       struct bio *bio = job->bio;
> > +       struct cache_c *dmc = job->dmc;
> > +
> > +       VERIFY(job->rw == READCACHE_DONE);
> > +       DPRINTK("%s: %s: %llu", RxPREFIX, __func__, bio->bi_iter.bi_sector);
> > +       /* error || block invalidated while reading from cache */
> > +       spin_lock_bh(&dmc->cache_spin_lock);
> > +       dmc->cache_state[job->index] = INVALID;
> > +       spin_unlock_bh(&dmc->cache_spin_lock);
> > +       mempool_free(job, _job_pool);
> > +       if (atomic_dec_and_test(&dmc->nr_jobs))
> > +               wake_up(&dmc->destroyq);
> > +       /* Kick this IO back to the source bdev */
> > +       rxc_start_uncached_io(dmc, bio);
> > +       return 0;
> > +}
> > +EXPORT_SYMBOL(rxc_do_complete);
> > +
> > +static void process_jobs(struct list_head *jobs,
> > +       int (*fn)(struct kcached_job *))
> > +{
> > +       struct kcached_job *job;
> > +
> > +       while ((job = pop(jobs)))
> > +               (void)fn(job);
> > +}
> > +
> > +static void do_work(struct work_struct *work)
> > +{
> > +       process_jobs(&_complete_jobs, rxc_do_complete);
> > +       process_jobs(&_io_jobs, do_io);
> > +}
> > +
> > +static int kcached_init(struct cache_c *dmc)
> > +{
> > +       init_waitqueue_head(&dmc->destroyq);
> > +       atomic_set(&dmc->nr_jobs, 0);
> > +       return 0;
> > +}
> > +
> > +void kcached_client_destroy(struct cache_c *dmc)
> > +{
> > +       /* Wait for completion of all jobs submitted by this client. */
> > +       wait_event(dmc->destroyq, !atomic_read(&dmc->nr_jobs));
> > +}
> > +
> > +/* Map a block from the source device to a block in the cache device. */
> > +static unsigned long hash_block(struct cache_c *dmc, sector_t dbn)
> > +{
> > +       unsigned long set_number, value, tmpval;
> > +
> > +       value = (unsigned long)
> > +       (dbn >> (dmc->block_shift + dmc->consecutive_shift));
> 
> Confusing indenting.  The cast is pointless.
> 
> > +       tmpval = value;
> 
> tmpval is not needed?
> 
> > +       set_number = do_div(tmpval, (dmc->size >> dmc->consecutive_shift));
> > +       DPRINTK("%s: Hash: %llu(%lu)->%lu", RxPREFIX, dbn, value, set_number);
> > +       return set_number;
> > +}
> > +
> > +static int find_valid_dbn(struct cache_c *dmc, sector_t dbn,
> > +       int start_index, int *index)
> > +{
> > +       int i;
> > +       int end_index = start_index + dmc->assoc;
> > +
> > +       for (i = start_index ; i < end_index ; i++) {
> > +               if (dbn == dmc->cache[i].dbn && (dmc->cache_state[i] == VALID ||
> > +                       dmc->cache_state[i] == CACHEREADINPROG ||
> > +                       dmc->cache_state[i] == INPROG)) {
> 
> 
> A lot of your indenting is messed up.  This should be:
> 
>               if (dbn == dmc->cache[i].dbn &&
>                   (dmc->cache_state[i] == VALID ||
>                    dmc->cache_state[i] == CACHEREADINPROG ||
>                    dmc->cache_state[i] == INPROG)) {
> 
> Or you could reverse the first chunk:
> 
>       if (dbn != dmc->cache[i].dbn)
>               continue;
>       if (dmc->cache_state[i] == VALID ||
>           dmc->cache_state[i] == CACHEREADINPROG ||
> 
> 
> 
> 
> > +                       *index = i;
> > +                       return dmc->cache_state[i];
> > +               }
> > +       }
> > +       return -1;
> 
> Find ever reference to -1 and replace it with something meaningful.
> 
> > +}
> > +
> > +static void find_invalid_dbn(struct cache_c *dmc, int start_index, int *index)
> > +{
> > +       int i;
> > +       int end_index = start_index + dmc->assoc;
> > +
> > +       /* Find INVALID slot that we can reuse */
> > +       for (i = start_index ; i < end_index ; i++) {
> > +               if (dmc->cache_state[i] == INVALID) {
> > +                       *index = i;
> > +                       return;
> > +               }
> > +       }
> 
> This function should return success or failure.
> 
> > +}
> > +
> > +static void find_reclaim_dbn(struct cache_c *dmc, int start_index, int *index)
> > +{
> > +       int i;
> > +       int end_index = start_index + dmc->assoc;
> > +       int set = start_index / dmc->assoc;
> > +       int slots_searched = 0;
> > +
> > +       i = dmc->set_lru_next[set];
> > +       while (slots_searched < dmc->assoc) {
> > +               VERIFY(i >= start_index);
> > +               VERIFY(i < end_index);
> > +               if (dmc->cache_state[i] == VALID) {
> > +                       *index = i;
> > +                       break;
> > +               }
> > +               slots_searched++;
> > +               i++;
> > +               if (i == end_index)
> > +                       i = start_index;
> > +       }
> > +       i++;
> > +       if (i == end_index)
> > +               i = start_index;
> > +       dmc->set_lru_next[set] = i;
> > +}
> > +
> > +/* dbn is the starting sector, io_size is the number of sectors. */
> > +static int cache_lookup(struct cache_c *dmc, struct bio *bio, int *index)
> > +{
> > +       sector_t dbn = bio->bi_iter.bi_sector;
> > +#if defined RxC_DEBUG
> > +       int io_size = to_sector(bio->bi_iter.bi_size);
> > +#endif
> 
> Get rid of this ifdef.  The DPRINTK() calls are ugly.  It's hard to see
> the actual code in the forest of DPRINTK calls.  Let's fix the crashing
> bugs then delete the debug code.
> 
> > +       unsigned long set_number = hash_block(dmc, dbn);
> > +       int invalid = -1, oldest_clean = -1;
> > +       int start_index;
> > +       int ret;
> > +
> > +       start_index = dmc->assoc * set_number;
> > +       DPRINTK("%s: Cache read lookup : dbn %llu(%lu), set = %d",
> > +                       RxPREFIX, dbn, io_size, set_number);
> > +       ret = find_valid_dbn(dmc, dbn, start_index, index);
> > +       if (ret == VALID || ret == INPROG || ret == CACHEREADINPROG) {
> > +               DPRINTK("%s: Cache read lookup: Block %llu(%lu): ret %d VALID/INPROG index %d",
> > +                       RxPREFIX, dbn, io_size, ret, *index);
> > +               /* We found the exact range of blocks we are looking for */
> > +               return ret;
> > +       }
> > +       DPRINTK("%s: Cache read lookup: Block %llu(%lu):%d INVALID",
> > +               RxPREFIX, dbn, io_size, ret);
> > +       VERIFY(ret == -1);
> > +       find_invalid_dbn(dmc, start_index, &invalid);
> > +       if (invalid == -1) {
> > +               /* Search for oldest valid entry */
> > +               find_reclaim_dbn(dmc, start_index, &oldest_clean);
> > +       }
> > +       /* Cache miss : We can't choose an entry marked INPROG,
> > +        * but choose the oldest INVALID or the oldest VALID entry. */
> > +       *index = start_index + dmc->assoc;
> > +       if (invalid != -1) {
> > +               DPRINTK("%s: Cache read lookup MISS (INVALID): dbn %llu(%lu), set = %d, index = %d, start_index = %d",
> > +                       RxPREFIX, dbn, io_size, set_number,
> > +                       invalid, start_index);
> > +               *index = invalid;
> > +       } else if (oldest_clean != -1) {
> > +               DPRINTK("%s: Cache read lookup MISS (VALID): dbn %llu(%lu), set = %d, index = %d, start_index = %d",
> > +                       RxPREFIX, dbn, io_size, set_number,
> > +                       oldest_clean, start_index);
> > +               *index = oldest_clean;
> > +       } else {
> > +               DPRINTK("%s: Cache read lookup MISS (NOROOM): dbn %llu(%lu), set = %d",
> > +                       RxPREFIX, dbn, io_size, set_number);
> > +       }
> > +       if (*index < (start_index + dmc->assoc))
> > +               return INVALID;
> > +       else
> > +               return -1;
> > +}
> > +
> > +static struct kcached_job *new_kcached_job(struct cache_c *dmc,
> > +       struct bio *bio, int index)
> > +{
> > +       struct kcached_job *job;
> > +
> > +       job = mempool_alloc(_job_pool, GFP_NOIO);
> > +       if (job == NULL)
> > +               return NULL;
> > +       job->disk.bdev = dmc->disk_dev->bdev;
> > +       job->disk.sector = bio->bi_iter.bi_sector;
> > +       if (index != -1)
> > +               job->disk.count = dmc->block_size;
> > +       else
> > +               job->disk.count = to_sector(bio->bi_iter.bi_size);
> > +       job->cache.bdev = dmc->cache_dev->bdev;
> > +       if (index != -1) {
> > +               job->cache.sector = index << dmc->block_shift;
> > +               job->cache.count = dmc->block_size;
> > +       }
> > +       job->dmc = dmc;
> > +       job->bio = bio;
> > +       job->index = index;
> > +       job->error = 0;
> > +       return job;
> > +}
> > +
> > +static void cache_read_miss(struct cache_c *dmc, struct bio *bio, int index)
> > +{
> > +       struct kcached_job *job;
> > +
> > +       DPRINTK("%s: Cache Read Miss sector %llu %u bytes, index %d)",
> > +               RxPREFIX, bio->bi_iter.bi_sector, bio->bi_iter.bi_size, index);
> > +
> > +       job = new_kcached_job(dmc, bio, index);
> > +       if (unlikely(job == NULL)) {
> 
> Don't add likely/unlikely unless you have benchmarked it.  It's just
> makes the code messy for no reason if there isn't a speed reason.
> 
> > +               DMERR("%s: Cannot allocate job\n", __func__);
> > +               spin_lock_bh(&dmc->cache_spin_lock);
> > +               dmc->cache_state[index] = INVALID;
> > +               spin_unlock_bh(&dmc->cache_spin_lock);
> > +               bio_endio(bio, -EIO);
> > +       } else {
> > +               job->rw = READSOURCE; /* Fetch data from the source device */
> > +               DPRINTK("%s: Queue job for %llu", RxPREFIX,
> > +                       bio->bi_iter.bi_sector);
> > +               atomic_inc(&dmc->nr_jobs);
> > +               dmc->disk_reads++;
> > +               dm_io_async_bvec(1, &job->disk, READ,
> > +                       bio, rxc_io_callback, job);
> > +       }
> > +}
> > +
> > +static void cache_read(struct cache_c *dmc, struct bio *bio)
> > +{
> > +       int index;
> > +       int res;
> > +
> > +       DPRINTK("%s: Got a %s for %llu  %u bytes)", RxPREFIX,
> > +               (bio_rw(bio) == READ ? "READ":"READA"),
> > +               bio->bi_iter.bi_sector, bio->bi_iter.bi_size);
> > +
> > +       spin_lock_bh(&dmc->cache_spin_lock);
> > +       res = cache_lookup(dmc, bio, &index);
> > +       /* Cache Hit */
> > +       if ((res == VALID) &&
> > +               (dmc->cache[index].dbn == bio->bi_iter.bi_sector)) {
> 
> Bad indenting.  Extra parens.  This isn't LISP.  I think a couple years
> ago there was an = vs == introduced this way.  (I may have
> misremembered).
> 
> > +               struct kcached_job *job;
> > +
> > +               dmc->cache_state[index] = CACHEREADINPROG;
> > +               dmc->cache_hits++;
> > +               spin_unlock_bh(&dmc->cache_spin_lock);
> > +               DPRINTK("%s: Cache read: Block %llu(%lu), index = %d:%s",
> > +                       RxPREFIX, bio->bi_iter.bi_sector, bio->bi_iter.bi_size,
> > +                       index, "CACHE HIT");
> > +               job = new_kcached_job(dmc, bio, index);
> > +               if (unlikely(job == NULL)) {
> > +                       /* Can't allocate job, bounce back error */
> > +                       DMERR("cache_read(_hit): Cannot allocate job\n");
> > +                       spin_lock_bh(&dmc->cache_spin_lock);
> > +                       dmc->cache_state[index] = VALID;
> > +                       spin_unlock_bh(&dmc->cache_spin_lock);
> > +                       bio_endio(bio, -EIO);
> > +               } else {
> > +                       job->rw = READCACHE; /* Fetch data from source device */
> > +                       DPRINTK("%s: Queue job for %llu", RxPREFIX,
> > +                               bio->bi_iter.bi_sector);
> > +                       atomic_inc(&dmc->nr_jobs);
> > +                       dmc->cache_reads++;
> > +                       dm_io_async_bvec(1, &job->cache, READ, bio,
> > +                       rxc_io_callback, job);
> > +               }
> > +               return;
> > +       }
> > +       /* In all cases except for a cache hit (and VALID), test for potential
> > +        * invalidations that we need to do. */
> 
> Use checkpatch.pl --strict on this code and fix all the warnings.
> 
> > +       if (cache_invalidate_blocks(dmc, bio) > 0) {
> > +               /* A non zero return indicates an inprog invalidation */
> > +               spin_unlock_bh(&dmc->cache_spin_lock);
> > +               /* Start uncached IO */
> 
> These obvious comments are really distracting.
> 
> > +               rxc_start_uncached_io(dmc, bio);
> > +               return;
> > +       }
> > +       if (res == -1 || res >= INPROG) {
> > +               /* We either didn't find a cache slot in the set we were
> > +                * looking at or the block we are trying to read is being
> > +                * refilled into cache. */
> > +               spin_unlock_bh(&dmc->cache_spin_lock);
> > +               DPRINTK("%s: Cache read: Block %llu(%lu):%s", RxPREFIX,
> > +                       bio->bi_iter.bi_sector, bio->bi_iter.bi_size,
> > +                       "CACHE MISS & NO ROOM");
> > +               /* Start uncached IO */
> > +               rxc_start_uncached_io(dmc, bio);
> > +               return;
> > +       }
> > +       /* (res == INVALID) Cache Miss And we found cache blocks to replace
> > +        * Claim the cache blocks before giving up the spinlock */
> > +       if (dmc->cache_state[index] == VALID) {
> > +               dmc->cached_blocks--;
> > +               dmc->replace++;
> > +       }
> > +       dmc->cache_state[index] = INPROG;
> > +       dmc->cache[index].dbn = bio->bi_iter.bi_sector;
> > +       spin_unlock_bh(&dmc->cache_spin_lock);
> > +
> > +       DPRINTK("%s: Cache read: Block %llu(%lu), index = %d:%s", RxPREFIX,
> > +               bio->bi_iter.bi_sector, bio->bi_iter.bi_size, index,
> > +               "CACHE MISS & REPLACE");
> > +       cache_read_miss(dmc, bio, index);
> > +}
> > +
> > +static int cache_invalidate_block_set(struct cache_c *dmc, int set,
> > +       sector_t io_start, sector_t io_end, int rw, int *inprog_inval)
> > +{
> > +       int start_index, end_index, i;
> > +       int invalidations = 0;
> > +
> > +       start_index = dmc->assoc * set;
> > +       end_index = start_index + dmc->assoc;
> > +       for (i = start_index ; i < end_index ; i++) {
> > +               sector_t start_dbn = dmc->cache[i].dbn;
> > +               sector_t end_dbn = start_dbn + dmc->block_size;
> > +
> > +               if (dmc->cache_state[i] == INVALID ||
> > +                       dmc->cache_state[i] == INPROG_INVALID)
> > +                       continue;
> > +               if ((io_start >= start_dbn && io_start < end_dbn) ||
> > +                       (io_end >= start_dbn && io_end < end_dbn)) {
> > +                       /* We have a match */
> > +                       if (rw == WRITE)
> > +                               dmc->wr_invalidates++;
> > +                       else
> > +                               dmc->rd_invalidates++;
> > +                       invalidations++;
> > +                       if (dmc->cache_state[i] == VALID) {
> > +                               dmc->cached_blocks--;
> > +                               dmc->cache_state[i] = INVALID;
> > +                               DPRINTK("%s: Cache invalidate: Block %llu VALID",
> > +                                       RxPREFIX, start_dbn);
> > +                       } else if (dmc->cache_state[i] >= INPROG) {
> > +                               (*inprog_inval)++;
> > +                               dmc->cache_state[i] = INPROG_INVALID;
> > +                               DMERR("%s: sector %lu, size %lu, rw %d",
> > +                                       __func__, (unsigned long)io_start,
> > +                                       (unsigned long)io_end - (unsigned long)io_start, rw);
> > +                               DPRINTK("%s: Cache invalidate: Block %llu INPROG",
> > +                                       RxPREFIX, start_dbn);
> > +                       }
> > +               }
> > +       }
> > +       return invalidations;
> > +}
> > +
> > +static int cache_invalidate_blocks(struct cache_c *dmc, struct bio *bio)
> > +{
> > +       sector_t io_start = bio->bi_iter.bi_sector;
> > +       sector_t io_end = bio->bi_iter.bi_sector + (to_sector(bio->bi_iter.bi_size) - 1);
> > +       int start_set, end_set;
> > +       int inprog_inval_start = 0, inprog_inval_end = 0;
> > +
> > +       start_set = hash_block(dmc, io_start);
> > +       end_set = hash_block(dmc, io_end);
> > +       (void)cache_invalidate_block_set(dmc, start_set, io_start, io_end,
> > +               bio_data_dir(bio), &inprog_inval_start);
> 
> Don't put the (void) here.  It's just noise.  It not done consistently
> see two lines below.
> 
> > +       if (start_set != end_set)
> > +               cache_invalidate_block_set(dmc, end_set, io_start, io_end,
> > +                       bio_data_dir(bio),  &inprog_inval_end);
> > +       return (inprog_inval_start + inprog_inval_end);
> > +}
> > +
> > +static void cache_write(struct cache_c *dmc, struct bio *bio)
> > +{
> > +       int index;
> > +       int res;
> > +       struct kcached_job *job;
> > +
> > +       spin_lock_bh(&dmc->cache_spin_lock);
> > +       if (cache_invalidate_blocks(dmc, bio) > 0) {
> > +               /* A non zero return indicates an inprog invalidation */
> > +               spin_unlock_bh(&dmc->cache_spin_lock);
> > +               /* Start uncached IO */
> > +               rxc_start_uncached_io(dmc, bio);
> > +               return;
> > +       }
> > +       res = cache_lookup(dmc, bio, &index);
> > +       VERIFY(res == -1 || res == INVALID);
> > +       if (res == -1) {
> > +               spin_unlock_bh(&dmc->cache_spin_lock);
> > +               /* Start uncached IO */
> > +               rxc_start_uncached_io(dmc, bio);
> > +               return;
> > +       }
> > +       if (dmc->cache_state[index] == VALID) {
> > +               dmc->cached_blocks--;
> > +               dmc->cache_wr_replace++;
> > +       }
> > +       dmc->cache_state[index] = INPROG;
> > +       dmc->cache[index].dbn = bio->bi_iter.bi_sector;
> > +       spin_unlock_bh(&dmc->cache_spin_lock);
> > +       job = new_kcached_job(dmc, bio, index);
> > +       if (unlikely(job == NULL)) {
> > +               DMERR("%s: Cannot allocate job\n", __func__);
> > +               spin_lock_bh(&dmc->cache_spin_lock);
> > +               dmc->cache_state[index] = INVALID;
> > +               spin_unlock_bh(&dmc->cache_spin_lock);
> > +               bio_endio(bio, -EIO);
> > +               return;
> > +       }
> > +       job->rw = WRITESOURCE; /* Write data to the source device */
> > +       DPRINTK("%s: Queue job for %llu", RxPREFIX, bio->bi_iter.bi_sector);
> > +       atomic_inc(&job->dmc->nr_jobs);
> > +       dmc->disk_writes++;
> > +       dm_io_async_bvec(1, &job->disk, WRITE, bio,
> > +               rxc_io_callback, job);
> > +       return;
> > +}
> > +
> > +#define bio_barrier(bio)       ((bio)->bi_rw & REQ_FLUSH)
> > +
> > +/* Decide the mapping and perform necessary cache operations
> > + * for a bio request. */
> > +int rxc_map(struct dm_target *ti, struct bio *bio)
> > +{
> > +       struct cache_c *dmc = (struct cache_c *) ti->private;
> > +       int sectors = to_sector(bio->bi_iter.bi_size);
> > +
> > +       if (sectors <= 32)
> > +               size_hist[sectors]++;
> > +
> > +       DPRINTK("%s: Got a %s for %llu %u bytes)", RxPREFIX,
> > +               bio_rw(bio) == WRITE ? "WRITE" : (bio_rw(bio) == READ ?
> > +               "READ":"READA"), bio->bi_iter.bi_sector, bio->bi_iter.bi_size);
> > +
> > +       if (bio_barrier(bio))
> > +               return -EOPNOTSUPP;
> > +
> > +       VERIFY(to_sector(bio->bi_iter.bi_size) <= dmc->block_size);
> > +
> > +       if (bio_data_dir(bio) == READ)
> > +               dmc->reads++;
> > +       else
> > +               dmc->writes++;
> > +
> > +       if (to_sector(bio->bi_iter.bi_size) != dmc->block_size) {
> > +               spin_lock_bh(&dmc->cache_spin_lock);
> > +               (void)cache_invalidate_blocks(dmc, bio);
> > +               spin_unlock_bh(&dmc->cache_spin_lock);
> > +               /* Start uncached IO */
> > +               rxc_start_uncached_io(dmc, bio);
> > +       } else {
> > +               if (bio_data_dir(bio) == READ)
> > +                       cache_read(dmc, bio);
> > +               else
> > +                       cache_write(dmc, bio);
> > +       }
> > +       return DM_MAPIO_SUBMITTED;
> > +}
> > +EXPORT_SYMBOL(rxc_map);
> > +
> > +static void rxc_uncached_io_callback(unsigned long error, void *context)
> > +{
> > +       struct kcached_job *job = (struct kcached_job *) context;
> > +       struct cache_c *dmc = job->dmc;
> > +
> > +       spin_lock_bh(&dmc->cache_spin_lock);
> > +       if (bio_data_dir(job->bio) == READ)
> > +               dmc->uncached_reads++;
> > +       else
> > +               dmc->uncached_writes++;
> > +       (void)cache_invalidate_blocks(dmc, job->bio);
> > +       spin_unlock_bh(&dmc->cache_spin_lock);
> > +       bio_endio(job->bio, error);
> > +       mempool_free(job, _job_pool);
> > +       if (atomic_dec_and_test(&dmc->nr_jobs))
> > +               wake_up(&dmc->destroyq);
> > +}
> > +
> > +static void rxc_start_uncached_io(struct cache_c *dmc, struct bio *bio)
> > +{
> > +       int is_write = (bio_data_dir(bio) == WRITE);
> > +       struct kcached_job *job;
> > +
> > +       job = new_kcached_job(dmc, bio, -1);
> > +       if (unlikely(job == NULL)) {
> > +               bio_endio(bio, -EIO);
> > +               return;
> > +       }
> > +       atomic_inc(&dmc->nr_jobs);
> > +       if (bio_data_dir(job->bio) == READ)
> > +               dmc->disk_reads++;
> > +       else
> > +               dmc->disk_writes++;
> > +       dm_io_async_bvec(1, &job->disk, ((is_write) ? WRITE : READ),
> > +               bio, rxc_uncached_io_callback, job);
> > +}
> > +
> > +static inline int rxc_get_dev(struct dm_target *ti, char *pth,
> > +       struct dm_dev **dmd, char *dmc_dname, sector_t tilen)
> > +{
> > +       int rc;
> > +
> > +       rc = dm_get_device(ti, pth, dm_table_get_mode(ti->table), dmd);
> > +       if (!rc)
> > +               strncpy(dmc_dname, pth, DEV_PATHLEN);
> > +       return rc;
> > +}
> > +
> > +/* Construct a cache mapping.
> > + *  arg[0]: path to source device
> > + *  arg[1]: path to cache device
> > + *  arg[2]: cache block size (in sectors)
> > + *  arg[3]: cache size (in blocks)
> > + *  arg[4]: cache associativity */
> > +static int cache_ctr(struct dm_target *ti, unsigned int argc, char **argv)
> > +{
> > +       struct cache_c *dmc;
> > +       unsigned int consecutive_blocks;
> > +       sector_t i, order, tmpsize;
> > +       sector_t data_size, dev_size;
> > +       int r = -EINVAL;
> > +
> > +       if (argc < 2) {
> > +               ti->error = "rxc: Need at least 2 arguments";
> > +               goto bad;
> 
> Ugh.  This function is ugly.  "goto bad;" is useless.  What does it do?
> 
> Oh...  It returns.  Oh, we are using numbered gotos as if we were
> programming in some primitive GW-BASIC language that didn't allow named
> labels.
> 
>               return -EINVAL;
> 
> > +       }
> > +
> > +       dmc = kzalloc(sizeof(*dmc), GFP_KERNEL);
> > +       if (dmc == NULL) {
> > +               ti->error = "rxc: Failed to allocate cache context";
> > +               r = ENOMEM;
> > +               goto bad;
> 
> It's buggy.  We should return -ENOMEM; instead of posivitive.
> 
> > +       }
> > +
> > +       dmc->tgt = ti;
> > +
> > +       if (rxc_get_dev(ti, argv[0], &dmc->disk_dev, dmc->disk_devname,
> > +               ti->len)) {
> 
> Hopefull checkpatch.pl --strict will complain here.
> 
>       if (rxc_get_dev(ti, argv[0], &dmc->disk_dev, dmc->disk_devname,
>                       ti->len)) {
> 
> > +               ti->error = "rxc: Disk device lookup failed";
> > +               goto bad1;
> 
>               goto free_dmc;
> 
> > +       }
> > +       if (strncmp(argv[1], "/dev/rxd", 8) != 0) {
> > +               pr_err("%s: %s is not a valid cache device for rxcache.",
> > +                       RxPREFIX, argv[1]);
> > +               ti->error = "rxc: Invalid cache device. Not an rxdsk volume.";
> > +               goto bad2;
> 
> 
>               goto put_disk_dev;
> 
> > +       }
> > +       if (rxc_get_dev(ti, argv[1], &dmc->cache_dev, dmc->cache_devname, 0)) {
> > +               ti->error = "rxc: Cache device lookup failed";
> > +               goto bad2;
> > +       }
> > +
> > +       dmc->io_client = dm_io_client_create();
> > +       if (IS_ERR(dmc->io_client)) {
> > +               r = PTR_ERR(dmc->io_client);
> > +               ti->error = "Failed to create io client\n";
> > +               goto bad2;
> 
> This is buggy.  It doesn't free cache_dev.  It's not obviously, because
> the name is based on GW-BASIC style.
> 
>               goto put_cache_dev.
> 
> > +       }
> > +
> > +       r = kcached_init(dmc);
> > +       if (r) {
> > +               ti->error = "Failed to initialize kcached";
> > +               goto bad3;
> > +       }
> > +       dmc->assoc = DEFAULT_CACHE_ASSOC;
> > +
> > +       if (argc >= 3) {
> > +               if (sscanf(argv[2], "%u", &dmc->block_size) != 1) {
> > +                       ti->error = "rxc: Invalid block size";
> > +                       r = -EINVAL;
> > +                       goto bad4;
> > +               }
> > +               if (!dmc->block_size ||
> > +                       (dmc->block_size & (dmc->block_size - 1))) {
> > +                       ti->error = "rxc: Invalid block size";
> > +                       r = -EINVAL;
> > +                       goto bad4;
> > +               }
> > +       } else
> > +               dmc->block_size = DEFAULT_BLOCK_SIZE;
> > +
> > +       dmc->block_shift = ffs(dmc->block_size) - 1;
> > +       dmc->block_mask = dmc->block_size - 1;
> > +
> > +       /* dmc->size is specified in sectors here, and converted
> > +        * to blocks below */
> > +       if (argc >= 4) {
> > +               if (sscanf(argv[3], "%lu", (unsigned long *)&dmc->size) != 1) {
> > +                       ti->error = "rxc: Invalid cache size";
> > +                       r = -EINVAL;
> > +                       goto bad4;
> > +               }
> > +       } else {
> > +               dmc->size = to_sector(dmc->cache_dev->bdev->bd_inode->i_size);
> > +       }
> > +
> > +       if (argc >= 5) {
> > +               if (sscanf(argv[4], "%u", &dmc->assoc) != 1) {
> > +                       ti->error = "rxc: Invalid cache associativity";
> > +                       r = -EINVAL;
> > +                       goto bad4;
> > +               }
> > +               if (!dmc->assoc || (dmc->assoc & (dmc->assoc - 1)) ||
> > +                       dmc->size < dmc->assoc) {
> > +                       ti->error = "rxc: Invalid cache associativity";
> > +                       r = -EINVAL;
> > +                       goto bad4;
> > +               }
> > +       } else
> > +               dmc->assoc = DEFAULT_CACHE_ASSOC;
> > +
> > +       /* Convert size (in sectors) to blocks. Then round size (in blocks now)
> > +        * down to a multiple of associativity */
> > +       do_div(dmc->size, dmc->block_size);
> > +       tmpsize = dmc->size;
> > +       do_div(tmpsize, dmc->assoc);
> > +       dmc->size = tmpsize * dmc->assoc;
> > +
> > +       dev_size = to_sector(dmc->cache_dev->bdev->bd_inode->i_size);
> > +       data_size = dmc->size * dmc->block_size;
> > +       if (data_size > dev_size) {
> > +               DMERR("Requested cache size exeeds the cache device's capacity (%lu>%lu)",
> > +                       (unsigned long)data_size, (unsigned long)dev_size);
> > +               ti->error = "rxc: Invalid cache size";
> > +               r = -EINVAL;
> > +               goto bad4;
> > +       }
> > +
> > +       consecutive_blocks = dmc->assoc;
> > +       dmc->consecutive_shift = ffs(consecutive_blocks) - 1;
> > +
> > +       order = dmc->size * sizeof(struct cacheblock);
> > +#if defined(__x86_64__)
> > +       DMINFO("Allocate %luKB (%luB per) mem for %lu-entry cache" \
> > +               "(capacity:%luMB, associativity:%u, block size:%u sectors(%uKB))",
> > +               (unsigned long)order >> 10, sizeof(struct cacheblock),
> > +               (unsigned long)dmc->size,
> > +               (unsigned long)data_size >> (20-SECTOR_SHIFT),
> > +               dmc->assoc, dmc->block_size,
> > +               dmc->block_size >> (10-SECTOR_SHIFT));
> > +#else
> > +       DMINFO("Allocate %luKB (%dB per) mem for %lu-entry cache" \
> > +               "(capacity:%luMB, associativity:%u, block size:%u sectors(%uKB))",
> > +               (unsigned long)order >> 10, sizeof(struct cacheblock),
> > +               (unsigned long)dmc->size,
> > +               (unsigned long)data_size >> (20-SECTOR_SHIFT), dmc->assoc,
> > +               dmc->block_size, dmc->block_size >> (10-SECTOR_SHIFT));
> 
> This assumes that Intel is the only company in the world.
> 
> > +#endif
> > +       dmc->cache = (struct cacheblock *)vmalloc(order);
> 
> No need for this cast.
> 
> 
> > +       if (!dmc->cache) {
> > +               ti->error = "Unable to allocate memory";
> > +               r = -ENOMEM;
> > +               goto bad4;
> > +       }
> > +       dmc->cache_state = (u_int8_t *)vmalloc(dmc->size);
> > +       if (!dmc->cache_state) {
> > +               ti->error = "Unable to allocate memory";
> > +               r = -ENOMEM;
> > +               vfree((void *)dmc->cache);
> 
> No need for this case.  Also this doesn't smell right at all.  Why
> aren't we doing the vfree() after the goto?  Also we don't vfree if
> dm_set_target_max_io_len() fails.
> 
> > +               goto bad4;
> > +       }
> > +
> > +       order = (dmc->size >> dmc->consecutive_shift) * sizeof(u_int32_t);
> > +       dmc->set_lru_next = (u_int32_t *)vmalloc(order);
> > +       if (!dmc->set_lru_next) {
> > +               ti->error = "Unable to allocate memory";
> > +               r = -ENOMEM;
> > +               vfree((void *)dmc->cache);
> > +               vfree((void *)dmc->cache_state);
> > +               goto bad4;
> > +       }
> > +
> > +       /* Initialize the cache structs */
> 
> Obvious.
> 
> > +       for (i = 0; i < dmc->size ; i++) {
> > +               dmc->cache[i].dbn = 0;
> > +               dmc->cache_state[i] = INVALID;
> > +       }
> > +
> > +       /* Initialize the point where LRU sweeps begin for each set */
> > +       for (i = 0 ; i < (dmc->size >> dmc->consecutive_shift) ; i++)
> > +               dmc->set_lru_next[i] = i * dmc->assoc;
> > +
> > +       spin_lock_init(&dmc->cache_spin_lock);
> > +
> > +       dmc->reads = 0;
> > +       dmc->writes = 0;
> > +       dmc->cache_hits = 0;
> > +       dmc->replace = 0;
> > +       dmc->wr_invalidates = 0;
> > +       dmc->rd_invalidates = 0;
> > +       dmc->cached_blocks = 0;
> > +       dmc->cache_wr_replace = 0;
> > +
> > +       r = dm_set_target_max_io_len(ti, dmc->block_size);
> > +       if (r)
> > +               goto bad4;
> > +       ti->private = dmc;
> > +       return 0;
> > +bad4:
> > +       kcached_client_destroy(dmc);
> > +bad3:
> > +       dm_io_client_destroy(dmc->io_client);
> > +       dm_put_device(ti, dmc->cache_dev);
> > +bad2:
> > +       dm_put_device(ti, dmc->disk_dev);
> > +bad1:
> > +       kfree(dmc);
> > +bad:
> > +       return r;
> > +}
> > +
> > +/* Destroy the cache mapping. */
> > +static void cache_dtr(struct dm_target *ti)
> > +{
> > +       struct cache_c *dmc = (struct cache_c *) ti->private;
> > +
> > +       kcached_client_destroy(dmc);
> > +
> > +       if (dmc->reads + dmc->writes > 0) {
> > +               DMINFO("stats:\n\treads(%lu), writes(%lu)\n",
> > +                       dmc->reads, dmc->writes);
> > +               DMINFO("\tcache hits(%lu),replacement(%lu), write replacement(%lu)\n" \
> > +                       "\tread invalidates(%lu), write invalidates(%lu)\n",
> > +                       dmc->cache_hits, dmc->replace, dmc->cache_wr_replace,
> > +                       dmc->rd_invalidates, dmc->wr_invalidates);
> > +               DMINFO("conf:\n\tcapacity(%luM), associativity(%u), block size(%uK)\n" \
> > +                       "\ttotal blocks(%lu), cached blocks(%lu)\n",
> > +                       (unsigned long)dmc->size*dmc->block_size>>11, dmc->assoc,
> > +                       dmc->block_size>>(10-SECTOR_SHIFT),
> > +                       (unsigned long)dmc->size, dmc->cached_blocks);
> > +       }
> > +
> > +       dm_io_client_destroy(dmc->io_client);
> > +       vfree((void *)dmc->cache);
> > +       vfree((void *)dmc->cache_state);
> > +       vfree((void *)dmc->set_lru_next);
> 
> Search your code for casts and remove the pointless ones.  vfree() takes
> a void pointer.
> 
> > +
> > +       dm_put_device(ti, dmc->disk_dev);
> > +       dm_put_device(ti, dmc->cache_dev);
> > +       kfree(dmc);
> > +}
> > +
> > +static void rxc_status_info(struct cache_c *dmc, status_type_t type,
> > +       char *result, unsigned int maxlen)
> > +{
> > +       int sz = 0; /* DMEMIT */
> 
> Ugly.  Fix DMEMIT.
> 
> > +
> > +       DMEMIT("stats:\n\treads(%lu), writes(%lu)\n", dmc->reads,
> > +               dmc->writes);
> > +
> > +       DMEMIT("\tcache hits(%lu) replacement(%lu), write replacement(%lu)\n" \
> > +               "\tread invalidates(%lu), write invalidates(%lu)\n" \
> > +               "\tuncached reads(%lu), uncached writes(%lu)\n" \
> > +               "\tdisk reads(%lu), disk writes(%lu)\n" \
> > +               "\tcache reads(%lu), cache writes(%lu)\n",
> > +               dmc->cache_hits, dmc->replace, dmc->cache_wr_replace,
> > +               dmc->rd_invalidates, dmc->wr_invalidates,
> > +               dmc->uncached_reads, dmc->uncached_writes,
> > +               dmc->disk_reads, dmc->disk_writes,
> > +               dmc->cache_reads, dmc->cache_writes);
> > +}
> > +
> > +static void rxc_status_table(struct cache_c *dmc, status_type_t type,
> > +       char *result, unsigned int maxlen)
> > +{
> > +       int i;
> > +       int sz = 0; /* DMEMIT */
> > +
> > +       DMEMIT("conf:\n\trxd dev (%s), disk dev (%s) mode (%s)\n" \
> > +               "\tcapacity(%luM), associativity(%u), block size(%uK)\n" \
> > +               "\ttotal blocks(%lu), cached blocks(%lu)\n",
> > +               dmc->cache_devname, dmc->disk_devname, "WRITETHROUGH",
> > +               (unsigned long)dmc->size*dmc->block_size>>11, dmc->assoc,
> > +               dmc->block_size>>(10-SECTOR_SHIFT),
> > +               (unsigned long)dmc->size, dmc->cached_blocks);
> > +       DMEMIT(" Size Hist: ");
> > +       for (i = 1 ; i <= 32 ; i++) {
> > +               if (size_hist[i] > 0)
> > +                       DMEMIT("%d:%llu ", i*512, size_hist[i]);
> > +       }
> > +}
> > +
> > +/* Report cache status:
> > + *  Output cache stats upon request of device status;
> > + *  Output cache configuration upon request of table status. */
> > +static void cache_status(struct dm_target *ti, status_type_t type,
> > +       unsigned status_flags,
> > +       char *result, unsigned int maxlen)
> > +{
> > +       struct cache_c *dmc = (struct cache_c *) ti->private;
> > +
> > +       DPRINTK("%s: debug: entering %s\n", RxPREFIX, __func__);
> > +
> > +       switch (type) {
> > +       case STATUSTYPE_INFO:
> > +               rxc_status_info(dmc, type, result, maxlen);
> > +               break;
> > +       case STATUSTYPE_TABLE:
> > +               rxc_status_table(dmc, type, result, maxlen);
> > +               break;
> > +       }
> > +}
> > +
> > +static struct target_type cache_target = {
> > +       .name           = "rxcache",
> > +       .version        = {3, 0, 0},
> > +       .module         = THIS_MODULE,
> > +       .ctr            = cache_ctr,
> > +       .dtr            = cache_dtr,
> > +       .map            = rxc_map,
> > +       .status         = cache_status,
> > +};
> > +
> > +int __init rxc_init(void)
> > +{
> > +       int r;
> 
> r is not a common name for return values.  Use "int ret".
> 
> 
> > +
> > +       DPRINTK("%s: debug: entering %s\n", RxPREFIX, __func__);
> > +
> > +       r = jobs_init();
> > +       if (r)
> > +               return r;
> > +
> > +       _kcached_wq = create_singlethread_workqueue("kcached");
> > +       if (!_kcached_wq) {
> > +               DMERR("failed to start kcached");
> 
> Most kernel functions handle their own error messages.  No need for this
> one, I bet.
> 
> 
> > +               return -ENOMEM;
> > +       }
> > +
> > +       INIT_WORK(&_kcached_work, do_work);
> > +
> > +       for (r = 0 ; r < 33 ; r++)
> > +               size_hist[r] = 0;
> 
> This shouldn't reuse "r".  It should be "int i".  I stands for iterator.
> Actually just mem_zero().
> 
> 
> 
> > +       r = dm_register_target(&cache_target);
> > +       if (r < 0) {
> > +               DMERR("cache: register failed %d", r);
> 
> Delete.
> 
> > +               return r;
> > +       }
> > +       pr_info("%s: Module successfully loaded.\n", RxPREFIX);
> 
> Delete.
> 
> > +       return r;
> 
>       return 0;
> 
> > +}
> > +
> > +void rxc_exit(void)
> > +{
> > +       DPRINTK("%s: debug: entering %s\n", RxPREFIX, __func__);
> 
> Delete.  Use ftrace for debugging.
> 
> > +       dm_unregister_target(&cache_target);
> > +       jobs_exit();
> > +       destroy_workqueue(_kcached_wq);
> > +       pr_info("%s: Module successfully unloaded.\n", RxPREFIX);
> 
> 
> Delete.
> 
> > +}
> > +
> > +
> > +module_init(rxc_init);
> > +module_exit(rxc_exit);
> > +
> > +MODULE_LICENSE("GPL");
> > +MODULE_AUTHOR(DRIVER_AUTHOR);
> > +MODULE_DESCRIPTION(DRIVER_DESC);
> > +MODULE_VERSION(VERSION_STR);
> > +MODULE_INFO(Copyright, COPYRIGHT);
> > diff -uNpr linux-next.orig/drivers/staging/rapiddisk/rxcommon.h linux-next/drivers/staging/rapiddisk/rxcommon.h
> > --- linux-next.orig/drivers/staging/rapiddisk/rxcommon.h        1969-12-31 18:00:00.000000000 -0600
> > +++ linux-next/drivers/staging/rapiddisk/rxcommon.h     2015-05-31 13:36:08.753757911 -0500
> > @@ -0,0 +1,22 @@
> > +/*******************************************************
> > + ** Copyright (c) 2011-2015 Petros Koutoupis
> > + ** All rights reserved.
> > + **
> > + ** filename: rxcommon.h
> > + ** created: 12Feb12, petros@xxxxxxxxxxxxxxxxxxx
> > + **
> > + ** This file is licensed under GPLv2.
> > + **
> > + ******************************************************/
> > +
> > +#include <linux/device.h>
> > +
> > +#define VERSION_STR    "3.0"
> > +#define COPYRIGHT      "Copyright 2014 - 2015 Petros Koutoupis"
> > +#define DRIVER_AUTHOR  "Petros Koutoupis <petros@xxxxxxxxxxxxxxxxxxx>"
> 
> Don't use defines.
> 
> > +
> > +#if defined RxDEBUG
> > +#define DPRINTK(s, arg...) printk(s "\n", ##arg)
> > +#else
> > +#define DPRINTK(s, arg...)
> > +#endif
> > diff -uNpr linux-next.orig/drivers/staging/rapiddisk/rxdsk.c linux-next/drivers/staging/rapiddisk/rxdsk.c
> > --- linux-next.orig/drivers/staging/rapiddisk/rxdsk.c   1969-12-31 18:00:00.000000000 -0600
> > +++ linux-next/drivers/staging/rapiddisk/rxdsk.c        2015-05-31 13:36:08.753757911 -0500
> > @@ -0,0 +1,799 @@
> > +/*******************************************************
> > + ** Copyright (c) 2011-2015 Petros Koutoupis
> > + ** All rights reserved.
> > + **
> > + ** filename: rdsk.c
> > + ** description: RapidDisk is an enhanced Linux RAM disk
> > + **      module to dynamically create, remove, and resize
> > + **      RAM drives. RapidDisk supports both volatile
> > + **     and non-volatile memory.
> > + ** created: 1Jun11, petros@xxxxxxxxxxxxxxxxxxx
> > + **
> > + ** This file is licensed under GPLv2.
> > + **
> > + ******************************************************/
> > +
> > +#include <linux/init.h>
> > +#include <linux/module.h>
> > +#include <linux/moduleparam.h>
> > +#include <linux/version.h>
> > +#include <linux/blkdev.h>
> > +#include <linux/bio.h>
> > +#include <linux/fs.h>
> > +#include <linux/hdreg.h>
> > +#include <linux/proc_fs.h>
> > +#include <linux/errno.h>
> > +#include <linux/radix-tree.h>
> > +#include <asm/io.h>
> > +#include "rxcommon.h"
> > +
> > +#define RxPREFIX       "rxd"
> > +#define DRIVER_DESC    "RapidDisk (rxdsk) is an enhanced RAM disk block device driver."
> > +#define DEVICE_NAME    "rxd"
> > +#define PROC_NODE      "rxctl"
> > +#define BYTES_PER_SECTOR       512
> > +#define MAX_RxDISKS            128
> > +#define DEFAULT_MAX_SECTS      127
> > +#define DEFAULT_REQUESTS       128
> > +
> > +#define FREE_BATCH             16
> > +#define SECTOR_SHIFT           9
> > +#define PAGE_SECTORS_SHIFT     (PAGE_SHIFT - SECTOR_SHIFT)
> > +#define PAGE_SECTORS           (1 << PAGE_SECTORS_SHIFT)
> > +
> > +#define NON_VOLATILE_MEMORY    0
> > +#define VOLATILE_MEMORY                1
> > +
> > +/* ioctls */
> > +#define INVALID_CDQUERY_IOCTL  0x5331
> > +#define RXD_GET_STATS          0x0529
> > +
> > +
> > +static DEFINE_MUTEX(rxproc_mutex);
> > +static DEFINE_MUTEX(rxioctl_mutex);
> > +
> > +struct rdsk_device {
> > +       int num;
> > +       bool volatile_memory;
> > +       struct request_queue *rdsk_queue;
> > +       struct gendisk *rdsk_disk;
> > +       struct list_head rdsk_list;
> > +       unsigned long long max_blk_alloc;
> > +       unsigned long long start_addr;
> > +       unsigned long long end_addr;
> > +       unsigned long long size;
> > +       unsigned long error_cnt;
> > +       spinlock_t rdsk_lock;
> > +       struct radix_tree_root rdsk_pages;
> > +};
> > +
> > +int rdsk_ma_no = 0, rxcnt = 0;
> > +static int max_rxcnt = MAX_RxDISKS;
> > +static int max_sectors = DEFAULT_MAX_SECTS, nr_requests = DEFAULT_REQUESTS;
> > +struct proc_dir_entry *rx_proc = NULL;
> > +static LIST_HEAD(rdsk_devices);
> > +
> > +/*
> > + * Supported insmod params
> > + **/
> > +module_param(max_rxcnt, int, S_IRUGO);
> > +MODULE_PARM_DESC(max_rxcnt, " Total RAM Disk devices available for use. (Default = 128)");
> > +module_param(max_sectors, int, S_IRUGO);
> > +MODULE_PARM_DESC(max_sectors, " Maximum sectors (in KB) for the request queue. (Default = 127)");
> > +module_param(nr_requests, int, S_IRUGO);
> > +MODULE_PARM_DESC(nr_requests, " Number of requests at a given time for the request queue. (Default = 128)");
> > +
> > +/*
> > + * Function Declarations
> > + **/
> > +static ssize_t read_proc(struct file *, char __user *, size_t, loff_t *);
> > +static ssize_t write_proc(struct file *, const char __user *, size_t, loff_t *);
> > +static int rdsk_do_bvec(struct rdsk_device *, struct page *, unsigned int, unsigned int, int, sector_t);
> > +static int rdsk_ioctl(struct block_device *, fmode_t, unsigned int, unsigned long);
> > +static void rdsk_make_request(struct request_queue *, struct bio *);
> > +static int attach_device(int, int); /* disk num, disk size */
> > +static int attach_nv_device(int, unsigned long long, unsigned long long); /* disk num, start / end addr */
> > +static int detach_device(int);   /* disk num */
> > +static int resize_device(int, int); /* disk num, disk size */
> > +
> > +
> > +/*
> > + * Functions Definitions
> > + **/
> > +static ssize_t write_proc(struct file *file, const char __user *buffer,
> > +       size_t count, loff_t *data)
> > +{
> > +       int num, size, err = (int)count;
> > +       char *ptr, *buf;
> > +       unsigned long long start_addr, end_addr;
> > +
> > +       DPRINTK("%s: debug: entering %s\n", RxPREFIX, __func__);
> 
> Delete.
> 
> > +
> > +       mutex_lock(&rxproc_mutex);
> > +
> > +       if (!buffer || count > PAGE_SIZE)
> > +               return -EINVAL;
> 
> Returning under lock.
> 
> > +
> > +       buf = (char *)__get_free_page(GFP_KERNEL);
> 
> Do the allocation outside the lock.
> 
> > +       if (!buf)
> > +               return -ENOMEM;
> > +
> > +       if (copy_from_user(buf, buffer, count))
> > +               return -EFAULT;
> 
> memory leak.
> 
> > +
> > +       if (!strncmp("rxdsk attach ", buffer, 13)) {
> > +               ptr = buf + 13;
> > +               num = simple_strtoul(ptr, &ptr, 0);
> > +               size = simple_strtoul(ptr + 1, &ptr, 0);
> > +
> > +               if (attach_device(num, size) != 0) {
> > +                       pr_info("%s: Unable to attach rxd%d\n", RxPREFIX, num);
> > +                       err = -EINVAL;
> 
>       goto unlock;
> 
> 
> Anyway, you're not allowed to add /proc files.  Add it to sysfs or
> debugfs.
> 
> Hopefully, I've given you enough to work on.  I will review the rest
> after you resend.  If I complained once, it means find all similar code
> and fix it as well.
> 
> regards,
> dan carpenter
> 
_______________________________________________
devel mailing list
devel@xxxxxxxxxxxxxxxxxxxxxx
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel