Re: Another cache target

Heinz Mauelshagen <heinzm@xxxxxxxxxx> · Mon, 17 Dec 2012 17:54:57 +0100

Darrick,

please try the attached patch, which is also on the thin-dev_Work branch
of my tree at git@xxxxxxxxxx:lvmguy/linux-2.6.
Does that fix the issue for you?

Thanks,
Heinz

On 12/14/2012 02:16 AM, Darrick J. Wong wrote:
On Thu, Dec 13, 2012 at 04:57:15PM -0500, Mike Snitzer wrote:
On Thu, Dec 13 2012 at  3:19pm -0500,
Joe Thornber <ejt@xxxxxxxxxx> wrote:

Here's a cache target that Heinz Mauelshagen, Mike Snitzer and I
have been working on.

It's also available in the thin-dev branch of my git tree:

git@xxxxxxxxxx:jthornber/linux-2.6.git
This URL is best for others to clone from:
git://github.com/jthornber/linux-2.6.git

The main features are a plug-in architecture for policies which decide
what data gets cached, and reuse of the metadata library from the thin
provisioning target.
It should be noted that there are more cache replacement policies
available in Joe's thin-dev branch via the "basic" policy, see:
drivers/md/dm-cache-policy-basic.c

(these basic policies include fifo, lru, lfu, and many more)
  
These patches apply on top of the dm patches that agk has got queued
for 3.8.
agk's patches are here:
http://people.redhat.com/agk/patches/linux/editing/series.html

But agk hasn't staged all the required patches yet.  I've imported agk's
editing tree (and a couple other required patches that I previously
posted to dm-devel, which aren't yet in agk's tree) into the
'dm-for-3.8' branch on my github tree here:
git://github.com/snitm/linux.git

This 8-patch patchset from Joe should apply cleanly on top of my
'dm-for-3.8' branch.

But if all you care about is a tree with all the changes then please
just use Joe's github 'thin-dev' branch.
A full list of broken-out patches would've been nice, but oh well, I ate this
git tree. :)

Curiously, Documentation/device-mapper/dm-cache.txt says to specify devices
in the order metadata, origin, cache, but the code (and Joe's mail) seem
to want metadata, cache, origin.  This sort of makes me wonder what's going on.

Also, I found a bug when using the mru policy.  If I do this:

<set up a scsi_debug "ssd" with a 448M /dev/sda1 for cache and the rest for
  metadata on /dev/sda2>
# echo 0 67108864 cache /dev/sda2 /dev/sda1 /dev/vda 512 0 mru 0 | dmsetup create fubar
...<use fubar, fill up the cache>...
# dmsetup remove fubar
# echo 0 67108864 cache /dev/sda2 /dev/sda1 /dev/vda 512 0 mru 0 | dmsetup create fubar

I see the following crash in dmesg:

[  426.661458] scsi1 : scsi_debug, version 1.82 [20100324], dev_size_mb=512, opts=0x0
[  426.663955] scsi 1:0:0:0: Direct-Access     Linux    scsi_debug       0004 PQ: 0 ANSI: 5
[  426.667005] sd 1:0:0:0: Attached scsi generic sg0 type 0
[  426.667020] sd 1:0:0:0: [sda] 1048576 512-byte logical blocks: (536 MB/512 MiB)
[  426.667046] sd 1:0:0:0: [sda] Write Protect is off
[  426.667057] sd 1:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
[  426.667203]  sda: unknown partition table
[  426.667311] sd 1:0:0:0: [sda] Attached SCSI disk
[  426.694055]  sda: sda1 sda2
[  448.155368] bio: create slab <bio-1> at 1
[  460.762930] promote thresholds = 65/4 queue stats = 1/0
[  468.121084] promote thresholds = 65/4 queue stats = 1/1
[  471.970865] dm-cache statistics:
[  471.974809] read hits:	887895
[  471.976948] read misses:	499
[  471.978195] write hits:	0
[  471.979380] write misses:	0
[  471.980716] demotions:	7
[  471.982391] promotions:	1799
[  471.983798] copies avoided:	7
[  471.985137] cache cell clashs:	0
[  471.986886] commits:		1653
[  471.988410] discards:		0
[  474.177476] bio: create slab <bio-1> at 1
[  474.206000] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[  474.209037] IP: [<ffffffffa01b1aad>] queue_evict_default+0x1d/0x50 [dm_cache_basic]
[  474.209969] PGD 0
[  474.209969] Oops: 0002 [#1] PREEMPT SMP
[  474.209969] Modules linked in: scsi_debug dm_cache_basic dm_cache_mq dm_cache dm_bio_prison dm_persistent_data dm_bufio crc_t10dif nfsv4 sch_fq_codel eeprom nfsd auth_rpcgss exportfs af_packet btrfs zlib_deflate libcrc32c [last unloaded: scsi_debug]
[  474.209969] CPU 0
[  474.209969] Pid: 1285, comm: kworker/u:2 Not tainted 3.7.0-dmcache #1 Bochs Bochs
[  474.209969] RIP: 0010:[<ffffffffa01b1aad>]  [<ffffffffa01b1aad>] queue_evict_default+0x1d/0x50 [dm_cache_basic]
[  474.209969] RSP: 0018:ffff880055641be8  EFLAGS: 00010282
[  474.209969] RAX: ffff880073a85eb0 RBX: ffff880037ca5c00 RCX: 0000000000000000
[  474.209969] RDX: 0000000000000000 RSI: 0007fff80005ffff RDI: ffff880073a85eb0
[  474.209969] RBP: ffff880055641be8 R08: e000000000000000 R09: ffff880072d619a0
[  474.209969] R10: 0000000000000034 R11: fffffff80005ffff R12: ffff880037f33d30
[  474.209969] R13: ffff880037ca5c78 R14: ffff880055641c98 R15: 000000000001ffff
[  474.209969] FS:  0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
[  474.209969] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  474.209969] CR2: 0000000000000008 CR3: 0000000001a0c000 CR4: 00000000000407f0
[  474.209969] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  474.209969] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  474.209969] Process kworker/u:2 (pid: 1285, threadinfo ffff880055640000, task ffff88007cb62de0)
[  474.209969] Stack:
[  474.209969]  ffff880055641c58 ffffffffa01b28a4 0000000000000040 0000000000000286
[  474.209969]  ffff880000000000 ffffffffa017658c 0000000000000000 ffff880155641cd0
[  474.209969]  ffff880055641c58 ffff88007cac7400 ffff880055641d50 ffff880037f33d30
[  474.209969] Call Trace:
[  474.209969]  [<ffffffffa01b28a4>] basic_map+0x484/0x708 [dm_cache_basic]
[  474.209969]  [<ffffffffa017658c>] ? dm_bio_detain+0x5c/0x80 [dm_bio_prison]
[  474.209969]  [<ffffffffa019c221>] process_bio+0x101/0x4c0 [dm_cache]
[  474.209969]  [<ffffffffa019cb4f>] do_worker+0x56f/0x630 [dm_cache]
[  474.209969]  [<ffffffff81081ab6>] ? finish_task_switch+0x56/0xb0
[  474.209969]  [<ffffffff8106fa31>] process_one_work+0x121/0x490
[  474.209969]  [<ffffffffa019c5e0>] ? process_bio+0x4c0/0x4c0 [dm_cache]
[  474.209969]  [<ffffffff81070be5>] worker_thread+0x165/0x3f0
[  474.209969]  [<ffffffff81070a80>] ? manage_workers+0x2a0/0x2a0
[  474.209969]  [<ffffffff81076010>] kthread+0xc0/0xd0
[  474.209969]  [<ffffffff81075f50>] ? flush_kthread_worker+0xb0/0xb0
[  474.209969]  [<ffffffff815680ac>] ret_from_fork+0x7c/0xb0
[  474.209969]  [<ffffffff81075f50>] ? flush_kthread_worker+0xb0/0xb0
[  474.209969] Code: de 48 89 47 08 48 89 f8 5d c3 0f 0b 66 90 66 66 66 66 90 55 48 8b bf f8 01 00 00 48 89 e5 e8 ab ff ff ff 48 8b 48 28 48 8b 50 30 <48> 89 51 08 48 89 0a 48 ba 00 01 10 00 00 00 ad de 48 b9 00 02
[  474.209969] RIP  [<ffffffffa01b1aad>] queue_evict_default+0x1d/0x50 [dm_cache_basic]
[  474.209969]  RSP <ffff880055641be8>
[  474.209969] CR2: 0000000000000008
[  474.333040] ---[ end trace 20dda5f362594054 ]---
[  474.336010] BUG: unable to handle kernel paging request at ffffffffffffffd8
[  474.336680] IP: [<ffffffff810761f0>] kthread_data+0x10/0x20
[  474.336680] PGD 1a0e067 PUD 1a0f067 PMD 0
[  474.336680] Oops: 0000 [#2] PREEMPT SMP
[  474.336680] Modules linked in: scsi_debug dm_cache_basic dm_cache_mq dm_cache dm_bio_prison dm_persistent_data dm_bufio crc_t10dif nfsv4 sch_fq_codel eeprom nfsd auth_rpcgss exportfs af_packet btrfs zlib_deflate libcrc32c [last unloaded: scsi_debug]
[  474.336680] CPU 0
[  474.336680] Pid: 1285, comm: kworker/u:2 Tainted: G      D      3.7.0-dmcache #1 Bochs Bochs
[  474.336680] RIP: 0010:[<ffffffff810761f0>]  [<ffffffff810761f0>] kthread_data+0x10/0x20
[  474.336680] RSP: 0018:ffff8800556417a8  EFLAGS: 00010096
[  474.336680] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff81bb2f80
[  474.336680] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88007cb62de0
[  474.336680] RBP: ffff8800556417a8 R08: 0000000000000001 R09: 0000000000000083
[  474.336680] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
[  474.336680] R13: ffff88007cb631d0 R14: 0000000000000000 R15: 0000000000000001
[  474.336680] FS:  0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
[  474.336680] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  474.336680] CR2: ffffffffffffffd8 CR3: 0000000001a0c000 CR4: 00000000000407f0
[  474.336680] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  474.336680] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  474.336680] Process kworker/u:2 (pid: 1285, threadinfo ffff880055640000, task ffff88007cb62de0)
[  474.336680] Stack:
[  474.336680]  ffff8800556417c8 ffffffff81071445 ffff8800556417c8 ffff88007fc12880
[  474.336680]  ffff880055641848 ffffffff81565a58 ffff8800556417f8 ffff880037daeba0
[  474.336680]  ffff88007cb62de0 ffff880055641fd8 ffff880055641fd8 ffff880055641fd8
[  474.336680] Call Trace:
[  474.336680]  [<ffffffff81071445>] wq_worker_sleeping+0x15/0xc0
[  474.336680]  [<ffffffff81565a58>] __schedule+0x5f8/0x7c0
[  474.336680]  [<ffffffff81565d39>] schedule+0x29/0x70
[  474.336680]  [<ffffffff81057748>] do_exit+0x678/0x9e0
[  474.336680]  [<ffffffff8155fe50>] ? printk+0x4d/0x4f
[  474.336680]  [<ffffffff8100662b>] oops_end+0xab/0xf0
[  474.336680]  [<ffffffff8155f7a6>] no_context+0x201/0x210
[  474.336680]  [<ffffffff8155f986>] __bad_area_nosemaphore+0x1d1/0x1f0
[  474.336680]  [<ffffffff8110ba75>] ? mempool_kmalloc+0x15/0x20
[  474.336680]  [<ffffffff8155f9b8>] bad_area_nosemaphore+0x13/0x15
[  474.336680]  [<ffffffff810311a2>] __do_page_fault+0x322/0x4d0
[  474.336680]  [<ffffffff8111109f>] ? get_page_from_freelist+0x1bf/0x460
[  474.336680]  [<ffffffff81335eca>] ? virtblk_request+0x44a/0x460
[  474.336680]  [<ffffffff81232d56>] ? cpumask_next_and+0x36/0x50
[  474.336680]  [<ffffffff81232d56>] ? cpumask_next_and+0x36/0x50
[  474.336680]  [<ffffffff8108fa53>] ? update_sd_lb_stats+0x123/0x610
[  474.336680]  [<ffffffff8103138e>] do_page_fault+0xe/0x10
[  474.336680]  [<ffffffff8102e425>] do_async_page_fault+0x35/0xa0
[  474.336680]  [<ffffffff81567925>] async_page_fault+0x25/0x30
[  474.336680]  [<ffffffffa01b1aad>] ? queue_evict_default+0x1d/0x50 [dm_cache_basic]
[  474.336680]  [<ffffffffa01b1aa5>] ? queue_evict_default+0x15/0x50 [dm_cache_basic]
[  474.336680]  [<ffffffffa01b28a4>] basic_map+0x484/0x708 [dm_cache_basic]
[  474.336680]  [<ffffffffa017658c>] ? dm_bio_detain+0x5c/0x80 [dm_bio_prison]
[  474.336680]  [<ffffffffa019c221>] process_bio+0x101/0x4c0 [dm_cache]
[  474.336680]  [<ffffffffa019cb4f>] do_worker+0x56f/0x630 [dm_cache]
[  474.336680]  [<ffffffff81081ab6>] ? finish_task_switch+0x56/0xb0
[  474.336680]  [<ffffffff8106fa31>] process_one_work+0x121/0x490
[  474.336680]  [<ffffffffa019c5e0>] ? process_bio+0x4c0/0x4c0 [dm_cache]
[  474.336680]  [<ffffffff81070be5>] worker_thread+0x165/0x3f0
[  474.336680]  [<ffffffff81070a80>] ? manage_workers+0x2a0/0x2a0
[  474.336680]  [<ffffffff81076010>] kthread+0xc0/0xd0
[  474.336680]  [<ffffffff81075f50>] ? flush_kthread_worker+0xb0/0xb0
[  474.336680]  [<ffffffff815680ac>] ret_from_fork+0x7c/0xb0
[  474.336680]  [<ffffffff81075f50>] ? flush_kthread_worker+0xb0/0xb0
[  474.336680] Code: 00 48 89 e5 5d 48 8b 40 c8 48 c1 e8 02 83 e0 01 c3 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 48 8b 87 98 03 00 00 55 48 89 e5 <48> 8b 40 d8 5d c3 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90
[  474.336680] RIP  [<ffffffff810761f0>] kthread_data+0x10/0x20
[  474.336680]  RSP <ffff8800556417a8>
[  474.336680] CR2: ffffffffffffffd8
[  474.336680] ---[ end trace 20dda5f362594055 ]---
[  474.336680] Fixing recursive fault but reboot is needed!
[  477.004016] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 1
[  477.004016] Shutting down cpus with NMI
[  477.004016] panic occurred, switching back to text console

*Before* it crashes, though, I can run my iops exerciser and watch the numbers
climb from ~300 to ~100000.  Nice work! :)

(The default policy engine doesn't seem to have this problem, but I haven't
figured out how to make it cache blocks yet...)

--D
--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel

diff --git a/drivers/md/dm-cache-policy-basic.c b/drivers/md/dm-cache-policy-basic.c
index 5843d51..a26a2c0 100644
--- a/drivers/md/dm-cache-policy-basic.c
+++ b/drivers/md/dm-cache-policy-basic.c
@@ -1088,11 +1088,10 @@ static int find_free_cblock(struct policy *p, dm_cblock_t *result)
 	return r;
 }
 
-static void add_cache_entry(struct policy *p, struct basic_cache_entry *e)
+static void alloc_cblock_insert_cache_and_count_entry(struct policy *p, struct basic_cache_entry *e)
 {
 	unsigned t, u, end = ARRAY_SIZE(e->ce.count[T_HITS]);
 
-	p->queues.fns->add(p, &e->ce.list);
 	alloc_cblock(p, e->cblock);
 	insert_cache_hash_entry(p, e);
 
@@ -1104,6 +1103,12 @@ static void add_cache_entry(struct policy *p, struct basic_cache_entry *e)
 			p->cache_count[t][u] += e->ce.count[t][u];
 }
 
+static void add_cache_entry(struct policy *p, struct basic_cache_entry *e)
+{
+	p->queues.fns->add(p, &e->ce.list);
+	alloc_cblock_insert_cache_and_count_entry(p, e);
+}
+
 static void remove_cache_entry(struct policy *p, struct basic_cache_entry *e)
 {
 	unsigned t, u, end = ARRAY_SIZE(e->ce.count[T_HITS]);
@@ -1406,6 +1411,8 @@ static void sort_in_cache_entry(struct policy *p, struct basic_cache_entry *e)
 		list_add_tail(&e->ce.list, elt);
 	else
 		list_add(&e->ce.list, elt);
+
+	queue_add_tail(&p->queues.walk, &e->walk);
 }
 
 static int basic_load_mapping(struct dm_cache_policy *pe,
@@ -1426,20 +1433,25 @@ static int basic_load_mapping(struct dm_cache_policy *pe,
 		unsigned reads, writes;
 
 		hint_to_counts(hint, &reads, &writes);
+		e->ce.count[T_HITS][0] = reads;
+		e->ce.count[T_HITS][1] = writes;
 
 		if (IS_MULTIQUEUE(p) || IS_TWOQUEUE(p) || IS_LFU_MFU_WS(p)) {
 			/* FIXME: store also in larger hints rather than making up. */
-			e->ce.count[T_HITS][0] = reads;
-			e->ce.count[T_HITS][1] = writes;
 			e->ce.count[T_SECTORS][0] = reads << p->block_shift;
 			e->ce.count[T_SECTORS][1] = writes << p->block_shift;
-			add_cache_entry(p, e);
-			p->nr_cblocks_allocated = to_cblock(from_cblock(p->nr_cblocks_allocated) + 1);
+		}
+	}
 
-		} else
-			sort_in_cache_entry(p, e);
+	if (IS_MULTIQUEUE(p) || IS_TWOQUEUE(p) || IS_LFU_MFU_WS(p))
+		add_cache_entry(p, e);
+	else {
+		sort_in_cache_entry(p, e);
+		alloc_cblock_insert_cache_and_count_entry(p, e);
 	}
 
+	p->nr_cblocks_allocated = to_cblock(from_cblock(p->nr_cblocks_allocated) + 1);
+
 	return 0;
 }
 