On Tue, Jun 14, 2016 at 6:46 PM, Mike Snitzer <snitzer@xxxxxxxxxx> wrote:
> On Tue, Jun 14 2016 at 4:19pm -0400,
> Jeff Moyer <jmoyer@xxxxxxxxxx> wrote:
>
>> Mike Snitzer <snitzer@xxxxxxxxxx> writes:
>>
>> > On Tue, Jun 14 2016 at 9:50am -0400,
>> > Jeff Moyer <jmoyer@xxxxxxxxxx> wrote:
>> >
>> >> "Kani, Toshimitsu" <toshi.kani@xxxxxxx> writes:
>> >>
>> >> >> I had dm-linear and md-raid0 support on my list of things to look at,
>> >> >> did you have raid0 in your plans?
>> >> >
>> >> > Yes, I hope to extend further and raid0 is a good candidate.
>> >>
>> >> dm-flakey would allow more xfstests test cases to run. I'd say that's
>> >> more important than linear or raid0. ;-)
>> >
>> > Regardless of which target(s) grow DAX support the most pressing initial
>> > concern is getting the DM device stacking correct. And verifying that
>> > IO that cross pmem device boundaries are being properly split by DM
>> > core (via drivers/md/dm.c:__split_and_process_non_flush()'s call to
>> > max_io_len).
>>
>> That was a tongue-in-cheek comment. You're reading way too much into
>> it.
>>
>> >> Also, the next step in this work is to then decide how to determine on
>> >> what numa node an LBA resides. We had discussed this at a prior
>> >> plumbers conference, and I think the consensus was to use xattrs.
>> >> Toshi, do you also plan to do that work?
>> >
>> > How does the associated NUMA node relate to this? Does the
>> > DM requests_queue need to be setup to only allocate from the NUMA node
>> > the pmem device is attached to? I recently added support for this to
>> > DM. But there will likely be some code need to propagate the NUMA node
>> > id accordingly.
>>
>> I assume you mean allocate memory (the volatile kind). That should work
>> the same between pmem and regular block devices, no?
>
> This is the commit I made to train DM to be numa node aware:
> 115485e83f497fdf9b4 ("dm: add 'dm_numa_node' module parameter")

Hmm, but this is global for all DM device instances.

> As is the DM code is focused on memory allocations. But I think blk-mq
> may use the NUMA node for via tag_set->numa_node. But that is moot
> given pmem is bio-based right?

Right.

>
> Steps could be taken to make all threads DM creates for a a given device
> get pinned to the specified NUMA node too.

I think it would be useful if a DM instance inherited the numa node from
the component devices by default (assuming they're all from the same
node). A "dev_to_node(disk_to_dev(disk))" conversion works for pmem
devices.

As far as I understand, Jeff wants to go further and have a linear span
across component devices from different nodes with an interface to do an
LBA-to-numa-node conversion.
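
To make the two ideas above a bit more concrete, here is a rough
userspace sketch in C. It is not kernel code and not a proposed patch:
"struct component", inherited_node(), lba_to_node() and NO_NODE are
hypothetical names made up for illustration. NO_NODE stands in for the
kernel's NUMA_NO_NODE, and the "node" field is assumed to hold whatever
dev_to_node(disk_to_dev(disk)) would report for that component. It only
models a plain linear concatenation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NO_NODE (-1) /* stand-in for the kernel's NUMA_NO_NODE */

/* Hypothetical model of one component device in a linear span. */
struct component {
    uint64_t nr_sectors; /* length of this component, in sectors */
    int node;            /* NUMA node assumed to come from dev_to_node() */
};

/*
 * Node a stacked device could inherit by default: the node shared by
 * every component, or NO_NODE if the list is empty or the nodes differ.
 */
static int inherited_node(const struct component *devs, size_t ndevs)
{
    assert(devs != NULL || ndevs == 0);

    if (ndevs == 0)
        return NO_NODE;

    int node = devs[0].node;
    for (size_t i = 1; i < ndevs; i++) {
        if (devs[i].node != node)
            return NO_NODE;
    }
    return node;
}

/*
 * LBA-to-node lookup for a linear concatenation: walk the components
 * in order and return the node of the one that contains the LBA.
 * Returns NO_NODE when the LBA lies past the end of the span.
 */
static int lba_to_node(const struct component *devs, size_t ndevs,
                       uint64_t lba)
{
    assert(devs != NULL || ndevs == 0);

    uint64_t start = 0; /* first LBA of the current component */
    for (size_t i = 0; i < ndevs; i++) {
        /* lba >= start always holds here, so this cannot underflow */
        if (lba - start < devs[i].nr_sectors)
            return devs[i].node;
        start += devs[i].nr_sectors;
    }
    return NO_NODE;
}
```

With two components of 100 and 200 sectors on nodes 0 and 1,
inherited_node() gives NO_NODE because the nodes differ, while
lba_to_node() maps LBAs 0-99 to node 0 and LBAs 100-299 to node 1. How
such a lookup would actually be exposed (xattrs were mentioned earlier
in the thread) is a separate question that this sketch does not address.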