On Tue, 2016-06-14 at 11:41 -0400, Mike Snitzer wrote:
> On Tue, Jun 14 2016 at  9:50am -0400,
> Jeff Moyer <jmoyer@xxxxxxxxxx> wrote:
> > "Kani, Toshimitsu" <toshi.kani@xxxxxxx> writes:
> > > > I had dm-linear and md-raid0 support on my list of things to look
> > > > at, did you have raid0 in your plans?
> > >
> > > Yes, I hope to extend further and raid0 is a good candidate.
> >
> > dm-flakey would allow more xfstests test cases to run.  I'd say that's
> > more important than linear or raid0.  ;-)
>
> Regardless of which target(s) grow DAX support the most pressing initial
> concern is getting the DM device stacking correct.  And verifying that
> IO that cross pmem device boundaries are being properly split by DM
> core (via drivers/md/dm.c:__split_and_process_non_flush()'s call to
> max_io_len).

Agreed. I've briefly tested stacking and it seems to be working fine.
As for IO crossing pmem device boundaries,
__split_and_process_non_flush() is used when the device is mounted
without the DAX option.  With DAX, this case is handled by
dm_blk_direct_access(), which limits the returned size.  This leads the
caller to iterate (read/write) or fall back to a smaller size (mmap
page fault).

> My hope is to nail down the DM core and its dependencies in block etc.
> Doing so in terms of dm-linear doesn't seem like wasted effort
> considering you told me it'd be useful to have for pmem devices.

Yes, I think dm-linear is useful as it gives more flexibility, e.g. it
allows creating a large device from multiple pmem devices.

> > Also, the next step in this work is to then decide how to determine on
> > what numa node an LBA resides.  We had discussed this at a prior
> > plumbers conference, and I think the consensus was to use xattrs.
> > Toshi, do you also plan to do that work?
>
> How does the associated NUMA node relate to this?  Does the
> DM requests_queue need to be setup to only allocate from the NUMA node
> the pmem device is attached to?  I recently added support for this to
> DM.  But there will likely be some code need to propagate the NUMA node
> id accordingly.

Each pmem device has a sysfs "numa_node" attribute so that tools like
numactl can be used to bind an application to run in the same locality
as the pmem device (since the CPU accesses pmem directly).  This won't
work well with a mapped device since it can be composed of multiple
localities.  Locality info would need to be managed on a per-file
basis, as Jeff mentioned.

Thanks,
-Toshi
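
For reference, the DAX boundary handling described above (the direct-access call limiting the returned size so that the caller iterates) can be modeled in user space roughly as follows.  This is only a sketch under stated assumptions, not kernel code: LinearDevice, direct_access() and dax_segments() are hypothetical names invented for this illustration, sizes are in arbitrary units, and the model assumes a simple linear concatenation of pmem devices.

```python
from bisect import bisect_right
from itertools import accumulate


class LinearDevice:
    """Toy model (hypothetical) of a linear mapping over several pmem devices."""

    def __init__(self, sizes):
        if not sizes or any(s <= 0 for s in sizes):
            raise ValueError("each underlying device needs a positive size")
        self.sizes = list(sizes)
        # ends[i] is the first mapped offset past underlying device i.
        self.ends = list(accumulate(self.sizes))

    def direct_access(self, offset, length):
        """Return (device index, offset within that device, usable length).

        The usable length never extends past the end of the underlying
        device containing `offset`, so it may be shorter than `length`.
        """
        if offset < 0 or length <= 0 or offset >= self.ends[-1]:
            raise ValueError("range starts outside the mapped device")
        idx = bisect_right(self.ends, offset)
        start = self.ends[idx] - self.sizes[idx]
        avail = self.ends[idx] - offset
        return idx, offset - start, min(length, avail)


def dax_segments(dev, offset, length):
    """Iterate as a read/write caller would: retry with the remainder."""
    segments = []
    while length > 0:
        idx, dev_off, n = dev.direct_access(offset, length)
        segments.append((idx, dev_off, n))
        offset += n
        length -= n
    return segments
```

With three underlying devices of sizes 100, 50 and 200, a request for 80 units starting at offset 90 is served in three pieces, one per device.  An mmap fault handler would not loop like this; in this model it would take the first (shorter) length and map a smaller range instead.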