On Fri, Apr 26, 2024 at 09:25:53AM +0800, Dongsheng Yang wrote:
>
>
> On Wed, 2024/4/24 at 11:14 PM, Gregory Price wrote:
> > On Wed, Apr 24, 2024 at 02:33:28PM +0800, Dongsheng Yang wrote:
> > >
> > >
> > > On Wed, 2024/4/24 at 12:29 PM, Dan Williams wrote:
> > > > Dongsheng Yang wrote:
> > > > > From: Dongsheng Yang <dongsheng.yang.linux@xxxxxxxxx>
> > > > >
> > > > > Hi all,
> > > > > This patchset introduce cbd (CXL block device). It's based on linux 6.8, and available at:
> > > > > https://github.com/DataTravelGuide/linux
> > > > >
> > > > [..]
> > > > > (4) dax is not supported yet:
> > > > > same with famfs, dax device is not supported here, because dax device does not support
> > > > > dev_dax_iomap so far. Once dev_dax_iomap is supported, CBD can easily support DAX mode.
> > > >
> > > > I am glad that famfs is mentioned here, it demonstrates you know about
> > > > it. However, unfortunately this cover letter does not offer any analysis
> > > > of *why* the Linux project should consider this additional approach to
> > > > the inter-host shared-memory enabling problem.
> > > >
> > > > To be clear I am neutral at best on some of the initiatives around CXL
> > > > memory sharing vs pooling, but famfs at least jettisons block-devices
> > > > and gets closer to a purpose-built memory semantic.
> > > >
> > > > So my primary question is why would Linux need both famfs and cbd? I am
> > > > sure famfs would love feedback and help vs developing competing efforts.
> > >
> > > Hi,
> > > Thanks for your reply, IIUC about FAMfs, the data in famfs is stored in
> > > shared memory, and related nodes can share the data inside this file system;
> > > whereas cbd does not store data in shared memory, it uses shared memory as a
> > > channel for data transmission, and the actual data is stored in the backend
> > > block device of remote nodes. In cbd, shared memory works more like network
> > > to connect different hosts.
> > >
> >
> > Couldn't you basically just allocate a file for use as a uni-directional
> > buffer on top of FAMFS and achieve the same thing without the need for
> > additional kernel support? Similar in a sense to allocating a file on
> > network storage and pinging the remote host when it's ready (except now
> > it's fast!)
>
> I'm not entirely sure I follow your suggestion. I guess it means that cbd
> would no longer directly manage the pmem device, but allocate files on famfs
> to transfer data. I didn't do it this way because I considered at least a
> few points: one of them is, cbd_transport actually requires a DAX device to
> access shared memory, and cbd has very simple requirements for space
> management, so there's no need to rely on a file system layer, which would
> increase architectural complexity.
>
> However, we still need cbd_blkdev to provide a block device, so it doesn't
> achieve "achieve the same without the need for additional kernel support".
>
> Could you please provide more specific details about your suggestion?

Fundamentally you're shuffling bits from one place to another, the
ultimate target is storage located on another device as opposed to
the memory itself. So you're using CXL as a transport medium.

Could you not do the same thing with a file in FAMFS, and put all of
the transport logic in userland? Then you'd just have what looks like
a kernel bypass transport mechanism built on top of a file backed by
shared memory.

Basically it's unclear to me why this must be done in the kernel.
Performance? Explicit bypass? Some technical reason I'm missing?

Also, on a tangential note, you're using pmem/qemu to emulate the
behavior of shared CXL memory. You should probably explain the
coherence implications of the system more explicitly.

The emulated system implements what amounts to hardware-coherent
memory (i.e. the two QEMU machines run on the same physical machine,
so coherency is managed within the same coherence domain).
If there is no explicit coherence control in software, then it is
important to state that this system relies on hardware that implements
snoop back-invalidate (which is not a requirement of a CXL 3.x device,
just a feature described by the spec that may be implemented).

~Gregory
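
To make the userland idea above a bit more concrete, here is a rough
sketch of a uni-directional buffer living in a file that both sides
map. Everything in it is an assumption made for illustration: the
ShmChannel name, the header layout (two 64-bit counters followed by a
byte ring), and the length-prefixed framing are hypothetical and are
not taken from cbd or famfs. It uses an ordinary mmap'd file, so it
only shows the data-structure side. Python cannot express memory
barriers or cache flushes, so on hardware without snoop
back-invalidate the ordering and visibility of the header updates
would need explicit handling that this sketch does not attempt.

```python
import mmap
import os
import struct

# Hypothetical layout: [head: u64][tail: u64][ring bytes ...]
HEADER = struct.Struct("<QQ")   # head = bytes produced, tail = bytes consumed
LEN = struct.Struct("<I")       # per-message length prefix


class ShmChannel:
    """Single-producer, single-consumer message ring in a shared mapping."""

    def __init__(self, path, capacity=4096, create=False):
        size = HEADER.size + capacity
        flags = os.O_RDWR | (os.O_CREAT if create else 0)
        fd = os.open(path, flags, 0o600)
        try:
            if create:
                os.ftruncate(fd, size)
            self.buf = mmap.mmap(fd, size)  # shared mapping by default
        finally:
            os.close(fd)
        self.capacity = capacity
        if create:
            self.buf[:HEADER.size] = bytes(HEADER.size)  # reset counters

    def _copy_in(self, pos, data):
        # Write data at logical position pos, wrapping at the ring end.
        off = pos % self.capacity
        first = min(len(data), self.capacity - off)
        base = HEADER.size
        self.buf[base + off:base + off + first] = data[:first]
        self.buf[base:base + len(data) - first] = data[first:]

    def _copy_out(self, pos, n):
        # Read n bytes from logical position pos, wrapping at the ring end.
        off = pos % self.capacity
        first = min(n, self.capacity - off)
        base = HEADER.size
        return self.buf[base + off:base + off + first] + self.buf[base:base + n - first]

    def send(self, payload):
        """Producer side: returns False if the ring lacks space."""
        head, tail = HEADER.unpack_from(self.buf, 0)
        need = LEN.size + len(payload)
        if need > self.capacity - (head - tail):
            return False
        self._copy_in(head, LEN.pack(len(payload)) + payload)
        # Publish the new head only after the payload is written.
        struct.pack_into("<Q", self.buf, 0, head + need)
        return True

    def recv(self):
        """Consumer side: returns the next message, or None if empty."""
        head, tail = HEADER.unpack_from(self.buf, 0)
        if head == tail:
            return None
        (n,) = LEN.unpack(self._copy_out(tail, LEN.size))
        payload = self._copy_out(tail + LEN.size, n)
        # Release the space only after the payload is copied out.
        struct.pack_into("<Q", self.buf, 8, tail + LEN.size + n)
        return payload
```

One side would open the file with create=True and call send(), the
other would open the same path and poll recv(). The producer only ever
writes head and the consumer only ever writes tail, which is what lets
this work without a lock, provided the platform keeps those stores
ordered and visible to the other host.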