On Mon, Feb 18, 2019 at 2:50 AM Johannes Thumshirn <jthumshirn@xxxxxxx> wrote:
>
> On 16/02/2019 06:39, Dave Chinner wrote:
> [..]
> >
> >> We've supported this since mid 2018 and commit ba23cba9b3bd ("fs:
> >> allow per-device dax status checking for filesystems"). That is,
> >> we can have DAX on the XFS RT device independently of the data device.
> >>
> >> That is, you set up pmem in three segments - two small identical
> >> segments that get mirrored with RAID1 as the data device, and
> >> the remainder as a block device that is dax capable set up as the
> >> XFS realtime device. Set the RTINHERIT bit on the root directory at
> >> mkfs time ("-d rtinherit=1") and then all the data goes to the DAX
> >> capable realtime device, and all the metadata goes to the software
> >> raided pmem block devices that aren't DAX capable.
> >>
> >> Problem already solved, yes?
> >
> > Sorry, this was meant to be a reply to Dan's email commenting about
> > some people needing mirrored metadata, not the parent that was
> > talking about whole device RAID...
> >
> > i.e. mirrored metadata w/ FS-DAX for data should already be a solved
> > problem...
>
> Trying to answer you both.
>
> But deferring the data redundancy to the application sounds like a no-go
> to me, sorry. We don't do that for "traditional" block storage (SCSI,
> NVMe, you name it). Some applications might already be able to handle it
> but definitively not all. I don't see your random DBMS like MariaDB or
> Postgres already doing data duplication over interleave sets of NV-DIMMs.

Oh, definitely agreed. I was just saying that, for the subset of
applications that *do* perform application-level redundancy, the lack
of metadata redundancy was a liability.

> And if you carve out a bit of your pmem space into an own namespace for
> the metadata (did I understand you right here?) you still have the
> problem that all data written to the DIMMs is interleaved in an
> interleave set, if I understand it correctly.
>
> So if one DIMM in your interleave set goes bad, you're lost anyways.

Yes, if you want to be able to survive the loss of a single DIMM then
you need to disable interleaving and RAID across the DIMMs. However,
once you do that, dax for data can't work by definition, but RAID for
metadata would work.
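
To make the interleave point concrete, here is a toy model of which DIMM
backs a given byte offset, with and without interleaving. It is only a
sketch: the function names, the round-robin mapping, the 4 KiB granule and
the DIMM size are all assumptions for illustration, not what any platform
or the kernel actually does.

```python
# Toy model of DIMM interleaving. Every name and size here is an
# illustrative assumption, not real platform or kernel behaviour.
# Assumes dimm_size is a multiple of granule.

def dimm_for_offset(offset, n_dimms, granule, dimm_size, interleaved):
    """Return the index of the DIMM assumed to back a byte offset."""
    if interleaved:
        # Consecutive granules rotate round-robin across the DIMMs.
        return (offset // granule) % n_dimms
    # Without interleaving, each DIMM backs one contiguous range.
    return offset // dimm_size


def extent_survives(start, length, failed_dimm, n_dimms, granule,
                    dimm_size, interleaved):
    """True if no granule of [start, start + length) is on the failed DIMM."""
    first = start - start % granule  # align down to a granule boundary
    return all(
        dimm_for_offset(off, n_dimms, granule, dimm_size, interleaved)
        != failed_dimm
        for off in range(first, start + length, granule)
    )
```

In this model, any extent spanning at least n_dimms granules touches every
DIMM when interleaved, so a single failure damages it. Without
interleaving, an extent is only lost if it sits on the failed DIMM, which
is the case a RAID1 copy on a different DIMM would cover.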