On Mon, Jan 06, 2020 at 07:24:53AM +0000, Sitsofe Wheeler wrote:
> At Linux Plumbers 2019 Dr Richard Hipp presented a talk about SQLite
> (https://youtu.be/-oP2BOsMpdo?t=5525 ). One of the slides was titled
> "Things to discuss"
> (https://sqlite.org/lpc2019/doc/trunk/slides/sqlite-intro.html/#6 )
> and had a few questions:
>
> 1. Reliable ways to discover detailed filesystem properties
> 2. fbarrier()
> 3. Notify the OS about unused regions in the database file
>
> For 1. I think Jan Kara said that supporting it was undesirable for
> details like just how much additional fsync were needed due to
> competing constraints (https://youtu.be/-oP2BOsMpdo?t=6063 ). Someone
> mentioned there was a
> patch for fsinfo to discover if you were on a network filesystem
> (https://www.youtube.com/watch?v=-oP2BOsMpdo&feature=youtu.be&t=5525
> )...
>
> For 2. there was a talk by MySQL dev Sergei Golubchik (
> https://youtu.be/-oP2BOsMpdo?t=1219 ) talking about how barriers had
> been taken out and was there a replacement. In
> https://youtu.be/-oP2BOsMpdo?t=1731 Chris Mason(?) seems to suggest
> that the desired effect could be achieved with io_uring chaining.

Even though it wasn't explicitly mentioned, I'm pretty sure that
those "write barriers" for ordering groups of writes against other
groups of writes are intended to be used for data integrity
purposes.

The problem is that data integrity writes also require any
uncommitted filesystem metadata to be written to disk in the correct
order along with the data. For example, you can write to the log
file, but if the transactions run during that write to allocate space
and/or convert it to written space have not been committed to the
journal, then the data is not on stable storage, and so data
completion ordering cannot be relied on for integrity related
operations.
This is why write ordering always comes back to "you need to use
fdatasync(), O_DSYNC or RWF_DSYNC" - it is the only way to guarantee
the integrity of the initial data write(s) right down to the hardware
before starting the new dependent write(s). Hence AIO_FSYNC and now
chained operations in io_uring, which allow fsync to be issued
asynchronously and used as a "write barrier" between groups of
order dependent IOs...

> For 3. it sounded like Jan Kara was saying there wasn't anything at
> the moment (hypothetically you could introduce a call that marked the
> extents as "unwritten" but it doesn't sound like you can do that

You can do that with fallocate() - FALLOC_FL_ZERO_RANGE will mark
the unused range as unwritten in XFS, or you can just punch a hole
to free the unused space with FALLOC_FL_PUNCH_HOLE...

> today) and even if you wanted to use something like TRIM it wouldn't
> be worth it unless you were trimming a large (gigabytes) amount of
> data (https://youtu.be/-oP2BOsMpdo?t=6330 ).

Punch the space out, then run a periodic background fstrim so the
filesystem can issue efficient TRIM commands over free space...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
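
To make the two suggestions above concrete (fdatasync() as the
"barrier" between order dependent writes, and hole punching to give
unused space back to the filesystem), here is a minimal Python
sketch. It is an illustration only, not SQLite's or anyone else's
actual implementation: the function names, the log/database split
and the use of ctypes to reach fallocate() are all assumptions made
for the example, and it assumes 64-bit Linux. The flag values are
taken from <linux/falloc.h>.

```python
import ctypes
import ctypes.util
import os

# Flag values from <linux/falloc.h>.
FALLOC_FL_KEEP_SIZE = 0x01
FALLOC_FL_PUNCH_HOLE = 0x02


def pwrite_all(fd, data, offset):
    """Hypothetical helper: loop until every byte has been written."""
    view = memoryview(data)
    while view:
        written = os.pwrite(fd, view, offset)
        view = view[written:]
        offset += written


def ordered_update(log_fd, log_off, record, db_fd, db_off, page):
    """Write `record` to the log, then the dependent `page` to the db.

    The fdatasync() between the two writes is the "write barrier":
    it returns only once the log data, and the metadata needed to
    read it back, have been flushed to stable storage.
    """
    pwrite_all(log_fd, record, log_off)
    os.fdatasync(log_fd)   # barrier: the log is stable before we go on
    pwrite_all(db_fd, page, db_off)
    os.fdatasync(db_fd)    # make the dependent write stable as well


def punch_hole(fd, offset, length):
    """Free the blocks backing [offset, offset + length), keeping the size.

    Assumes a 64-bit off_t; raises OSError if the filesystem does not
    support hole punching.
    """
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                               ctypes.c_int64, ctypes.c_int64]
    libc.fallocate.restype = ctypes.c_int
    # PUNCH_HOLE must always be combined with KEEP_SIZE.
    mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE
    if libc.fallocate(fd, mode, offset, length) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
```

A caller would use ordered_update() for each log-then-database step
and call punch_hole() on ranges the database no longer needs; the
periodic fstrim stays a separate, system-level job. The synchronous
fdatasync() calls are the simplest form of the barrier; the
AIO_FSYNC and io_uring chaining variants mentioned above issue the
same flush asynchronously.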