On Mon, Jan 21, 2019 at 5:23 AM Roman Penyaev <rpenyaev@xxxxxxx> wrote:
>
> Hi Ilya,
>
> On 2019-01-18 17:29, Ilya Dryomov wrote:
> > On Fri, Jan 18, 2019 at 3:56 PM Roman Penyaev <rpenyaev@xxxxxxx> wrote:
> >>
> >> Hi all,
> >>
> >> This is an attempt to split DISCARD and WRITE_ZEROES paths on krbd side
> >> when REQ_NOUNMAP flag is set for a block layer request.
> >
> > Hi Roman,
> >
> > I'm working on splitting DISCARD and WRITE_ZEROES handling right now.
> > The idea is to punt on small and/or unaligned discard requests which
> > don't actually free up any space but translate into a RADOS zero op.
> > I'm not changing how WRITE_ZEROES is implemented though, so this is
> > orthogonal to your work -- just wanted to give a heads up.
>
> Good to know, thanks for telling me.
>
> >> Currently both REQ_OP_DISCARD and REQ_OP_WRITE_ZEROES block layer requests
> >> fall down to CEPH_OSD_OP_ZERO request, which punches holes on osd side.
> >>
> >> With a new CEPH_OSD_OP_FLAG_ZERO_NOUNMAP flag for CEPH_OSD_OP_ZERO request
> >> osd can zero out blocks, instead of punching holes.
> >
> > REQ_NOUNMAP is just a hint, the block device is free to ignore it.
> > IIRC the only way to control it from userspace is through fallocate(2):
> > FALLOC_FL_PUNCH_HOLE can unmap, while FALLOC_FL_ZERO_RANGE is supposed
> > to not unmap. Given that fallocate(2) on block devices is fairly new,
> > I'm curious if you have an application that actually cares in mind?
>
> No, no. This is an attempt to follow block layer semantics, nothing more.
> Indeed, the users of REQ_NOUNMAP are ioctl() and fallocate(), so the only
> practical value which comes to mind is performance (preallocate zeroed
> blocks and format any fs, etc) and possible secure-erase.
> After some internal discussions about performance of writing zeroes
> (instead of true DISCARD) this does not seem to bring any value, at least
> on bluestore, but secure wipe can make sense (for example using
> blkdiscard --zeroout).

The zeroed writes would need to be smaller than the bluestore min alloc
size for that to work. Otherwise, bluestore will just allocate a new blob
extent, write zeroes to it, and pivot the object metadata to point to the
new allocation.

> --
> Roman

--
Jason
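A minimal C sketch of the request mapping proposed in the quoted patch description: both DISCARD and WRITE_ZEROES still become a zero op on the OSD, and only WRITE_ZEROES carrying the REQ_NOUNMAP hint gets the proposed no-unmap flag so the OSD zeroes blocks instead of punching a hole. The enum, macros, struct and map_zeroing_request() below are hypothetical stand-ins for illustration only; they are not the kernel's or Ceph's actual definitions or values, and the real krbd request path is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of block-layer ops; NOT the kernel's enum req_op. */
enum model_req_op {
    MODEL_REQ_OP_DISCARD,
    MODEL_REQ_OP_WRITE_ZEROES,
};

/* Hypothetical stand-ins for CEPH_OSD_OP_ZERO and the proposed
 * CEPH_OSD_OP_FLAG_ZERO_NOUNMAP; the values are made up. */
#define MODEL_OSD_OP_ZERO            1u
#define MODEL_OSD_FLAG_ZERO_NOUNMAP  (1u << 0)

struct model_osd_op {
    unsigned int op;  /* always MODEL_OSD_OP_ZERO in this model */
    uint32_t flags;   /* MODEL_OSD_FLAG_ZERO_NOUNMAP or 0 */
};

/*
 * Both request types map to a zero op. Only WRITE_ZEROES with the
 * REQ_NOUNMAP hint asks the OSD to zero out blocks rather than punch
 * a hole; DISCARD keeps the existing hole-punching behaviour.
 */
static struct model_osd_op map_zeroing_request(enum model_req_op op,
                                               bool nounmap)
{
    struct model_osd_op out = { .op = MODEL_OSD_OP_ZERO, .flags = 0 };

    assert(op == MODEL_REQ_OP_DISCARD || op == MODEL_REQ_OP_WRITE_ZEROES);
    if (op == MODEL_REQ_OP_WRITE_ZEROES && nounmap)
        out.flags |= MODEL_OSD_FLAG_ZERO_NOUNMAP;
    return out;
}
```

The model only captures what krbd would request; as noted in the thread, REQ_NOUNMAP itself is just a hint.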