On 10/29/2014 04:09 PM, Dave Chinner wrote: > On Wed, Oct 29, 2014 at 03:10:51PM -0600, Jens Axboe wrote: >> On 10/29/2014 02:14 PM, Dave Chinner wrote: >>> On Wed, Oct 29, 2014 at 11:23:38AM -0700, Jason B. Akers wrote: >>>> The following series enables the use of Solid State hybrid drives >>>> ATA standard 3.2 defines the hybrid information feature, which provides a means for the host driver to provide hints to the SSHDs to guide what to place on the SSD/NAND portion and what to place on the magnetic media. >>>> >>>> This implementation allows user space applications to provide the cache hints to the kernel using the existing ionice syscall. >>>> >>>> An application can pass a priority number coding up bits 11, 12, and 15 of the ionice command to form a 3 bit field that encodes the following priorities: >>>> OPRIO_ADV_NONE, >>>> IOPRIO_ADV_EVICT, /* actively discard cached data */ >>>> IOPRIO_ADV_DONTNEED, /* caching this data has little value */ >>>> IOPRIO_ADV_NORMAL, /* best-effort cache priority (default) */ >>>> IOPRIO_ADV_RESERVED1, /* reserved for future use */ >>>> IOPRIO_ADV_RESERVED2, >>>> IOPRIO_ADV_RESERVED3, >>>> IOPRIO_ADV_WILLNEED, /* high temporal locality */ >>>> >>>> For example the following commands from the user space will make dd IOs to be generated with a hint of IOPRIO_ADV_DONTNEED assuming the SSHD is /dev/sdc. >>>> >>>> ionice -c2 -n4096 dd if=/dev/zero of=/dev/sdc bs=1M count=1024 >>>> ionice -c2 -n4096 dd if=/dev/sdc of=/dev/null bs=1M count=1024 >>> >>> This looks to be the wrong way to implement per-IO priority >>> information. >>> >>> How does a filesystem make use of this to make sure it's >>> metadata ends up with IOPRIO_ADV_WILLNEED to store frequently >>> accessed metadata in flash. Conversely, journal writes need to >>> be issued with IOPRIO_ADV_DONTNEED so they don't unneceessarily >>> consume flash space as they are never-read IOs... >> >> Not disagreeing that loading more into the io priority fields is a >> bit... icky. I see why it's done, though, it requires the least amount >> of plumbing. > > Yeah, but we don't do things the easy way just because it's easy. We > do things the right way. ;) Still not disagreeing with you, merely stating that I can see why they chose to do it this way. Still doesn't change the fact that it feels like a hack, not a designed solution. >> As for the fs accessing this, the io nice fields are readily exposed >> through the ->bi_rw setting. So while the above example uses ionice to >> set a task io priority (that a bio will then inherit), nothing prevents >> you from passing it in directly from the kernel. > > Right, but now the filesystem needs to provide that on a per-inode > basis, not from the task structure as the task that is submitting > the bio is not necesarily the task doing the read/write syscall. Whomever submits the bio would need to provide it, yes. And with the disconnect for async writes, that becomes... interesting. > e.g. the write case above doesn't actually inherit the task priority > at the bio level at all because the IO is being dispatched by a > background flusher thread, not the ioniced task calling write(2). Oh yes, I realize that. > IMO using ionice is a nice hack, but utimately it looks mostly useless > from a user and application perspective as cache residency is a > property of the data being read/written, not the task doing the IO. > e.g. a database will want it's indexes in flash and bulk > data in non-cached storage. > > IOWs, to make effective use of this the task will need different > cache hints for each different type of data needs to do IO on, and > so overloading IO priorities just seems the wrong direction to be > starting from. Agree. -- Jens Axboe -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html