Re: newstore direction

Ric Wheeler wrote:
On 10/23/2015 07:06 AM, Ric Wheeler wrote:
On 10/23/2015 02:21 AM, Howard Chu wrote:
> Normally, best practice is to use batching to avoid paying worst case latency
> when you do a synchronous IO. Write a batch of files or appends without fsync,
> then go back and fsync and you will pay that latency once (not per file/op).
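
A minimal sketch of the batching pattern described above, assuming a POSIX system and Python's os module. The function name write_batch_then_sync and the file names in the demo are made up for illustration; nothing here comes from an actual Ceph or newstore code path.

```python
import os
import tempfile


def write_batch_then_sync(directory, items):
    """Write every file first, then fsync them all in a second pass.

    `items` maps a file name to the bytes to store (hypothetical layout).
    Returns the number of files written.
    """
    fds = []
    try:
        # Pass 1: issue all writes without waiting for stable storage.
        for name, data in items.items():
            fd = os.open(os.path.join(directory, name),
                         os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
            fds.append(fd)
            view = memoryview(data)
            while view:                      # handle short writes
                view = view[os.write(fd, view):]
        # Pass 2: go back and fsync. The flush latency is largely paid once,
        # on the first fsync, instead of once per file.
        for fd in fds:
            os.fsync(fd)
    finally:
        for fd in fds:
            os.close(fd)
    # Sync the directory too so the new entries themselves are durable
    # (assumes the platform allows opening a directory read-only).
    dfd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)
    return len(fds)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(write_batch_then_sync(d, {"a.dat": b"one", "b.dat": b"two"}))
```

How much this saves depends on the filesystem and the device; the sketch only shows the shape of the pattern, not a measured result.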
If filesystems would support ordered writes you wouldn't need to fsync at
all. Just spit out a stream of writes and declare that batch N must be
written before batch N+1. (Note that this is not identical to "write
barriers", which imposed the same latencies as fsync by blocking all I/Os at
a barrier boundary. Ordered writes may be freely interleaved with un-ordered
writes, so normal I/O traffic can proceed unhindered. Their ordering is only
enforced wrt other ordered writes.)

One other note: the file & storage kernel people discussed using ordering
years ago. One of the issues is that the devices themselves need to support
it. While S-ATA devices are portrayed as SCSI in the kernel, ATA did not (and,
as far as I know, still does not) support ordered tags.

Yes, that's a bigger problem. ATA NCQ/TCQ aren't up to the job.

>>> A bit of a shame that Linux's SCSI drivers support Ordering attributes but
>>> nothing above that layer makes use of it.
>>
>> I think that if the stream on either side of the barrier is large enough,
>> using ordered tags (SCSI speak) versus doing stream1, fsync(), stream2,
>> should have the same performance.

>> Not clear to me if we could do away with an fsync to trigger a cache flush
>> here either - do SCSI ordered tags require that the writes be acknowledged
>> only when durable, or can the device ack them once the target has them
>> (including in a volatile write cache)?

fsync() is too blunt a tool: using it gives you both the C and the D of ACID (Consistency and Durability). Ordered tags give you Consistency. Lots of applications can live without perfect Durability, but losing Consistency is a major headache.

If the stream of writes is large enough, you could omit fsync because everything is being forced out of the cache to disk anyway. In that scenario, the only thing that matters is that the writes get forced out in the order you intended, so that an interruption or crash leaves you in a known (or knowable) state rather than an unknown one.
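
For comparison, here is a rough sketch of the "stream1, fsync(), stream2" pattern mentioned above, which as far as I know is what user space has to fall back on today because no ordered-write interface is exposed. It is written in Python against the os module; write_ordered_batches and the record layout are hypothetical names for illustration only. Note that every batch boundary pays for Durability even when only ordering is wanted.

```python
import os


def write_all(fd, data):
    """Write all of `data` to `fd`, looping over short writes."""
    view = memoryview(data)
    while view:
        view = view[os.write(fd, view):]


def write_ordered_batches(path, batches):
    """Append batches so that batch N is on stable storage before N+1 starts.

    `batches` is a list of lists of bytes records (hypothetical layout).
    Returns the number of fsync calls issued, one per batch.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    syncs = 0
    try:
        for batch in batches:
            for record in batch:
                write_all(fd, record)
            # The fsync stands in for an ordering point: nothing from the
            # next batch is issued until this batch has been flushed.
            os.fsync(fd)
            syncs += 1
    finally:
        os.close(fd)
    return syncs
```

With ordered writes, the os.fsync() call inside the loop would in principle be replaced by a cheaper "order after the previous batch" hint, and a crash would still leave a prefix of the batches on disk.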

--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html