Re: [PATCH 1/2] block: Add support for atomic writes

Quoting Jeff Moyer (2013-11-07 10:43:41)
> Chris Mason <chris.mason@xxxxxxxxxxxx> writes:
> 
> > Unfortunately, it's hard to say.  I think the fusionio cards are the
> > only shipping devices that support this, but I've definitely heard that
> > others plan to support it as well.  mariadb/percona already support the
> > atomics via fusionio specific ioctls, and turning that into a real
> > O_ATOMIC is a priority so other hardware can just hop on the train.
> >
> > This feature in general is pretty natural for the log structured squirrels
> > they stuff inside flash, so I'd expect everyone to support it.  Matthew,
> > how do you feel about all of this?
> >
> > With the fusionio drivers, we've recently increased the max atomic size.
> > It's basically 1MB, disjoint or contig doesn't matter.  We're powercut
> > safe at 1MB.
> >
> >> 
> >> Basically, I'd like to avoid requiring a trial and error programming
> >> model to determine what an application can expect to work (like we have
> >> with O_DIRECT right now).
> >
> > I'm really interested in ideas on how to provide that.  But, with dm,
> > md, and a healthy assortment of flash vendors, I don't know how...
> 
> Well, we have control over dm and md, so I'm not worried about that.
> For the storage vendors, we'll have to see about influencing the
> standards bodies.
> 
> The way I see it, there are 3 pieces of information that are required:
> 1) minimum size that is atomic (likely the physical block size, but
>    maybe the logical block size?)
> 2) maximum size that is atomic (multiple of minimum size)
> 3) whether or not discontiguous ranges are supported
> 
> Did I miss anything?

It'll vary from vendor to vendor.  A discontig range of two 512KB areas
is different from 256 discontig 4KB areas.

And it's completely dependent on filesystem fragmentation.  So, a given
IO might pass for one file and fail for the next.

In a DM/MD configuration, an atomic IO that stays inside a single stripe
on raid0 could succeed, while one that spans two stripes on two
different devices will fail.
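
To make that concrete, here is a minimal Python sketch of the three
limits from the quoted list, plus the raid0 case. All of it is
hypothetical: AtomicLimits, can_write_atomically and fits_in_one_chunk
are illustrative names, not kernel or driver interfaces, and "chunk"
here is an assumed reading of "stripe" as the per-device unit.

```python
from dataclasses import dataclass
from typing import List, Tuple

Extent = Tuple[int, int]  # (byte offset, byte length)


@dataclass(frozen=True)
class AtomicLimits:
    """Hypothetical per-device limits, modeled on the three quoted items."""
    min_bytes: int    # smallest atomic unit, e.g. a block size
    max_bytes: int    # largest total atomic write (multiple of min_bytes)
    max_extents: int  # 1 would mean contiguous ranges only


def can_write_atomically(limits: AtomicLimits, extents: List[Extent]) -> bool:
    """Check a list of extents against the advertised limits."""
    if not extents or len(extents) > limits.max_extents:
        return False
    total = 0
    for offset, length in extents:
        # Every extent must be a positive, aligned multiple of the minimum.
        if length <= 0 or offset % limits.min_bytes or length % limits.min_bytes:
            return False
        total += length
    return total <= limits.max_bytes


def fits_in_one_chunk(extents: List[Extent], chunk_bytes: int) -> bool:
    """True if the first and last byte of every extent share one chunk."""
    chunks = set()
    for offset, length in extents:
        chunks.add(offset // chunk_bytes)
        chunks.add((offset + length - 1) // chunk_bytes)
    return len(chunks) == 1
```

With max_extents=2, two 512KB extents pass and 256 extents of 4KB do
not, even though both add up to 1MB.  The real limits also depend on
how the filesystem laid out the file, which this sketch does not model.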

> 
> > I've attached my current test program.  The basic idea is to fill
> > buffers (1MB in size) with a random pattern.  Each buffer has a
> > different random pattern.
> >
> > You let it run for a while and then pull the plug.  After the box comes
> > back up, run the program again and it looks for consistent patterns
> > filling each 1MB aligned region in the file.
> [snip]
> > In order to reliably find torn blocks without O_ATOMIC, I had to bump
> > the write size to 1MB and run 24 instances in parallel. 
> 
> Thanks for the program (I actually have my own setup for verifying torn
> writes, the veritable dainto[1], which nobody uses).  Just to be certain,
> you did bump /sys/block/<dev>/queue/max_sectors_kb to 1MB, right?

Since the atomics patch submits each write as a list of bios, there's no
max_sectors_kb to worry about.  Each individual bio was only 4K.
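
For anyone who wants the idea of the test program described in the
quoted text without the attachment, here is a rough Python sketch of
the verification pass.  It is not the attached program.  It assumes
each 1MB buffer is one random 8-byte token repeated to fill the buffer,
so a region is consistent only if the same token fills all of it.  The
names make_buffer, is_torn and find_torn_regions are made up for this
example.

```python
import os
from typing import List

REGION = 1 << 20  # 1MB, the write size from the description above
TOKEN = 8         # assumed: each buffer repeats one random 8-byte token


def make_buffer(size: int = REGION) -> bytes:
    """Build one buffer filled with a single random token."""
    return os.urandom(TOKEN) * (size // TOKEN)


def is_torn(region: bytes) -> bool:
    """A region is torn if it is not one token repeated end to end."""
    return region != region[:TOKEN] * (len(region) // TOKEN)


def find_torn_regions(data: bytes, size: int = REGION) -> List[int]:
    """Return the byte offsets of aligned regions with mixed patterns."""
    return [
        off
        for off in range(0, len(data) - size + 1, size)
        if is_torn(data[off:off + size])
    ]
```

After the box comes back up, the file contents would be read and passed
to find_torn_regions(); any offset it returns is a region where an old
and a new pattern are mixed.  A region that was never written (all
zeros) counts as consistent here.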

-chris




