Re: the speed of file read write on USB


 



On Tue, Oct 12, 2010 at 10:48 AM, loody <miloody@xxxxxxxxx> wrote:
> hi:
> thanks for your kind reply :)
>
> 2010/10/12 Greg Freemyer <greg.freemyer@xxxxxxxxx>:
>> On Sun, Oct 10, 2010 at 2:08 PM, loody <miloody@xxxxxxxxx> wrote:
>>>  Dear all:
>>> I am so SORRY that I sent the mail before I finished it; my finger
>>> slipped onto the send button.
>>> SORRY~~~
>>>
>>>  I wrote a simple program, as below, to measure the speed of writing a
>>>  file over USB: gettimeofday() before the write and gettimeofday() when
>>>  the write finishes. But I found something that makes me curious.
>>>
>>>  1. my program is compiled as static and I use open()/write() instead
>>>  of the libc stdio calls.
>>>
>>> 2. I use the same kernel and USB modules; the kernel version is 2.6.31.
>>>  The only difference is that I have 2 rootfs, both of them
>>>  cross-compiled for the ARM platform.
>>>  Here comes the problem:
>>>  the speed I get on rootfs1 is 8MB/s but on rootfs2 it is 1MB/s.
>>>
>>>  my concerns are:
>>>  1. my program is built as static, which means the libs in the rootfs
>>>  have nothing to do with this program.
>>>  2. my program is written with raw file I/O instead of the file
>>>  operations supported by the C lib, which means I am calling kernel
>>>  system calls directly to write the data. If my assumptions above are
>>>  correct, it seems the kernel is making me slow on rootfs2.
>>>
>>>  3. in the beginning, I thought there might be some other program or
>>>  threads running on rootfs2 that slow me down,
>>>     but how could I find them on the target?
>>>  4. if I really want to find out whether the delay comes from the kernel
>>> rather than the USB or another driver module, are there configs I can
>>> enable to monitor the write path and find out where it gets stuck?
>>> appreciate your help,
>>> miloody
>>
>> You don't describe how you're flushing the cache.
>>
>> I find most out-of-whack benchmarks like this are caused by not
>> properly managing the cache flushing process.
>>
>> Since you wrote your own "benchmark" tool, just be sure it calls
>> fsync() before closing the file and taking your time measurement.
>>
>> Greg
>>
> I mounted the USB disk with the sync option, and that is the reason why it is so slow.
> BTW, theoretically random r/w should be the same speed as sequential r/w, right?
> For a USB device it just sends bulk commands, whether they are sequential or random ones, right?
> appreciate your help,
> miloody
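
(Before I get into the random-vs-sequential question: here is a minimal
sketch of the kind of measurement loop I meant, with fsync() called
before the second gettimeofday() so the cache flush lands inside the
timed region.  The path, total size and buffer size are just
placeholders; adjust them for your setup.)

  #include <fcntl.h>
  #include <stdio.h>
  #include <string.h>
  #include <sys/time.h>
  #include <unistd.h>

  #define TOTAL (8 * 1024 * 1024)   /* total bytes to write      */
  #define BUFSZ (128 * 1024)        /* per-write() transfer size */

  int main(void)
  {
      static char buf[BUFSZ];
      struct timeval start, end;
      long left = TOTAL;
      int fd;

      memset(buf, 0xAA, sizeof(buf));

      /* /mnt/usb/test.bin is only an example path */
      fd = open("/mnt/usb/test.bin", O_WRONLY | O_CREAT | O_TRUNC, 0644);
      if (fd < 0) { perror("open"); return 1; }

      gettimeofday(&start, NULL);
      while (left > 0) {
          ssize_t n = write(fd, buf, left > BUFSZ ? BUFSZ : left);
          if (n < 0) { perror("write"); return 1; }
          left -= n;
      }
      fsync(fd);          /* flush the page cache before stopping the clock */
      gettimeofday(&end, NULL);
      close(fd);

      double secs = (end.tv_sec - start.tv_sec) +
                    (end.tv_usec - start.tv_usec) / 1e6;
      printf("%.2f MB/s\n", TOTAL / secs / (1024 * 1024));
      return 0;
  }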

(Damn, I wrote a novel.  Hope you have time to read it!)

Based on your question, I assume you are talking about flash / SSD,
and the answer for those is:

Not really.  i/o patterns to flash drives in particular matter a lot
more than you imply.

Unfortunately, it gets very complicated and it is hard to get the
internal details needed to know what the optimum i/o pattern is.  And it
varies from one flash design to another.  Only with the high-end SSDs
do you get to a point where the i/o pattern is more or less unimportant.
But those have speeds well above 8MB/sec, so I assume you are not
working with a high-end SSD.

====
If this is for an embedded app that you can spec. a specific part for
and invest performance tuning time in, then you need to spend some
time characterizing the flash device you specify.

Some details you likely know, but may not have considered:

Flash drives work with erase blocks (EBs).  And EBs are 128KB fairly
often.  So I'll assume that size.

(note: Sometimes the EB size is used as the cylinder size in the CHS
(cylinder/head/sector) geometry which you can interrogate via hdparm.)

It is my (very limited) understanding that low-end flash devices
maintain a single mapping for the entire erase block, and that on
every write to the flash drive an available erase block is allocated
to hold the new data.

Thus, on low-end flash devices, it is my understanding that erase
blocks have to be erased immediately prior to use.  This erasing takes
milliseconds, which is very much on the same order of time as a disk
seek, and is why you see such slow performance out of low-end flash.

Basically, for every write:
  a new EB is allocated
  the new EB is erased  (i.e. this takes milliseconds)
  the original erase block's data is read into a temporary buffer
  the temp buffer is modified
  the temp buffer is written to the newly allocated and erased EB
  the logical-to-physical EB mapping table is updated to point to the new EB
  the original EB is marked as free in the mapping table

All of the above is handled internally by the flash drive and some of
it likely happens in parallel.

So you can see that every write to an erase block triggers a lot of
activity that takes real-world time.

So still assuming one mapping per EB, if you have a properly aligned
partition, then

     dd if=/dev/zero of=/dev/sdx1 bs=128K

will go at the optimum speed of the flash drive, because every i/o
updates a single EB and incurs only one EB modification cycle's worth
of overhead.  (Note that in dd bs=128K means 128*1024 bytes; bs=128KB
would be 128,000 bytes and would not line up with the erase blocks.)
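
(If you want to check whether a partition actually starts on an EB
boundary, its starting sector is exported in sysfs.  A quick sketch,
assuming the usual 512-byte sectors, a 128KB EB and a partition named
sdx1; adjust to taste:)

  #include <stdio.h>

  int main(void)
  {
      unsigned long long start_sector;
      FILE *f = fopen("/sys/class/block/sdx1/start", "r");

      if (!f || fscanf(f, "%llu", &start_sector) != 1) {
          perror("reading partition start sector");
          return 1;
      }
      fclose(f);

      unsigned long long start_bytes = start_sector * 512ULL;
      printf("partition starts at byte %llu: %s\n", start_bytes,
             (start_bytes % (128 * 1024)) == 0 ? "EB aligned" : "NOT EB aligned");
      return 0;
  }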

But unaligned writes would each incur 2 write cycles and could easily
be twice as slow.

Now change the bs to 4K and it is conceivable that with a really
low-end device you will need 32 write cycles for the same amount of
data, because you have not optimized your transfer size to the flash
device.
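
To make that arithmetic concrete, here is a toy model in C.  It is not
how any real drive works; it just counts how many EBs each write
command touches, under the one-mapping-per-EB assumption above:

  #include <stdio.h>

  #define EB (128 * 1024)

  /* EB write cycles for 'total' bytes issued as 'bs'-byte writes
   * starting at byte 'offset' into the device, charging one cycle
   * per EB that each individual write touches */
  static long eb_cycles(long offset, long total, long bs)
  {
      long cycles = 0, pos = offset;

      while (pos < offset + total) {
          long len = bs;
          if (pos + len > offset + total)
              len = offset + total - pos;
          cycles += (pos + len - 1) / EB - pos / EB + 1;
          pos += len;
      }
      return cycles;
  }

  int main(void)
  {
      long total = 128 * 1024;      /* write one EB's worth of data */

      printf("aligned 128K write:    %ld cycles\n", eb_cycles(0,    total, 128 * 1024));
      printf("misaligned 128K write: %ld cycles\n", eb_cycles(4096, total, 128 * 1024));
      printf("aligned 4K writes:     %ld cycles\n", eb_cycles(0,    total, 4096));
      return 0;
  }

That prints 1, 2 and 32 cycles respectively, which is where the "twice
as slow" and "32 write cycles" figures above come from.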

Now back to your question about random i/o versus sequential.

Let's assume your flash drive is smart enough to cache/coalesce the
above 4K writes into a single EB update, and thus a sequential write
with bs=4K and one with bs=128K run at the same speed.

Now introduce random i/o with 128KB writes perfectly aligned to the
EBs.  You should see no performance degradation because every write
still triggers exactly one EB write cycle.

But now do 4KB writes randomly around the drive.  With our simplified
device this is going to trigger an EB write cycle for every 4KB write
(or more if it crosses an EB boundary).
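
Under that coalescing assumption, a similarly crude sketch shows why
random 4K i/o still hurts: it only charges a new EB cycle when a write
lands in a different EB than the previous one (the device size and
write counts are made up):

  #include <stdio.h>
  #include <stdlib.h>

  #define EB      (128 * 1024)
  #define DEVSIZE (1024L * 1024 * 1024)   /* pretend 1GB device   */
  #define BS      4096
  #define NWRITES 32768                   /* 128MB of 4K writes   */

  int main(void)
  {
      long cycles_seq = 0, cycles_rnd = 0;
      long last_seq = -1, last_rnd = -1;
      long i;

      srand(1);
      for (i = 0; i < NWRITES; i++) {
          long eb_seq = i * (long)BS / EB;                         /* sequential offsets */
          long eb_rnd = (rand() % (DEVSIZE / BS)) * (long)BS / EB; /* random offsets     */

          if (eb_seq != last_seq) { cycles_seq++; last_seq = eb_seq; }
          if (eb_rnd != last_rnd) { cycles_rnd++; last_rnd = eb_rnd; }
      }
      printf("sequential 4K writes: %ld EB cycles\n", cycles_seq);
      printf("random 4K writes:     %ld EB cycles\n", cycles_rnd);
      return 0;
  }

With these made-up numbers the sequential run costs 1024 EB cycles (one
per 128KB of data) while the random run costs close to one cycle per
write, i.e. roughly 30x more.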

All the above is complicated enough, but I _believe_ the next tier up
in complexity from a flash drive internals perspective is for the
flash drive to track the mappings in sub-EB allocations, so a random
write may only invalidate a portion of an EB while leaving the rest of
it valid.  The writes are then accumulated until a full new EB can be
written.

That sounds great until you realize how fast you run out of EBs if
many / most of them are only partially full as would happen with lots
of random 4KB i/o.

In that case random 4KB i/o to a new flash drive will appear great
because it has tons of free EBs to grab and use, but when the supply
of free EBs runs out, you will see a drastic drop in speed because of
the sudden introduction of EB handling delays.

Overall you can see that working with low-end flash drives is very
non-deterministic, and workloads are very important, as are the age and
specific usage history of the device.

OTOH, if you get a high-end SSD that maintains free EB queues and
performs the erasing in the background then your original statement
that random i/o and sequential i/o should be at roughly the same speed
becomes accurate.

The Intel SSDs introduced 2 years ago were the first SSDs to offer
background EB erasing, but even then you have to worry about the SSD
running out of spare EBs to work with.  ie. If it runs out of spare
EBs it can't erase them in the background.

That is why ATA-8 introduced the trim command.

Unfortunately the linux kernel's implementation of trim (discard) is
rather poor at present.  (Maybe 2.6.37 will be better?)

My preference for now is to use the userspace script "wiper.sh" that
is included in hdparm v9.32 to trim SSDs that support trim.  (Older
versions of hdparm are known to be buggy, so you really need the latest
version or a patched older one.)

Greg




