Re: the speed of file read write on USB

hi:

2010/10/12 Greg Freemyer <greg.freemyer@xxxxxxxxx>:
> On Tue, Oct 12, 2010 at 10:48 AM, loody <miloody@xxxxxxxxx> wrote:
>> hi:
>> thanks for your kind reply :)
>>
>> 2010/10/12 Greg Freemyer <greg.freemyer@xxxxxxxxx>:
>>> On Sun, Oct 10, 2010 at 2:08 PM, loody <miloody@xxxxxxxxx> wrote:
>>>>  Dear all:
>>>> I am so SORRY that I sent the previous mail before finishing it; my
>>>> finger slipped onto the send button.
>>>> SORRY~~~
>>>>
>>>>  I wrote a simple program to measure the speed of writing a file
>>>>  over USB: gettimeofday() before writing and gettimeofday() again
>>>>  when the write finishes. But I found something that makes me
>>>>  curious.
>>>>
>>>>  1. My program is compiled statically, and I use the open()/write()
>>>>  system calls instead of libc stdio functions.
>>>>
>>>> 2. I use the same kernel and USB modules; the kernel version is
>>>>  2.6.31. The only difference is that I have 2 rootfs, both
>>>>  cross-compiled for the ARM platform.
>>>>  Here comes the problem:
>>>>  the speed I get on rootfs1 is 8 MB/s, but on rootfs2 it is 1 MB/s.
>>>>
>>>>  My concerns are:
>>>>  1. My program is built statically, which means the libs in the
>>>>  rootfs have nothing to do with it.
>>>>  2. My program is written with raw file I/O instead of the file
>>>>  operations provided by the C library, which means I call the
>>>>  kernel's system calls directly. If the assumptions above are
>>>>  correct, it seems the kernel is what makes me slow on rootfs2.
>>>>
>>>>  3. In the beginning, I thought there might be some other programs,
>>>>  like threads, running on rootfs2 that slow me down.
>>>>     But how can I find them on the target?
>>>>  4. If I really want to find out whether the delay comes from the
>>>> kernel instead of USB or another driver module, are there config
>>>> options I can enable to monitor the write path and find out where
>>>> it gets stuck?
>>>> appreciate your help,
>>>> miloody
>>>
>>> You don't describe how you're flushing the cache.
>>>
>>> I find most out-of-whack benchmarks like this are caused by not
>>> properly managing the cache-flushing process.
>>>
>>> Since you wrote your own "benchmark" tool, just be sure it calls
>>> fsync() before closing the file and taking your time measurement.
>>>
>>> Greg
>>>
>> I mounted the USB disk with the sync option, and that is the reason it
>> is so slow.
>> BTW, theoretically random r/w should run at the same speed as
>> sequential r/w, right? For a USB device both just turn into bulk
>> transfer commands, whether sequential or random, right?
>> appreciate your help,
>> miloody
>
> (Damn, I wrote a novel.  Hope you have time to read it!)
>
> Based on your question, I assume you are talking about flash / SSD,
> and the answer for those is:
>
> Not really.  i/o patterns to flash drives in particular matter a lot
> more than you imply.
>
> Unfortunately, it gets very complicated and it is hard to get the
> internal details to know what the optimum i/o pattern is.   And it
> varies from one flash design to another.  Only with the high-end SSDs
> can you get to a point that i/o pattern is more or less unimportant.
> But they have speeds above 8MB/sec, so I assume you are not working
> with high-end SSD.
>
> ====
> If this is for an embedded app that you can spec. a specific part for
> and invest performance tuning time in, then you need to spend some
> time characterizing the flash device you specify.
>
> Some details you likely know, but may not have considered:
>
> Flash drives work with erase blocks (EBs).  And EBs are 128KB fairly
> often.  So I'll assume that size.
>
> (note: Sometimes the EB size is used as the cylinder size in the CHS
> (cylinder/head/sector) geometry which you can interrogate via hdparm.)
>
> It is my very limited understanding that low-end flash devices
> maintain a single mapping for the entire erase block, and that on
> every write to the flash drive an available erase block is allocated
> to hold the new data.
>
> Thus, on low-end flash devices, it is my understanding they have to be
> erased immediately prior to use.  This erasing takes milliseconds
> which is very much on the same order of time as a disk seek and is why
> you see such slow performance out of low-end flash.
>
> Basically for every write:
>  new EB allocated
>  new EB erased  (ie. this takes milliseconds)
>  the original erase block's data is read into a temporary buffer,
>  temp buffer modified,
>  temp buffer written to the newly allocated and erased EB
>  logical to physical EB mapping table updated to point to the new EB
>  original EB marked as free in mapping table.
>
> All of the above is handled internal to the flash drive and some of it
> likely happens in parallel.
>
> So you can see that every write to an erase block triggers a lot of
> activity that takes real world time.
>
> So still assuming one mapping per EB, if you have a properly aligned
> partition, then
>
>     dd if=/dev/zero of=/dev/sdx1 bs=128K
>
> will go at the optimum speed of the flash drive, because every i/o
> updates a single EB and incurs only one EB modification cycle's worth
> of overhead.
>
> But unaligned writes would each incur 2 write cycles and could easily
> be twice as slow.
>
> Now change the bs to 4KB and it is conceivable that with a really
> low-end device you will need 32 write cycles because you have not
> optimized your transfer size to the flash device.
>
> Now back to your question about random i/o versus sequential.
>
> Let's assume your flash drive is smart enough to cache/coalesce the
> above 4K writes into a single EB update and thus for a sequential
> write bs=4KB and bs=128KB run at the same speed.
>
> Now introduce random i/o with 128KB writes perfectly aligned to the
> EBs.  You should see no performance degradation because every write
> still triggers exactly one EB write cycle.
>
> But now do 4KB writes randomly around the drive.  With our simplified
> device this is going to trigger an EB write cycle for every 4KB write
> (or more if it crosses an EB boundary.)
>
> All the above is complicated enough, but I _believe_ the next tier up
> in complexity from a flash drive internals perspective is for the
> flash drive to track the mappings in sub-EB allocations, so a random
> write may only invalidate a portion of an EB while leaving the rest of
> it valid.  And the writes are accumulated until a full new EB can be
> written.
>
> That sounds great until you realize how fast you run out of EBs if
> many / most of them are only partially full as would happen with lots
> of random 4KB i/o.
>
> In that case random 4KB i/o to a new flash drive will appear great
> because it has tons of free EBs to grab and use, but when the supply
> of free EBs runs out, you will see a drastic drop in speed because of
> the sudden introduction of EB handling delays.
>
> Overall you can see that working with low-end flash drives is very
> non-deterministic and workloads are very important as is the age and
> specific usage history of the device.
>
> OTOH, if you get a high-end SSD that maintains free EB queues and
> performs the erasing in the background then your original statement
> that random i/o and sequential i/o should be at roughly the same speed
> becomes accurate.
>
> The Intel SSDs introduced 2 years ago were the first SSDs to offer
> background EB erasing, but even then you have to worry about the SSD
> running out of spare EBs to work with.  ie. If it runs out of spare
> EBs it can't erase them in the background.
>
> That is why ATA-8 introduced the trim command.
>
> Unfortunately the linux kernel's implementation of trim (discard) is
> rather poor at present.  (Maybe 2.6.37 will be better?)
>
> My preference for now is to use the userspace script "wiper.sh" that
> is included in hdparm v9.32 to trim SSDs that support trim.  (Older
> versions of hdparm are known to be buggy, so you really need the
> latest version or a patched older version.)
>
> Greg
>
Million thanks for your detailed explanation :)
I have some questions about the kernel:
1. It seems the kernel has some mechanism that controls the timing and
unit size of reads/writes to USB devices to reach the best performance.
If that is true, would you please tell me where it is located? Will the
timing and unit size be affected by options passed to mount, such as
"sync"?
appreciate your help,
miloody

--
To unsubscribe from this list: send an email with
"unsubscribe kernelnewbies" to ecartis@xxxxxxxxxxxx
Please read the FAQ at http://kernelnewbies.org/FAQ



