On Tue, 19 Jan 2010, Kay Sievers wrote:
>
> Most of the RAID metadata formats can probably get a minimal size
> requirement. It is probably not really necessary to support a 1.4
> MB RAID auto-assembly :)

Setting the debug mask to all ones, I get some crazy sh*t:

	[19] swsuspend:
	sb-buffer read() off=1024 len=3072
	sb-buffer read() off=4096 len=4096
	sb-buffer read() off=8192 len=8192
	sb-buffer read() off=16384 len=16384
	sb-buffer read() off=32768 len=32768

and it turns out that the fundamental problem is that the sb-buffer
size is simply much, much too big. We don't actually want to read 32kB
at all - that swsuspend thing is looking for a small signature at
various offsets.

So why do we do those reads of increasing size? The sb-buffer code
makes the first 68kB special, and uses the logic that even if you want
to read just 1kB, we'll read the whole beginning of the disk up to the
point you needed (see the sketch after the NOTE below). So no wonder we
always end up reading lots of data off those disks - the whole logic
seems to assume that it's totally free to read a single big area. Which
is simply not worth it to begin with.

If it's a fast disk, it doesn't matter one whit if we do it as 5 reads
or as one read. Sure, the five reads are going to be slower, but since
blkid does a total of ten reads or so, performance does not _matter_
for the fast-disk case! It's not like we're reading gigabytes of data,
where it matters a lot whether we do the reads in 4kB chunks or 256kB
chunks!

So the only case where performance matters is for the really slow
disks, and there we're much better off transferring less data. So the
whole buffering is actually detrimental to performance.

So with this trivial patch, things seem to improve. The swsuspend case,
for example, becomes

	[19] swsuspend:
	sb-buffer read() off=1024 len=3072
	extra-buffer read: off=7168 len=1024
	extra-buffer read: off=15360 len=1024
	extra-buffer read: off=31744 len=1024
	extra-buffer read: off=64512 len=1024

instead. Leave the caching to the kernel. The kernel will cache things
properly if there are buffers that get read over and over again (unless
you close and re-open the device, which blkid doesn't do).

With this trivial one-liner change, the timings change from

 - before (with the previous patch to not do read-ahead):

	[root@EeePC util-linux-ng]# time /sbin/blkid -p /dev/sdc
	real    0m4.914s
	user    0m0.001s
	sys     0m0.005s

 - after (with this additional one-liner):

	[root@EeePC util-linux-ng]# time misc-utils/blkid -p /dev/sdc
	real    0m2.887s
	user    0m0.011s
	sys     0m0.032s

so it still reads way too much, but it's gone from five seconds to
three, with a simple one-liner change to not buffer so much in user
space.

It's still too slow, I think, but considering that it used to be almost
15s, things have certainly improved a lot. And getting rid of the crazy
things (like probing for swsuspend at all on a 505kB disk!) would
improve it further.

NOTE! This patch does mean that blkid will do more 'read()' system
calls, but they will be (a) smaller and (b) many of them will overlap,
and the kernel caching kicks in.
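To make that sb-buffer behaviour concrete, here's a minimal sketch of
the kind of grow-to-cover logic described above. This is illustrative
only, not the actual blkid source - the struct layout and the name
probe_get_buffer are made up:

	#include <unistd.h>

	#define BLKID_SB_BUFSIZ	0x11000	/* 68kB - the value before the patch */

	struct probe {
		int fd;
		unsigned char sb_buf[BLKID_SB_BUFSIZ];	/* start-of-disk cache */
		size_t sb_len;				/* bytes valid so far */
	};

	/*
	 * Hypothetical helper: any request inside the first 68kB is
	 * satisfied by extending the cache from its current end all the
	 * way to the end of the requested range. That is exactly why
	 * probing for a tiny signature near the 32kB mark ends up
	 * pulling in the entire first 32kB of the disk.
	 */
	static unsigned char *probe_get_buffer(struct probe *pr,
					       size_t off, size_t len)
	{
		if (off + len > BLKID_SB_BUFSIZ)
			return NULL;		/* outside the "special" area */
		if (off + len > pr->sb_len) {
			ssize_t n = pread(pr->fd, pr->sb_buf + pr->sb_len,
					  off + len - pr->sb_len, pr->sb_len);
			if (n < 0)
				return NULL;
			pr->sb_len += n;
		}
		return pr->sb_buf + off;
	}

With BLKID_SB_BUFSIZ dropped to 4096, the same logic degenerates to at
most one small read, and everything past the first page goes through
the uncached "extra-buffer" path instead - so the overlapping probes
now rely entirely on the kernel's page cache.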
For example, we have hot offsets like the 7kB and 8kB marks:

	[19] swsuspend:
	extra-buffer read: off=7168 len=1024
	[20] swap:
	extra-buffer read: off=7168 len=1024

and

	[27] reiserfs:
	extra-buffer read: off=8192 len=1024
	[32] zfs:
	extra-buffer read: off=8192 len=1024
	[35] ufs:
	extra-buffer read: off=8192 len=1377
	[36] hpfs:
	extra-buffer read: off=8192 len=1024
	[45] ocfs:
	extra-buffer read: off=8192 len=1024
	[46] ocfs2:
	extra-buffer read: off=8192 len=1024

but from a performance standpoint, none of that matters, because the
kernel will read those locations just once, and then just copy from the
cache for the subsequent ones.

Before, none of those cases caused duplicate reads, because blkid
cached the whole first 68kB in that BLKID_SB_BUFSIZ buffer. But it did
so by reading much more than necessary.

You can see the system call timings changing radically too (again,
filtering only system calls that take more than a tenth of a second):

 - before:

	read(3, "\0\0\0\0\0\0\0\0"..., 1024) = 1024 <0.256844>
	read(3, "\0\0\0\0\0\0\0\0"..., 4096) = 4096 <0.256971>
	read(3, "\0\0\0\0\0\0\0\0"..., 8192) = 8192 <0.514547>
	read(3, "\353<\220){'`j"..., 16384) = 16384 <1.033542>
	read(3, "\0\0\0\0\0\0\0\0"..., 32768) = 32768 <2.064483>
	read(3, "\0\0\0\0\0\0\0\0"..., 1024) = 1024 <0.257442>
	read(3, "\0\0\0\0\0\0\0\0"..., 1024) = 1024 <0.257447>
	read(3, "\0\0\0\0\0\0\0\0"..., 1377) = 1377 <0.256989>

 - after:

	read(3, "\0\0\0\0\0\0\0\0"..., 1024) = 1024 <0.257850>
	read(3, "\0\0\0\0\0\0\0\0"..., 1024) = 1024 <0.256907>
	read(3, "\0\0\0\0\0\0\0\0"..., 1024) = 1024 <0.257585>
	read(3, "\0\0\0\0\0\0\0\0"..., 1024) = 1024 <0.257583>
	read(3, "\0\0\0\0\0\0\0\0"..., 1024) = 1024 <0.258587>
	read(3, "\0\0\0\0\0\0\0\0"..., 1024) = 1024 <0.257433>
	read(3, "\0\0\0\0\0\0\0\0"..., 1024) = 1024 <0.256586>
	read(3, "\0\0\0\0\0\0\0\0"..., 1024) = 1024 <0.257574>
	read(3, "\0\0\0\0\0\0\0\0"..., 1024) = 1024 <0.258016>
	read(3, "\0\0\0\0\0\0\0\0"..., 1377) = 1377 <0.256531>
	read(3, "\0\0\0\0\0\0\0\0"..., 512) = 512 <0.257059>

ie we do more reads (the above ignores all the ones that end up being
duplicates and come from the cache, and thus take no time at all), but
because they are much smaller, they are much faster.

That quarter second is now always the time to fault in one 4kB block
(which is the kernel's caching granularity), which is why the time is
the same for 512 bytes as for 1377 bytes (and consistent with that
15kB/s "performance" of the drive). A quick back-of-the-envelope check
of that follows the patch below.

(Btw, you could just remove the whole BLKID_SB_BUFSIZ logic in the
first place, but I didn't do that. I intentionally went for "smallest
possible patch" rather than "get rid of the user-level caching
entirely".)

		Linus

---
 shlibs/blkid/src/blkidP.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/shlibs/blkid/src/blkidP.h b/shlibs/blkid/src/blkidP.h
index 4a3ae20..1e3d5dc 100644
--- a/shlibs/blkid/src/blkidP.h
+++ b/shlibs/blkid/src/blkidP.h
@@ -133,7 +133,7 @@ struct blkid_prval
 	struct blkid_chain *chain;	/* owner */
 };

-#define BLKID_SB_BUFSIZ	0x11000
+#define BLKID_SB_BUFSIZ	4096

 /*
  * Filesystem / Raid magic strings
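And here is the promised back-of-the-envelope check on the granularity
claim. The 1377-byte read is the ufs probe at off=8192 from the list
above; the 512-byte read's offset isn't in the trace, so it's assumed
here to fall entirely inside one page:

	#include <stdio.h>

	#define PAGE 4096UL

	/* how many 4kB pages does a read of (off, len) fault in? */
	static unsigned long pages_touched(unsigned long off, unsigned long len)
	{
		return (off + len - 1) / PAGE - off / PAGE + 1;
	}

	int main(void)
	{
		printf("%lu\n", pages_touched(8192, 1377));	/* -> 1 page */
		printf("%lu\n", pages_touched(0, 512));		/* -> 1 page */
		return 0;
	}

Each uncached probe therefore costs exactly one 4kB page fetch, and
4096 bytes / 0.257s is about 15.9kB/s - which is why 512 bytes and
1377 bytes take the same quarter of a second on this drive.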