I looked at this some more and I am not sure that there is any bug, or
other possible tuning.

While the random-write process runs, "iostat -x -k 1" reports these numbers:

  average queue size: around 300
  average write wait: typically 200 to 400 ms, but can be over 1000 ms
  average read wait:  typically 50 to 100 ms

(more info at crbug.com/414709)

The read latency may be enough to explain the jank. In addition, the
browser can do fsyncs, and I think that those will block for a long time.

Ionice doesn't seem to make a difference. I suspect that once the
blocks are in the output queue, it's first-come, first-served. Is this
correct, or am I confused?

We can fix this on the application side, but only partially. The OS
version updater can use O_SYNC. The problem is that this can happen in
a number of situations, such as when simply downloading a large file,
and in other code that we don't control.

On Wed, Jun 24, 2015 at 4:43 PM, Luigi Semenzato <semenzato@xxxxxxxxxx> wrote:
> Kernel version is 3.8.
>
> I am not using a file system; I am writing directly into a partition.
>
> Here's the little test app. I call it "random-write", but you're
> welcome to call it whatever you wish.
>
> My apologies for the copyright notice.
>
> /* Copyright 2015 The Chromium OS Authors. All rights reserved.
>  * Use of this source code is governed by a BSD-style license that can be
>  * found in the LICENSE file.
>  */
>
> #define _FILE_OFFSET_BITS 64
> #include <fcntl.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <strings.h>
> #include <unistd.h>
> #include <sys/stat.h>
> #include <sys/time.h>
> #include <sys/types.h>
>
> #define PAGE_SIZE 4096
> #define GIGA (1024 * 1024 * 1024)
>
> typedef u_int8_t u8;
> typedef u_int64_t u64;
>
> typedef char bool;
> const bool true = 1;
> const bool false = 0;
>
> /* Shuffle the offsets with a Fisher-Yates permutation. */
> void permute_randomly(off_t *offsets, int offset_count) {
>   int i;
>   for (i = 0; i < offset_count; i++) {
>     int r = random() % (offset_count - i) + i;
>     off_t t = offsets[r];
>     offsets[r] = offsets[i];
>     offsets[i] = t;
>   }
> }
>
> u8 page[PAGE_SIZE];
> off_t offsets[2 * (GIGA / PAGE_SIZE)];
>
> int main(int ac, char **av) {
>   u64 i;
>   int out;
>
>   /* Make "page" slightly non-empty, why not. */
>   page[4] = 1;
>   page[34] = 1;
>   page[234] = 1;
>   page[1234] = 1;
>
>   for (i = 0; i < sizeof(offsets) / sizeof(offsets[0]); i++) {
>     offsets[i] = i * PAGE_SIZE;
>   }
>
>   permute_randomly(offsets, sizeof(offsets) / sizeof(offsets[0]));
>
>   if (ac < 2) {
>     fprintf(stderr, "usage: %s <device>\n", av[0]);
>     exit(1);
>   }
>
>   out = open(av[1], O_WRONLY);
>   if (out < 0) {
>     perror(av[1]);
>     exit(1);
>   }
>
>   for (i = 0; i < sizeof(offsets) / sizeof(offsets[0]); i++) {
>     int rc;
>     if (lseek(out, offsets[i], SEEK_SET) < 0) {
>       perror("lseek");
>       exit(1);
>     }
>     rc = write(out, page, sizeof(page));
>     if (rc < 0) {
>       perror("write");
>       exit(1);
>     } else if (rc != sizeof(page)) {
>       fprintf(stderr, "wrote %d bytes, expected %zu\n", rc, sizeof(page));
>       exit(1);
>     }
>   }
>   return 0;
> }
>
> On Wed, Jun 24, 2015 at 3:25 PM, Andrew Morton
> <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>> On Wed, 24 Jun 2015 14:54:09 -0700 Luigi Semenzato <semenzato@xxxxxxxxxx> wrote:
>>
>>> Greetings,
>>>
>>> we have an app that writes 4k blocks to an SSD partition with more or
>>> less random seeks. (For the curious: it's called "update engine" and
>>> it's used to install a new Chrome OS version in the background.)
>>> The total size of the writes can be a few hundred megabytes. During
>>> this time, we see that other apps, such as the browser, block for
>>> seconds, or tens of seconds.
>>>
>>> I have reproduced this behavior with a small program that writes 2GB
>>> worth of 4k blocks randomly to the SSD partition. I can get apps to
>>> block for over 2 minutes, at which point our hang detector triggers
>>> and panics the kernel.
>>>
>>> CPU: Intel Haswell i7
>>> RAM: 4GB
>>> SSD: 16GB SanDisk
>>> kernel: 3.8
>>>
>>> From /proc/meminfo I see that the "Buffers:" entry easily gets over
>>> 1GB. The problem goes away completely, as expected, if I use O_SYNC
>>> when doing the random writes, but then the average size of the I/O
>>> requests goes down a lot, also as expected.
>>>
>>> First of all, it seems that there may be some kind of resource
>>> management bug. Maybe it has been fixed in later kernels? But, if
>>> not, is there any way of encouraging some in-between behavior? That
>>> is, limit the allocation of I/O buffers to a smaller amount, which
>>> still gives the system a chance to do some coalescing, but perhaps
>>> avoids the extreme badness that we are seeing?
>>
>> What kernel version?
>>
>> Are you able to share that little test app with us?
>>
>> Which filesystem is being used and with what mount options etc?