lvm cache monitoring attempt ;-)

I am trying to understand the cache workings better.

To my dismay, many of my current problems are actually caused by the SSD being a rather slow writer, combined with the fact that while working with data, most of it may end up in the cache, and if those blocks are then written to they also have to be updated on the cache.

I actually had an issue where writing did not merely drop to cache speed (which is really low, about 14 MB/s :-/) but would stall for periods of time and not continue at all. I haven't captured data on that yet; I cannot identify it from the "cp" runs I have been doing.

I wrote a script to get some live data on cache performance; I don't know if that is the right way to do it.

The basics of it were:

#!/bin/bash

# Sample vmstat -d every 2 seconds for the three dm devices of interest,
# appending to "out", for as long as the process given as $1 is alive.
while [ -d /proc/$1 ]; do
  vmstat -d | grep -e "dm-12\|dm-14\|dm-15" | tee -a out
  sleep 2s
done

Feed this script the process ID of the thing you want to monitor; i.e. it keeps sampling until that process has finished.
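
For example, assuming the script is saved as "monitor.sh" (the name is just mine, not part of the setup), you could start the copy in the background and hand its PID over:

cp -a /big/source /mnt/cached/target &   # paths are only examples
./monitor.sh $!                          # $! is the PID of the backgrounded cp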

Then another script turns the collected log into an acceptable output:

#!/bin/bash

rm -f *-parse
for f in dm-12 dm-14 dm-15; do

  # collect this device's samples from the log
  cat out | grep $f > out-$f

  # header: friendly device name from /dev/mapper, then the column names
  printf " %-19s\n" $( ls -l /dev/mapper | grep $f | tr -s ' ' | cut -f9 -d' ' | sed 's/linux-//' ) > out-${f}-parse
  printf "%4s %6s %8s %3s %3s\n" tot sector ms cur sec >> out-${f}-parse

  # fields 6, 8, 9, 10, 11 of vmstat -d are the write totals, sectors, ms,
  # outstanding IO and seconds; print per-interval deltas for all but "cur"
  cat out-$f | awk '{print $6 " " $8 " " $9 " " $10 " " $11}' | {
    old=( 0 0 0 0 0 )
    while read -a arr; do
      ms=$(( arr[2] - old[2] ))
      if [ $ms -gt 100 ]; then ms=$(( ms / 1000 )); fi
      printf "%4d %6d %8d %3d %3d\n" \
        $(( arr[0] - old[0] )) $(( arr[1] - old[1] )) $ms ${arr[3]} $(( arr[4] - old[4] ))
      old=( ${arr[@]} )
    done
  } | tail -n +2 >> out-${f}-parse

done

paste *-parse

All it does is select some fields. For each device the fields are read into an array, and for every field except the count of outstanding IO requests the difference with the previous line is calculated. So all numbers are deltas except the fourth field ("cur"). If the "ms" delta is greater than 100 it is divided by 1000, so you don't see these huge numbers; for some reason the "delta milliseconds" can easily exceed a million, which would mean more than 1000 seconds per second.
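
The same delta logic could also be done in a single awk pass per device; just a sketch, using the same field numbers and the "out" log from above:

grep dm-12 out | awk '{
    tot = $6; sect = $8; ms = $9; cur = $10; sec = $11
    dms = ms - oms; if (dms > 100) dms = int(dms / 1000)
    # NR > 1 skips the bogus first delta against zero, like "tail -n +2"
    if (NR > 1)
        printf "%4d %6d %8d %3d %3d\n", tot - otot, sect - osect, dms, cur, sec - osec
    otot = tot; osect = sect; oms = ms; osec = sec
}'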

I have no clue; even if they were microseconds, these numbers average about 5000000 per 2 seconds while writing takes place. The vmstat help says:

       ms: milliseconds spent reading
       ms: milliseconds spent writing

I'd have to dig into the source or whatever to see what it really counts; 5 seconds per 2 seconds is still a lot.
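
One way to cross-check would be to look at the raw counters that vmstat -d takes these numbers from, /proc/diskstats; a minimal sketch (dm-12 is just the device from my setup):

# columns 8, 10 and 11 of /proc/diskstats are writes completed,
# sectors written and "time spent writing" in milliseconds
awk '$3 == "dm-12" { print $8, $10, $11 }' /proc/diskstats

If I read the kernel's iostats documentation correctly, that time field is summed over all in-flight requests, so with several writes queued it can grow a lot faster than wall-clock time, which would at least explain the huge deltas.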

Here is an example output:

 root_cache_cdata              root_corig                   root
 tot sector       ms cur sec   tot sector       ms cur sec   tot sector       ms cur sec
 288  36864     4972   2   2     0      0        0   0   0    18  36864      353   0   2
 320  40960     3648   2   2     0      0        0   0   0    20  40960      224   0   2
   0      0     2904   1   2     0      0        0   0   0     0      0      192   0   2
 609  77824     6396   2   2     7     48       36   0   0    44  77872      380   0   2
 320  40960     6852   1   2     0      0        0   0   0    20  40960      441   0   2
 721  92160     6316   2   2     8     56       28   0   0    52  92216      398   0   2
 353  45064     3285   2   3     0      0        0   0   0    23  45064      206   0   2
   0      0      345   2   2     0      0        0   0   0     0      0       13   0   2
   0      0     3644   1   2     0      0        0   0   0     0      0      250   0   2
 337  43008     3288   1   2     7     48       44   0   0    27  43056      202   0   2

<snip>

 271  34688     6591   1   2  2945 376960      359   1   1   201 411648      443   0   2
   0      0     5673   1   2  1472 188416     2883   1   2    92 188416      533   0   2
   1      0     3326   1   2  1482 188488     3614   0   2   101 188488      411   0   2
   0      0     2812   1   2  2304 294912      908   1   2   144 294912      263   0   2
   1      0     5454   0   2  1385 176192     2306   2   2    94 176192      489   0   2
   0      0     2977   0   2  1088 139264     3834   2   2    68 139264      412   0   2
   0      0     1386   0   2  1648 210944     3825   2   2   103 210944      326   0   2
   1      0     6097   0   2   905 114752     4596   2   2    64 114752      687   0   2

For cache_cdata and corig the number of sectors per write is always 128 (the maximum), which at 512 bytes per sector comes down to 64 kB per write; that makes sense.

In this output you can see that my corig is a much faster writer ;-).

In this output there was apparently no stalling; either the cache was writing, or the origin. Earlier I had issues where transfer rates would drop below 1 MB/s. Maybe it was caused by not mounting with noatime?
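
A quick way to check what a filesystem is currently mounted with (the mount point is only an example):

# list the mount options and look for noatime/relatime
findmnt -no OPTIONS /mnt/data | tr ',' '\n' | grep -i atime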

I would get very long stalls, and that was the reason for me to investigate this. A program like rsync (-aPh) or my own script would show abysmal rates for long stretches. I would also have "ls" not responding for the longest time, or "rm -rf *" taking maybe 20 seconds to complete for a meagre 1.6 GB in a single file.

And I still don't know what causes that. I'd have to do more rsyncs, for example, while watching the vmstat -d output at the same time.

That would probably mean adding timestamps, which is annoying; I don't have live output at this point.

You'd need something like "logtail" to pick up only the new input while remembering where the previous read stopped. Anyway.
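
As for the timestamps, a minimal tweak to the collection loop would do it (just a sketch; the awk field numbers in the parse script would then shift by one):

while [ -d /proc/$1 ]; do
  # prefix every sample line with the time it was taken
  vmstat -d | grep -e "dm-12\|dm-14\|dm-15" | sed "s/^/$(date '+%H:%M:%S') /" | tee -a out
  sleep 2s
done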

Maybe this would allow me to troubleshoot the "hangs" I have experienced.

Regards.



