Understanding I/O with fio Statistics
`fio`, according to its manual page, is a tool that can spawn a number of threads or processes doing a particular type of I/O action as specified by the user.
The output of the `fio` command shows statistics of the I/O, such as bandwidth and several kinds of latency. The tool is commonly used for measuring the I/O performance of hardware and software.
`fio` provides various parameters for customizing the I/O. By changing some of them, you can see how performance is affected. I tested several parameters, observed how they affected performance, and visualized the collected data in a chart. In this story, I will share what I observed.
I don’t think I have found anything new in the I/O area. I just hope this story helps you a bit in understanding how some factors impact I/O performance.
Test Settings
Hardware
My test was done on an AArch64 server. The storage device was a SATA disk. I didn’t look into any further details of the disk, because they are not important here. After all, this is not a benchmark report.
Software
Ubuntu 18.04 was installed on the server. The `fio` version was 3.1. To parse the output of `fio`, I also installed `jq`, the command-line JSON processor.
Test Command
I used the following command to test:
```bash
fio --name=test --output-format=json --filename=/home/michael/test.fio \
    --size=20g --runtime=10s --ioengine=libaio --rw=${x} --direct=1 \
    [--numjobs=${v} | --blocksize=${v} | --zonesize=${v}]
```

Exactly one of the three bracketed options was supplied in each run, depending on the variable under test.
With that command, the fixed setting of each test is:
- All I/O data was written to or read from the file `/home/michael/test.fio`
- The file size was limited to 20G
- Each test run was limited to 10s
- The I/O engine was `libaio`
- The I/O was not buffered (`--direct=1`)
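As a side note, the fixed settings above could also be kept in an fio job file instead of command-line options. The following is only a sketch of that equivalence; `--output-format=json` and the per-test variable options would still go on the command line:

```ini
; Sketch of a job file holding the fixed settings from the test command.
[test]
filename=/home/michael/test.fio
size=20g
runtime=10s
ioengine=libaio
direct=1
```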
Variables
Among the many parameters offered by `fio`, I chose 3 of them to vary:

- `--numjobs`: the number of processing threads
- `--blocksize`: the block size for I/O units
- `--zonesize`: the size of the zones that a file is divided into

There are 2 reasons for selecting them: one is that they are numeric, which is good for visualization; the other is that their impact is significant.
Script
Here is the test script I created:
```bash
#!/bin/bash

# Read/write types
rw="read randread write randwrite readwrite randrw"
# Numbers of jobs
nj="1 2 4 8 16 32 64 128 256 512 1024 2048"
# Block sizes
bs="1k 2k 4k 8k 16k 32k 64k 128k 256k 512k 1m 2m"
# Zone sizes
zs="512k 1m 2m 4m 8m 16m 32m 64m 128m 256m 512m 1g"

for x in ${rw}; do
    echo "I/O (read, write in kbps) with type: ${x}, numjobs: ${nj}"
    for v in ${nj}; do
        fio --name=test --output-format=json --filename=/home/michael/test.fio --size=20g --runtime=10s --ioengine=libaio --rw=${x} --direct=1 --numjobs=${v} > tmp.json
        echo `cat tmp.json | jq '[.jobs[] | .read.io_bytes] | add/1024/10'`, `cat tmp.json | jq '[.jobs[] | .write.io_bytes] | add/1024/10'`
    done
done

for x in ${rw}; do
    echo "I/O (read, write in kbps) with type: ${x}, blocksizes: ${bs}"
    for v in ${bs}; do
        fio --name=test --output-format=json --filename=/home/michael/test.fio --size=20g --runtime=10s --ioengine=libaio --rw=${x} --direct=1 --blocksize=${v} > tmp.json
        echo `cat tmp.json | jq '[.jobs[] | .read.io_bytes] | add/1024/10'`, `cat tmp.json | jq '[.jobs[] | .write.io_bytes] | add/1024/10'`
    done
done

for x in ${rw}; do
    echo "I/O (read, write in kbps) with type: ${x}, zonesizes: ${zs}"
    for v in ${zs}; do
        fio --name=test --output-format=json --filename=/home/michael/test.fio --size=20g --runtime=10s --ioengine=libaio --rw=${x} --direct=1 --zonesize=${v} > tmp.json
        echo `cat tmp.json | jq '[.jobs[] | .read.io_bytes] | add/1024/10'`, `cat tmp.json | jq '[.jobs[] | .write.io_bytes] | add/1024/10'`
    done
done
```
The main body of the script consists of three 2-level loops, one for each variable. The inner loop goes through the selected values for the variable and uses them to run the `fio` test.
The output was in JSON format. `jq` parsed the output and picked the `.read.io_bytes` and `.write.io_bytes` fields, which hold the amount of I/O in bytes handled by each thread. The pipeline `add/1024/10` added up the data of all the threads and converted the sum into KBPS (kbytes per second), given the 10-second runtime. Finally, I used the KBPS number as the performance indicator.
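As a sketch of that `jq` step in isolation, here it is run against a hand-written JSON snippet shaped like fio’s output (the byte counts are made up for illustration):

```shell
# Minimal stand-in for fio's JSON output: two jobs, made-up byte counts.
cat > sample.json <<'EOF'
{"jobs": [
  {"read": {"io_bytes": 1048576}, "write": {"io_bytes": 524288}},
  {"read": {"io_bytes": 1048576}, "write": {"io_bytes": 0}}
]}
EOF

# Sum read io_bytes over all jobs, convert bytes to kbytes (/1024),
# then divide by the 10-second runtime to get KBPS:
# (1048576 + 1048576) / 1024 / 10 = 204.8
jq '[.jobs[] | .read.io_bytes] | add/1024/10' sample.json
```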
The output of the script looked like this:
Chart
Finally, the statistics collected with the script were visualized in the following chart. (If the text is too small to read, try opening the picture in a new tab.)
The design of the chart:
- The X-axis covers the 3 variables: number of jobs, block size, and zone size. The intervals are all on a base-2 logarithmic scale.
- The Y-axis is the I/O bandwidth in KBPS. The interval is on a base-10 logarithmic scale.
- Data for different variables are marked in different colors: data from the “number of jobs” tests are in red; “block size” data are in green; “zone size” data are in blue.
- For a given variable, the colors from deepest to lightest stand for the I/O types `read`, `randread`, `write`, `randwrite`, `readwrite`, `randrw`. So `read` data is in the deepest color, while `randrw` data is in the lightest.
- The bandwidth numbers for writing are drawn in solid lines; those for reading are in dashed lines.
- The legends to the right of the chart are in the format “[variable]-[I/O type]-[read or write]”. For example, `numjobs-randrw-read` means the series is the reading bandwidth with I/O type `randrw`, collected for different `numjobs` settings.
Now let’s see what we can learn from the chart:
- See the series `numjobs-read-read`. When the number of reading threads increased from 1 to 2, the bandwidth dropped badly. After that, as the number of threads kept increasing, the bandwidth increased a lot.
- See `numjobs-randwrite-write`. The writing performance of `randwrite` I/O reached its lowest value at 64 threads.
- See `numjobs-randread-read`. The reading performance of `randread` I/O increased with the number of threads. After the thread count reached 64, the increase was rapid.
- See `numjobs-readwrite-*` and `numjobs-randrw-*`. These are mixed I/O types. When the number of threads changed, the bandwidth didn’t change significantly.
- See all the green lines. Increasing block sizes led to increasing performance.
- See all the blue lines. Generally, I/O performance degraded as the zone size increased.
- See `zonesize-read-read` and `zonesize-write-write`. When the reading and writing were sequential (not random), the performance was almost unaffected by the zone size.
- See `zonesize-readwrite-read` and `zonesize-readwrite-write`. Reading and writing were also stable when the I/O type was mixed; the performance was just low.
- See `zonesize-randwrite-write` and `zonesize-randread-read`. The performance of random reading and writing dropped sharply at around a 64M zone size.
- See `zonesize-randrw-read` and `zonesize-randrw-write`. When the I/O was both mixed and random, the performance degraded as the zone size increased.
- See the last points of every series. All the good-performance bandwidth data converged to somewhere close to 100000 KBPS, while all the bad-performance bandwidth data converged to around 1000 KBPS.