Understanding I/O with fio Statistics

Michael Zhao
5 min read · Oct 8, 2022


fio, according to its manual page, is a tool that can spawn a number of threads or processes doing a particular type of I/O action as specified by the user.

The output of the fio command shows some statistics of the I/O, like the bandwidth and different types of latency. The tool is commonly used for measuring the I/O performance of hardware and software.

fio provides many parameters for customizing the I/O. By changing some of them, you can see how performance is affected. I tested a few of these parameters, observed how they affected performance, and visualized the collected data in a chart. In this story, I will share what I observed.

I don’t think I have found anything new in the I/O area. I just hope this story helps you a bit in understanding how some factors impact I/O performance.

Test Settings

Hardware

My test was done on an AArch64 server. The storage device was a SATA disk. I didn’t look into the disk any further, because the exact model isn’t important here; after all, this is not a benchmark report.

Software

Ubuntu 18.04 was installed on the server. The fio version was 3.1. To parse the output of fio, I also installed jq, the command-line JSON processor.
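For reference, both tools should be installable from the standard Ubuntu 18.04 repositories; assuming the package names are fio and jq, something like the following works:

sudo apt-get install -y fio jq
fio --version   # should print something like fio-3.1
jq --version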

Test Command

I used the following command to test:

fio --name=test --output-format=json --filename=/home/michael/test.fio --size=20g --runtime=10s --ioengine=libaio --rw=${x} --direct=1 [--numjobs=${v} | --blocksize=${v} | --zonesize=${v}]

With that command, the fixed settings of each test were:

  • All I/O data was written to or read from file /home/michael/test.fio
  • The file size was limited to 20G
  • Each test duration was limited to 10s
  • The I/O engine was libaio
  • The I/O was not buffered
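One of the fixed choices above is the libaio engine. If you want to confirm that your fio build supports it, listing the compiled-in I/O engines is a quick check (the exact output depends on how fio was built):

fio --enginehelp   # lists the available ioengines; libaio should appear in the list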

Variables

Among the many parameters offered by fio, I chose 3 of them to test:

  • --numjobs: The number of processing threads
  • --blocksize: The block size for I/O units
  • --zonesize: The size of zones that a file is divided into

There are two reasons for selecting them: one is that they are numeric, which is good for visualization; the other is that their impact on performance is significant.
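To make the sweep concrete: a single data point for the “number of jobs” variable, with sequential reads and 8 jobs, comes from a run like this (the value 8 here is just an illustration):

fio --name=test --output-format=json --filename=/home/michael/test.fio \
    --size=20g --runtime=10s --ioengine=libaio --rw=read --direct=1 \
    --numjobs=8 > tmp.json

The full script below simply repeats such runs over all the selected values.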

Script

Here is the test script I created:

#!/bin/bash
# Read/write types
rw="read randread write randwrite readwrite randrw"
# Number of jobs
nj="1 2 4 8 16 32 64 128 256 512 1024 2048"
# Block sizes
bs="1k 2k 4k 8k 16k 32k 64k 128k 256k 512k 1m 2m"
# Zone sizes
zs="512k 1m 2m 4m 8m 16m 32m 64m 128m 256m 512m 1g"
for x in ${rw}; do
    echo "I/O (read, write in kbps) with type: ${x}, numjobs: ${nj}"
    for v in ${nj}; do
        fio --name=test --output-format=json --filename=/home/michael/test.fio --size=20g --runtime=10s --ioengine=libaio --rw=${x} --direct=1 --numjobs=${v} > tmp.json
        echo $(cat tmp.json | jq '[.jobs[] | .read.io_bytes] | add/1024/10'), $(cat tmp.json | jq '[.jobs[] | .write.io_bytes] | add/1024/10')
    done
done
for x in ${rw}; do
    echo "I/O (read, write in kbps) with type: ${x}, blocksizes: ${bs}"
    for v in ${bs}; do
        fio --name=test --output-format=json --filename=/home/michael/test.fio --size=20g --runtime=10s --ioengine=libaio --rw=${x} --direct=1 --blocksize=${v} > tmp.json
        echo $(cat tmp.json | jq '[.jobs[] | .read.io_bytes] | add/1024/10'), $(cat tmp.json | jq '[.jobs[] | .write.io_bytes] | add/1024/10')
    done
done
for x in ${rw}; do
    echo "I/O (read, write in kbps) with type: ${x}, zonesizes: ${zs}"
    for v in ${zs}; do
        fio --name=test --output-format=json --filename=/home/michael/test.fio --size=20g --runtime=10s --ioengine=libaio --rw=${x} --direct=1 --zonesize=${v} > tmp.json
        echo $(cat tmp.json | jq '[.jobs[] | .read.io_bytes] | add/1024/10'), $(cat tmp.json | jq '[.jobs[] | .write.io_bytes] | add/1024/10')
    done
done

The main body of the script consists of three two-level loops, one for each variable. The inner loop goes through the selected values of that variable and uses each of them to run an fio test.

The output was in JSON format. jq parsed it and picked the .read.io_bytes and .write.io_bytes fields, which hold the amount of I/O in bytes handled by each job. The pipeline add/1024/10 added up the data of all the jobs and converted it into KB/s: the sum in bytes divided by 1024 gives kilobytes, and dividing by the 10-second runtime gives kilobytes per second. Finally, I used this KB/s number as the performance indicator.
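As a minimal illustration of that jq step, here is the same filter applied to a hand-made JSON file with made-up numbers (two jobs, each reading roughly 512 MiB within the 10-second window); the real fio output contains many more fields, but the filter only touches these two:

cat <<'EOF' > sample.json
{"jobs": [
  {"read": {"io_bytes": 536870912}, "write": {"io_bytes": 0}},
  {"read": {"io_bytes": 536870912}, "write": {"io_bytes": 0}}
]}
EOF
# Sum the read bytes of all jobs, convert bytes to KB, divide by the 10-second runtime:
jq '[.jobs[] | .read.io_bytes] | add/1024/10' sample.json   # prints 104857.6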

For each test run, the script printed one line of comma-separated read and write bandwidth numbers, grouped under a header describing the variable being swept.

Chart

Finally, the statistics collected with the script were visualized in the following chart. (If the text is too small to recognize, try opening the picture in a new tab.)

The design of the chart:

  • The X-axis contains 3 different variables: number of jobs, block size and zone size. The intervals are all on a base-2 logarithmic scale.
  • The Y-axis is the I/O bandwidth in KB/s. The interval is on a base-10 logarithmic scale.
  • Data for different variables are marked in different colors: Data for testing different “number of jobs” are marked in red; those for “block size” are in green; “zone size” data are in blue.
  • For a given variable, the shades from darkest to lightest correspond to the I/O types in this order: read, randread, write, randwrite, readwrite, randrw. So read data is in the darkest shade, while randrw data is in the lightest.
  • The bandwidth numbers for writing are drawn in solid lines; those for reading are in dashed lines.
  • The legends to the right of the chart are in the format “[variable]-[I/O type]-[read or write]”. For example, numjobs-randrw-read means the series shows the read bandwidth for I/O type randrw, collected for different numjobs settings.

Now let’s see what we can learn from the chart:

  • See the series numjobs-read-read. When the number of reading threads increased from 1 to 2, the bandwidth dropped sharply. But after that, as the number of threads increased further, the bandwidth grew substantially.
  • See numjobs-randwrite-write. The writing performance of randwrite type I/O had its lowest value when there were 64 threads.
  • See numjobs-randread-read. The reading performance of randread type I/O increased as the number of threads increased. After the thread count reached 64, the increase was rapid.
  • See numjobs-readwrite-* and numjobs-randrw-*. These are mixed I/O types. When the number of threads changed, the bandwidth didn’t change significantly.
  • See all the green lines. Increasing block sizes led to increasing performance.
  • See all the blue lines. Generally the I/O performance degraded when the zone size increased.
  • See zonesize-read-read and zonesize-write-write. When the reads and writes were sequential (not random), the performance was almost unaffected by the zone size.
  • See zonesize-readwrite-read and zonesize-readwrite-write. The reading and writing were also stable when the I/O type was mixed, though the performance was low.
  • See zonesize-randwrite-write and zonesize-randread-read. The performance of random reading and writing dropped sharply at around a 64M zone size.
  • See zonesize-randrw-read and zonesize-randrw-write. When the I/O was mixed and random, the performance degraded as the zone size increased.
  • See the last points of every series. All the well-performing series converged toward a point close to 100,000 KB/s, while all the poorly-performing series ended up somewhere around 1,000 KB/s.
