Cloud Hypervisor + GDB + Arm64 Part 5: AArch64 Address Translation Sketch
This article collects the essential knowledge about address translation on AArch64. It doesn’t cover every aspect of the translation; I focus only on the information relevant to using GDB to debug the guest kernel.
To enable GDB support on the Arm64 architecture for Cloud Hypervisor (a virtual machine monitor written in Rust), we need to translate the virtual address (VA) used by the guest kernel into the guest physical address, that is, the intermediate physical address (IPA). This scenario comes with the following constraints:
- Only stage 1 translation is needed.
- Only the high address range is handled: 0xFFFF_0000_0000_0000 ~ 0xFFFF_FFFF_FFFF_FFFF, because that is kernel space.
- Only Exception Level 1 (EL1) is involved, because the guest kernel runs at it.
Most of the content is quoted from the AArch64 reference manual [1].
Address
The aim of address translation is to convert a virtual address (VA) to a physical address (PA) or an intermediate physical address (IPA).
Address Type
- Virtual address (VA)
An address used in an instruction, as a data or instruction address, is a Virtual Address (VA).
- Intermediate physical address (IPA)
In a translation regime that provides two stages of address translation, the IPA is:
• The output address (OA) from the stage 1 translation.
• The input address (IA) for the stage 2 translation.
- Physical address (PA)
The address of a location in a physical memory map. That is, an output address from the PE to the memory system.
Address Size
The address size (for both the input and output address of the translation process) is determined by the following rules:
- Up to 52 bits when all of the following are true:
  - FEAT_LPA2 is implemented.
  - TCR_ELx.DS == 1 for the translation regime controlled by that register.
  - The 4KB or 16KB translation granule is used.
- Up to 52 bits when both of the following are true:
  - FEAT_LVA is implemented.
  - The 64KB translation granule is used.
- Up to 48 bits, otherwise.
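The three rules above can be condensed into a small decision function. This is only a sketch in Rust: the Granule enum, the function name, and the boolean parameters are made up for illustration, and the caller is assumed to have already determined from the ID registers and TCR_ELx whether FEAT_LPA2, FEAT_LVA, and the DS bit apply.

```rust
/// Translation granule sizes (illustrative type, not taken from any crate).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Granule {
    Size4K,
    Size16K,
    Size64K,
}

/// Upper bound of the address size in bits, following the three rules above.
/// The flags are plain inputs; this sketch does not read any hardware state.
pub fn max_address_bits(granule: Granule, feat_lpa2: bool, ds: bool, feat_lva: bool) -> u32 {
    match granule {
        // Rule 1: FEAT_LPA2 + TCR_ELx.DS == 1 + 4KB or 16KB granule.
        Granule::Size4K | Granule::Size16K if feat_lpa2 && ds => 52,
        // Rule 2: FEAT_LVA + 64KB granule.
        Granule::Size64K if feat_lva => 52,
        // Rule 3: everything else.
        _ => 48,
    }
}
```

For example, max_address_bits(Granule::Size4K, true, true, false) yields 52, while the same call with ds set to false falls back to 48.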
Address Size Configuration
Determining Physical Address Size
The ID_AA64MMFR0_EL1.PARange field indicates the implemented PA size. Its encodings are listed under ID_AA64MMFR0_EL1 in the Relevant Registers chapter.
Configuring Input Address Size
The TCR_ELx.TxSZ fields specify the input address size:
- For a stage of translation that can support two VA ranges, TCR_ELx has two TxSZ fields, corresponding to the two VA ranges:
  - TCR_ELx.T0SZ specifies the size for the lower VA range, translated using TTBR0_ELx.
  - TCR_ELx.T1SZ specifies the size for the upper VA range, translated using TTBR1_ELx.
- For a stage of translation that supports only a single input address (IA) range, TCR_ELx has a single T0SZ field, and IAs are translated using TTBR0_ELx.
Configuring Output Address Size
TCR_ELx.{I}PS must be programmed to the maximum output address size for a stage of translation.
Translation Regime
A translation regime comprises either:
- A single stage of address translation.
  - This maps an input VA to an output PA.
- Two, sequential, stages of address translation, where:
  - Stage 1 maps an input VA to an output IPA.
  - Stage 2 maps an input IPA to an output PA.
Translation Table Walks
The translation table walk is the set of lookups that are required to translate the VA to the PA.
The translation result includes:
- The required PA
- The memory attributes for the target memory region
- The access permissions for the target memory regions
The following diagram generally describes a stage of a 3-level lookup:
Granule Size
VMSAv8–64 supports translation granule sizes of 4KB, 16KB, and 64KB.
The memory translation granule size defines both:
- The maximum size of a single translation table.
- The memory page size. That is, the granularity of a translation table lookup.
Identifying supported granule sizes
For stage 1 translation:
The supported granule sizes can be found by checking the ID_AA64MMFR0_EL1.TGran* fields (TGran4, TGran16, and TGran64).
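As a rough sketch of that check, the function below decodes the three stage 1 granule fields from a raw ID_AA64MMFR0_EL1 value. The bit positions (TGran4 at [31:28], TGran64 at [27:24], TGran16 at [23:20]) and the encodings are written down as I understand them from the reference manual and should be verified against it; the struct and function names are invented for this example.

```rust
/// Stage 1 granule support decoded from ID_AA64MMFR0_EL1 (illustrative type).
#[derive(Debug, PartialEq, Eq)]
pub struct Stage1Granules {
    pub supports_4k: bool,
    pub supports_16k: bool,
    pub supports_64k: bool,
}

/// Decode the TGran4, TGran16 and TGran64 fields of a raw register value.
/// Assumed field positions: TGran4 [31:28], TGran64 [27:24], TGran16 [23:20].
pub fn stage1_granules(id_aa64mmfr0_el1: u64) -> Stage1Granules {
    let field = |lsb: u32| (id_aa64mmfr0_el1 >> lsb) & 0xF;
    let tgran4 = field(28);
    let tgran64 = field(24);
    let tgran16 = field(20);
    Stage1Granules {
        // TGran4: 0b0000 = supported, 0b0001 = supported with 52-bit addresses.
        supports_4k: matches!(tgran4, 0x0 | 0x1),
        // TGran16: 0b0001 = supported, 0b0010 = supported with 52-bit addresses.
        // Note that 0b0000 means "not supported" for this field.
        supports_16k: matches!(tgran16, 0x1 | 0x2),
        // TGran64: 0b0000 = supported.
        supports_64k: tgran64 == 0x0,
    }
}
```

Reserved encodings are treated as "not supported" here, which is a conservative choice of this sketch rather than something the manual prescribes.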
For stage 2 translation:
Not covered in this article.
Effect Of Granule Size On Translation
The granule size affects the following aspects of address translation:
- Page size
- Page address range
- Address bits resolved in one level of lookup
- Maximum number of entries in a translation table
The table below lists how the granule size affects each of them:
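The aspects listed above all follow from the granule size by simple arithmetic, which the following Rust sketch spells out. It assumes 8-byte translation table descriptors, so a granule-sized table with 2^n bytes holds 2^(n - 3) entries; the GranuleParams type and the function name are hypothetical.

```rust
/// Values derived from a granule size (illustrative helper type).
#[derive(Debug, PartialEq, Eq)]
pub struct GranuleParams {
    /// Page size in bytes, equal to the granule size.
    pub page_size: u64,
    /// Number of low address bits that form the in-page offset.
    pub offset_bits: u32,
    /// Address bits resolved by one full level of lookup.
    pub bits_per_level: u32,
    /// Maximum number of entries in one translation table.
    pub entries_per_table: u64,
}

/// Returns the derived values for 4KB, 16KB or 64KB, and None otherwise.
pub fn granule_params(granule_bytes: u64) -> Option<GranuleParams> {
    match granule_bytes {
        4096 | 16384 | 65536 => {
            // log2 of the granule size.
            let offset_bits = granule_bytes.trailing_zeros();
            // Assumption: each descriptor is 8 bytes (2^3).
            let bits_per_level = offset_bits - 3;
            Some(GranuleParams {
                page_size: granule_bytes,
                offset_bits,
                bits_per_level,
                entries_per_table: 1u64 << bits_per_level,
            })
        }
        _ => None,
    }
}
```

With these assumptions the 4KB granule gives a 12-bit offset, 9 bits per level and 512 entries; 16KB gives 14, 11 and 2048; 64KB gives 16, 13 and 8192.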
Granule Size’s Effect On IA Breaking Down
Taking the 4KB granule as an example, a 52-bit IA is broken down in this way:
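In code form, the breakdown for the 4KB granule is a 4-bit index for level -1, four 9-bit indexes for levels 0 to 3, and a 12-bit page offset. The sketch below extracts these fields; the IaFields4k struct and the function name are invented for illustration.

```rust
/// Fields of a 52-bit input address with the 4KB granule (illustrative type).
#[derive(Debug, PartialEq, Eq)]
pub struct IaFields4k {
    pub level_m1: u64, // IA[51:48], index into the level -1 table
    pub level0: u64,   // IA[47:39], index into the level 0 table
    pub level1: u64,   // IA[38:30], index into the level 1 table
    pub level2: u64,   // IA[29:21], index into the level 2 table
    pub level3: u64,   // IA[20:12], index into the level 3 table
    pub offset: u64,   // IA[11:0], offset inside the 4KB page
}

/// Split an input address; bits above bit 51 are ignored.
pub fn split_ia_4k_52bit(ia: u64) -> IaFields4k {
    IaFields4k {
        level_m1: (ia >> 48) & 0xF,
        level0: (ia >> 39) & 0x1FF,
        level1: (ia >> 30) & 0x1FF,
        level2: (ia >> 21) & 0x1FF,
        level3: (ia >> 12) & 0x1FF,
        offset: ia & 0xFFF,
    }
}
```

For a 48-bit IA the level_m1 field is simply unused and the walk starts at level 0 or later.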
Granule Size’s Effect On Translation Tables
The effect of granule size on TTBR_ELx and on the translation tables of different levels can be read from this table:
x in the table above is the index of the least significant bit in the IA for the current translation level.
Taking the 4KB granule size and a 48-bit IA as an example, the following information is obtained from the table:
- The translation table address can be found at TTBR_ELx[47:12].
- The address bits IA[47:12] are broken down across the levels of translation.
- At each level, 9 bits of IA[47:12] are resolved as the index into a translation table. For the initial lookup (level 0), x = 39, so the index is IA[47:39] (47 = 39 + 8).
- The address bits IA[11:0] are the offset inside a page.
Address Translation Process
With the background knowledge introduced above, now it’s time to take a deeper look into the details of the translation process.
In this chapter I will only describe the process of a stage 1 translation with the 4KB granule.
Initial Lookup Level
Before going through the translation table, the first question to answer is how many levels to look up.
Generally, the rule for the initial lookup level is:
- If the input address range is larger, more translation levels are needed to cover it, so the initial lookup level is lower;
- Otherwise, if the input address range is smaller, fewer translation levels are needed, so the initial lookup level is higher.
For a stage 1 translation, the required initial lookup level is determined only by the required input address range specified by the corresponding TCR_ELx.TnSZ field.
Specifically, the range size is 2^(64 - TCR_ELx.TnSZ) bytes.
When using the 4KB translation granule, the relationship between the initial lookup level and TCR_ELx.TnSZ is described by the table:
Let me give an example of what the table means. For a translation in the upper VA range, the input address lies between 0xFFFF_0000_0000_0000 and 0xFFFF_FFFF_FFFF_FFFF. But the addresses actually in use may not cover the whole range, because it is a really big one (2⁴⁸ or 2⁵² bytes). The actual range size can be calculated as 2^(64 - TCR_ELx.T1SZ).
For example, if TCR_ELx.T1SZ = 28, the real input address range size is 2^(64 - 28) = 2³⁶ bytes. A 3-level translation table can cover all the possible addresses in this range, so levels -1 and 0 are not needed. The initial lookup level is 1. This case matches the 3rd line of the table above.
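The same reasoning can be written as arithmetic: with the 4KB granule, the bits above the 12-bit page offset are resolved 9 at a time, and the last lookup is always level 3. The Rust sketch below derives the initial lookup level that way instead of copying the manual's table, so treat it as an approximation of the table. The accepted TnSZ range of 12 to 48 is an assumption of this example (the low and high ends depend on optional features), and the function names are hypothetical.

```rust
/// Size of the input address range in bits, i.e. 64 - TnSZ.
/// Assumption: TnSZ values outside 12..=48 are rejected by this sketch.
pub fn input_range_bits(tnsz: u32) -> Option<u32> {
    if (12..=48).contains(&tnsz) {
        Some(64 - tnsz)
    } else {
        None
    }
}

/// Initial lookup level for a stage 1 translation with the 4KB granule.
pub fn initial_lookup_level_4k(tnsz: u32) -> Option<i32> {
    let ia_bits = input_range_bits(tnsz)?;
    // Number of 9-bit lookups needed to resolve the bits above the
    // 12-bit page offset (ceiling division by 9).
    let levels = (ia_bits - 12 + 8) / 9;
    // The final lookup is level 3, so count backwards from there.
    Some(4 - levels as i32)
}
```

For T1SZ = 28 this gives a 36-bit range and an initial lookup level of 1, matching the example above; T1SZ = 16 (a 48-bit range) starts at level 0.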
Going Through Translation Table
Once the initial lookup level is identified, it’s time to go through the translation tables to find out the PA.
A simplified version of the walk looks like this:
- The TTBR_ELx register holds the address of the first table to go through.
- Some bits in the VA contain the index of an entry in that table, and the entry holds the address of the next table to go through.
- Repeat the last step until reaching the level-3 table.
- Some bits in the VA contain the index of an entry in that table. The entry holds the address of a physical page.
- The last bits of the VA (12 bits for a 4KB page) are the offset in the physical page. Combining the address of the physical page identified in the last step with the offset gives the PA.
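The steps above can be sketched as a loop in Rust. This is not the Cloud Hypervisor implementation: the function name, its parameters, and the read_u64 callback (standing in for "read 8 bytes of guest memory at this address") are assumptions made for illustration. It further assumes the 4KB granule, 48-bit output addresses, and a TnSZ between 16 and 39, and it ignores attributes and permissions. Block descriptors, which the simplified list does not mention, are handled in a few lines because a walk can end early at such an entry.

```rust
const PAGE_SHIFT: u32 = 12; // 4KB granule: 12 offset bits
const BITS_PER_LEVEL: u32 = 9; // 512 eight-byte entries per table
const DESC_VALID: u64 = 1 << 0; // bit 0: descriptor is valid
const DESC_TABLE_OR_PAGE: u64 = 1 << 1; // bit 1: table (levels 0-2) or page (level 3)
const OA_MASK: u64 = 0x0000_FFFF_FFFF_F000; // descriptor bits [47:12]

/// Walk stage 1 tables for `va`, starting at `table_base` (the base address
/// already extracted from TTBR_ELx). Returns the output address, or None
/// if the walk hits an invalid or unreadable entry.
pub fn walk_stage1_4k<F>(table_base: u64, va: u64, tnsz: u32, read_u64: F) -> Option<u64>
where
    F: Fn(u64) -> Option<u64>,
{
    if !(16..=39).contains(&tnsz) {
        return None; // outside the range this sketch supports
    }
    let ia_bits = 64 - tnsz;
    // Keep only the bits that take part in the translation.
    let ia = va & ((1u64 << ia_bits) - 1);
    let levels = (ia_bits - PAGE_SHIFT + BITS_PER_LEVEL - 1) / BITS_PER_LEVEL;
    let initial_level = 4 - levels as i32;

    let mut table = table_base;
    for level in initial_level..=3 {
        // Least significant bit of this level's index field.
        let shift = PAGE_SHIFT + BITS_PER_LEVEL * (3 - level) as u32;
        let index = (ia >> shift) & 0x1FF;
        let desc = read_u64(table + index * 8)?;
        if (desc & DESC_VALID) == 0 {
            return None;
        }
        let table_or_page = (desc & DESC_TABLE_OR_PAGE) != 0;
        if level == 3 {
            if !table_or_page {
                return None; // reserved encoding at level 3
            }
            // Page descriptor: page address plus the 12-bit offset.
            return Some((desc & OA_MASK) | (ia & 0xFFF));
        }
        if !table_or_page {
            // Block descriptor: the remaining VA bits are the offset.
            let block_mask = (1u64 << shift) - 1;
            return Some((desc & OA_MASK & !block_mask) | (ia & block_mask));
        }
        // Table descriptor: continue with the next-level table.
        table = desc & OA_MASK;
    }
    None
}
```

Passing memory access in as a closure keeps the sketch independent of how guest memory is actually read, which makes it easy to exercise against a fake table built in a map.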
Selecting TTBR_ELx
When only one VA range is supported, TTBR0_ELx must be used for address translation.
But when two VA ranges are supported, the correct TTBR_ELx register (TTBR0_ELx or TTBR1_ELx) needs to be selected:
- TTBR0_ELx points to the initial translation table for the lower VA range:
  (48-bit VA) 0x0000_0000_0000_0000 ~ 0x0000_FFFF_FFFF_FFFF
  (52-bit VA) 0x0000_0000_0000_0000 ~ 0x000F_FFFF_FFFF_FFFF
- TTBR1_ELx points to the initial translation table for the upper VA range:
  (48-bit VA) 0xFFFF_0000_0000_0000 ~ 0xFFFF_FFFF_FFFF_FFFF
  (52-bit VA) 0xFFF0_0000_0000_0000 ~ 0xFFFF_FFFF_FFFF_FFFF
So, which TTBR_ELx is used depends only on the VA presented for translation. The most significant bits of the VA must all have the same value, and:
- If the most significant bits of the VA are all zero, then TTBR0_ELx is used.
- If the most significant bits of the VA are all one, then TTBR1_ELx is used.
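A minimal Rust sketch of this selection rule follows. It takes the VA width in bits (48 or 52 in the ranges listed above) and checks the bits above it; the Ttbr enum and the function name are hypothetical, and features that change which top bits are checked, such as address tagging, are deliberately left out.

```rust
/// Which translation table base register applies (illustrative type).
#[derive(Debug, PartialEq, Eq)]
pub enum Ttbr {
    Ttbr0,
    Ttbr1,
}

/// Select the TTBR for `va`, given the VA width in bits.
/// Returns None if the top bits are neither all zero nor all one.
pub fn select_ttbr(va: u64, va_bits: u32) -> Option<Ttbr> {
    if va_bits == 0 || va_bits >= 64 {
        return None;
    }
    // The most significant (64 - va_bits) bits of the address.
    let top = va >> va_bits;
    let all_ones = (1u64 << (64 - va_bits)) - 1;
    if top == 0 {
        Some(Ttbr::Ttbr0)
    } else if top == all_ones {
        Some(Ttbr::Ttbr1)
    } else {
        None
    }
}
```

For instance, 0xFFFF_0000_0000_0000 selects TTBR1 with a 48-bit VA, while 0xFFF0_0000_0000_0000 is only valid (and selects TTBR1) with a 52-bit VA.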
Calculating Table Entry Address
Now let’s take a step further to see how the address of a table entry is calculated.
Taking the 4KB granule as an example, the table below gives all the information for the calculation.
- BaseAddr: The base address for the level of lookup, as defined by:
  - For the initial lookup level, the value of the appropriate TTBR_ELx.BADDR field.
  - Otherwise, the translation table address returned by the previous level of lookup.
- PAMax: The supported PA width, in bits.
- Symbols used only in the stage 2 calculation are ignored here.
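For stage 1 with the 4KB granule, the calculation amounts to taking BaseAddr and adding the table index scaled by the 8-byte descriptor size. The Rust sketch below shows that for one level; the function name is made up, the caller is assumed to pass an IA already masked to the configured input size, and BaseAddr is assumed to be aligned to the size of the table it points to.

```rust
/// Address of the descriptor to read at `level` (from -1 to 3) for the
/// 4KB granule. `base_addr` is BaseAddr as defined above; `ia` must
/// already be masked to the configured input address size.
pub fn descriptor_address_4k(base_addr: u64, ia: u64, level: i32) -> Option<u64> {
    if !(-1..=3).contains(&level) {
        return None;
    }
    // x: least significant IA bit of this level's index field
    // (48, 39, 30, 21, 12 for levels -1, 0, 1, 2, 3).
    let x = 12 + 9 * (3 - level) as u32;
    // Up to 9 index bits: IA[x+8:x].
    let index = (ia >> x) & 0x1FF;
    // Each descriptor is 8 bytes, so the index is scaled by 8.
    Some(base_addr + index * 8)
}
```

As a usage example, with BaseAddr 0x4000_0000 and an IA whose level 0 index is 3, the level 0 descriptor sits at 0x4000_0018.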
Relevant Registers
This chapter briefly introduces the registers that are involved in our translation use case.
TCR_EL1
Translation Control Register (EL1)
The control register for stage 1 of the EL1&0 translation regime.
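For the upper VA range, the fields of interest in this register are T1SZ, TG1, and IPS. The Rust sketch below pulls them out of a raw 64-bit value. The bit positions (T1SZ at [21:16], TG1 at [31:30], IPS at [34:32]) and the TG1 encodings are stated as I understand them from the reference manual and should be double-checked there; the struct and function names are invented for this example.

```rust
/// Fields of TCR_EL1 that matter for the upper VA range (illustrative type).
#[derive(Debug, PartialEq, Eq)]
pub struct Tcr1UpperRange {
    /// T1SZ: the input range size is 2^(64 - t1sz) bytes.
    pub t1sz: u32,
    /// Granule size for TTBR1_EL1 in bytes, None for a reserved encoding.
    pub granule_bytes: Option<u64>,
    /// Raw IPS encoding (intermediate physical address size).
    pub ips: u32,
}

/// Decode a raw TCR_EL1 value.
/// Assumed layout: T1SZ [21:16], TG1 [31:30], IPS [34:32].
pub fn decode_tcr_el1_upper(tcr_el1: u64) -> Tcr1UpperRange {
    let t1sz = ((tcr_el1 >> 16) & 0x3F) as u32;
    // Assumed TG1 encodings: 0b01 = 16KB, 0b10 = 4KB, 0b11 = 64KB.
    let granule_bytes = match (tcr_el1 >> 30) & 0x3 {
        0b01 => Some(16 * 1024),
        0b10 => Some(4 * 1024),
        0b11 => Some(64 * 1024),
        _ => None,
    };
    let ips = ((tcr_el1 >> 32) & 0x7) as u32;
    Tcr1UpperRange { t1sz, granule_bytes, ips }
}
```

A value with T1SZ = 25, TG1 = 0b10 and IPS = 0b101 decodes to a 39-bit upper range with the 4KB granule.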
TTBR1_EL1
Translation Table Base Register 1 (EL1)
Holds the base address of the translation table for the initial lookup for stage 1 of the translation of an address from the higher VA range in the EL1&0 stage 1 translation regime, and other information for this translation regime.
ID_AA64MMFR0_EL1
AArch64 Memory Model Feature Register 0
Provides information about the implemented memory model and memory management support in AArch64 state.
PARange, bits [3:0] - Physical Address range supported:
- 0b0000 32 bits, 4GB.
- 0b0001 36 bits, 64GB.
- 0b0010 40 bits, 1TB.
- 0b0011 42 bits, 4TB.
- 0b0100 44 bits, 16TB.
- 0b0101 48 bits, 256TB.
- 0b0110 52 bits, 4PB (for ARMv8.2-LPA only).
- All other values are reserved.
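The PARange list above maps directly onto a lookup function. Here is a small Rust sketch; the function name is hypothetical, and it takes the raw register value as an argument rather than reading the register itself.

```rust
/// Physical address size in bits encoded in ID_AA64MMFR0_EL1.PARange
/// (bits [3:0]). Returns None for reserved encodings.
pub fn pa_range_bits(id_aa64mmfr0_el1: u64) -> Option<u32> {
    match id_aa64mmfr0_el1 & 0xF {
        0b0000 => Some(32), // 4GB
        0b0001 => Some(36), // 64GB
        0b0010 => Some(40), // 1TB
        0b0011 => Some(42), // 4TB
        0b0100 => Some(44), // 16TB
        0b0101 => Some(48), // 256TB
        0b0110 => Some(52), // 4PB
        _ => None,          // reserved
    }
}
```

Returning None for reserved values lets the caller decide how to fail instead of silently assuming a size.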