
16 System and Implementation Considerations

16.1 Stages

Arm strongly recommends that stage 2 is only used to provide device assignment to a guest OS. To support other usage scenarios, Arm recommends implementations also implement stage 1 if stage 2 is implemented.

ARM IHI 0070 H.a Copyright © 2016-2026 Arm Limited or its affiliates. All rights reserved. Non-confidential 1116

16.2 Caching

An SMMU implementation is not required to implement caching of any kind, but Arm expects that performance requirements will require caching of at least some configuration or translation information. Caching of configuration or translations might manifest as separate caches for each type of structure, or some combination of structures into a smaller number of caches.

When architected structure types are cached together as one combined entry, the invalidation and lookup semantics remain identical to many specialized per-structure caches.

For example, an implementation with caches of each structure and translation stage implemented separately would contain:

• An STE cache
  – Indexed by StreamID.
  – Invalidated by StreamID, StreamID-span, or all.
  – Might contain a Level 1 Stream table cache of pointers to 2nd-level Stream table.
    * Identical index/invalidation requirements.
• A CD cache
  – Indexed by StreamID and SubstreamID, or address.
    * Note: Address index could be used, calculated from the STE.S1ContextPtr of a prior STE lookup plus the incoming SubstreamID. This arrangement might be useful where many STEs share a common table of CDs.
  – Invalidated by StreamID and SubstreamID, StreamID, or all.
  – Might contain a Level 1 CD table cache of pointers to 2nd-level CD table.
    * Identical index and invalidation requirements.
• A VMS PARTID_MAP cache
  – Indexed by VMID, or StreamID.
    * Note: A VMS cache might be indexed by StreamID, but this would be no different from storing VMS data as part of an STE cache.
  – Invalidated by VMID, or StreamID.
• A stage 1 TLB (VA to IPA)
  – Indexed by VA, ASID, VMID and EL.
    * EL is the Exception level or StreamWorld on whose behalf the TLB entry was inserted.
  – Invalidated by VA, ASID, VMID and EL, or ASID, VMID and EL, or VMID and EL, or EL.
  – Might contain or be paired with a walk cache (invalidated under the same conditions as PE translation walk caches).
• A stage 2 TLB (IPA to PA)
  – Indexed by IPA, VMID and EL.
  – Invalidated by IPA and VMID, or VMID, or all.
  – Might contain or be paired with a walk cache.

In this example, consistency is maintained by looking up cache entries in order from STE through CD, then using the parameters determined from that stream configuration to look up in the TLBs. That is, given a StreamID, SubstreamID and VA input, convert StreamID and SubstreamID into ASID, VMID and Exception level, then use VA, ASID, VMID and EL to look up in the TLBs.
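The lookup order described in 16.2 Caching can be sketched as follows. This is a minimal illustrative model, not an architected design: the dictionary-based caches, the `StreamConfig` container, the fixed 4KB page size and all function names are assumptions made for the example, and the two TLB stages are collapsed into one mapping to keep it short.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

PAGE_SHIFT = 12  # assumption: 4KB pages only


@dataclass(frozen=True)
class StreamConfig:
    """Parameters derived from the STE and CD lookups (hypothetical container)."""
    asid: int
    vmid: int
    el: str  # Exception level or StreamWorld tag


# Hypothetical caches modelled as dictionaries.
ste_cache: Dict[int, Tuple[int, str]] = {}        # StreamID -> (VMID, EL)
cd_cache: Dict[Tuple[int, int], int] = {}         # (StreamID, SubstreamID) -> ASID
tlb: Dict[Tuple[int, int, int, str], int] = {}    # (VA page, ASID, VMID, EL) -> output page


def resolve_config(stream_id: int, substream_id: int) -> Optional[StreamConfig]:
    """Look up the STE first, then the CD, as the text describes."""
    ste = ste_cache.get(stream_id)
    if ste is None:
        return None  # a real SMMU would fetch the STE from memory here
    asid = cd_cache.get((stream_id, substream_id))
    if asid is None:
        return None  # a real SMMU would fetch the CD from memory here
    vmid, el = ste
    return StreamConfig(asid=asid, vmid=vmid, el=el)


def translate(stream_id: int, substream_id: int, va: int) -> Optional[int]:
    """Convert (StreamID, SubstreamID) to (ASID, VMID, EL), then look up the TLB."""
    cfg = resolve_config(stream_id, substream_id)
    if cfg is None:
        return None
    page = tlb.get((va >> PAGE_SHIFT, cfg.asid, cfg.vmid, cfg.el))
    if page is None:
        return None  # a real SMMU would start a translation table walk here
    return (page << PAGE_SHIFT) | (va & ((1 << PAGE_SHIFT) - 1))
```

Because the TLB key contains only VA, ASID, VMID and EL, streams that resolve to the same configuration share TLB entries, which is why the configuration lookup has to come first.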

16.2.1 Caching combined structures

Note: In the example given in 16.2 Caching, a stage 1 TLB is implemented separately to the stage 2 TLB. While this layout mimics the two stages of translation table, it might not provide optimal performance. A designer might determine that a combined stage 1 and stage 2 TLB is better, where each entry would translate VA to PA directly but would inherit invalidation requirements from both original structures.

For translations and TLBs, the SMMU invalidation rules for combined TLBs directly match those of Armv8-A.

Note: For example, an invalidation by IPA is not required to invalidate entries in a combined S1 and S2 TLB but is required to be paired with a second invalidation by VA or stage 1-all that would affect S1 and S2 TLB entries.

Combined structure caches maintain the same invalidation semantics as discrete structures. An entry of a combined cache is invalidated if any part of the entry would have been invalidated in an equivalent operation, with the same parameters, on a discrete cache.

Note: For example, a particular SMMU implementation maintains only two hardware caches, a combined cache of STE and CD, and a TLB. In this layout, a StreamID and SubstreamID and VA input looks up StreamID and SubstreamID in the combined cache and determines the ASID, VMID, and StreamWorld at the same time. The translation is then looked up using VA, ASID, VMID and StreamWorld to determine the PA.

The invalidation requirements of the combined STE and CD cache are the union of the requirements of the separate structures. STE invalidation operations invalidate every combined cache entry that contains data loaded using a given StreamID. This covers all CDs fetched from a given STE, which is implied anyway by the CMD_CFGI_STE to invalidate CDs subordinate to the STE. Conversely, every entry that contains data from a CD to be invalidated must be invalidated even if the STE portion is still valid.

An implementation might combine a TLB with configuration caching so that a single cache is looked up by StreamID and SubstreamID and VA and results in a PA output. Entries in this cache are invalidated when any part of an entry would match (or cannot be proven to not match) a required invalidation for STE, CD or translation.

Note: Implementations must balance a trade-off between over-invalidation that might be necessary to cover all required entries, and the cost of adding extra tagging. For example, a single cache might tag entries by VA, ASID, VMID, and StreamID so that broadcast TLB invalidations can remove only relevant entries, or so that an STE invalidation removes only entries that could have been constructed from the given StreamID.

16.2.2 Data dependencies between structures

The configuration structures logically make a tree or graph by indicating subsequent structures (and onwards, indicating translation tables). The structures contain fields to locate the next structure in the chain but might also modify interpretation of subsequent structures.

The dependencies between structures are:

• STE to CD to TT (stage 1)
• STE to TT (stage 2)

(Here, ‘STE’ might be composed of multi-level L1STD to STE lookups and CD might be composed of multi-level L1CD to CD lookups.)

The STE contains fields that determine how to locate a CD and stage 2 translations (whichever are relevant to STE.Config) but also contains fields that modify the behavior of a CD and translation table walks performed through it. For example:

• The STE StreamWorld (STRW plus STE Security state) determines the translation regime, which:
  – Tags caches of translations subsequently inserted, to separate lookup and match on invalidation.
  – Determines which translation table formats are valid for use by the CD.
• STE.S1STALLD modifies CD.S behavior upon stage 1 fault.
• VMSAv8-32 LPAE stage 2 translation (STE.Config[1] == 1 and STE.S2AA64 selects VMSAv8-32 LPAE) causes a 64-bit stage 1 CD (CD.AA64 == 1) to be ILLEGAL.

Note: A change to an STE field requires an STE invalidation. An STE invalidation also invalidates all CDs that were cached through the STE.

The CD contains fields that determine how to locate translation tables but also contains fields that modify the behavior of a translation table walk through it:

• CD.{AA64, EPDx, SHx, ORx, IRx, TGx, TxSZ, ENDI, NSCFGx} govern walks of the translation table itself. In addition, NSCFGx can influence the NS attribute output from stage 1, because a translation table walk made to memory that is marked as Non-secure at stage 1 can never provide an output address that is Secure at the output of stage 1.
• CD.{UWXN, WXN, PAN, AFFD, HADx} govern permission checking with the translation table descriptors.
• CD.{ASID, ASET} govern ASID-tagged TLB entries.
• CD.{MAIR, AMAIR} modify the attribute determined from translation.
• CD.{HA, HD} determines HTTU configuration for walks performed through TTB{0,1}.

Some STE and CD fields are permitted to be cached as part of a translation or TLB entry (therefore requiring invalidation of TLB entries that might contain the old value when the fields are changed). These fields are noted in sections 5.2 Stream Table Entry and 5.4.1 CD notes. With the exception of these fields, no other information is expected to be ‘carried forward’ between structures.

Some configuration register fields are, where indicated, permitted to be cached in a TLB. Changes to these fields require invalidation of any TLB entries that might cache a previous value of the field.
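The rule in 16.2.1 Caching combined structures, that a combined entry is invalidated when any part of it would have been invalidated in a discrete cache, can be illustrated with a small model. This is a sketch under stated assumptions: the `CombinedEntry` fields, the `CombinedCache` class and its method names are hypothetical, and the cache is tagged with every parameter so that no over-invalidation is needed.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class CombinedEntry:
    """One entry of a hypothetical combined STE + CD + TLB cache."""
    stream_id: int
    substream_id: int
    asid: int
    vmid: int
    va_page: int
    pa_page: int


class CombinedCache:
    def __init__(self) -> None:
        self.entries: List[CombinedEntry] = []

    def insert(self, entry: CombinedEntry) -> None:
        self.entries.append(entry)

    def _drop(self, matches: Callable[[CombinedEntry], bool]) -> int:
        """Remove every entry for which matches() is true; return the count removed."""
        before = len(self.entries)
        self.entries = [e for e in self.entries if not matches(e)]
        return before - len(self.entries)

    def invalidate_ste(self, stream_id: int) -> int:
        # STE invalidation covers everything loaded through that StreamID,
        # including all CDs and translations reached from it.
        return self._drop(lambda e: e.stream_id == stream_id)

    def invalidate_cd(self, stream_id: int, substream_id: int) -> int:
        # The CD portion is stale even if the STE portion is still valid.
        return self._drop(
            lambda e: e.stream_id == stream_id and e.substream_id == substream_id
        )

    def invalidate_tlb(
        self, vmid: int, asid: Optional[int] = None, va_page: Optional[int] = None
    ) -> int:
        # Translation invalidation by VMID, optionally narrowed by ASID and VA.
        return self._drop(
            lambda e: e.vmid == vmid
            and (asid is None or e.asid == asid)
            and (va_page is None or e.va_page == va_page)
        )
```

An implementation with fewer tags would have to treat "cannot be proven to not match" as a match, which in this model corresponds to dropping a comparison from the predicate and removing more entries.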

16.3 Programming implications of bus address sizing

If pointers are programmed into a device from the PE that would, on the PE, cause translation faults due to failing sign-extension checks, the SMMU will also raise a translation fault because of the sign-extension checks on input. However, if a system cannot convey all 64 address bits from a device, or a device lacks the ability to register upper address bits, the SMMU does not have enough information to perform these checks. In this case, Arm recommends that if detection of such errors is required, software (or the device, if it has the facility to hold the full address) checks the validity of upper bits.

On Armv8-A PEs, the TBI facility allows the top byte of addresses to contain tags that are ignored when checking address sign-extension validity. If an address is truncated to <= 56 bits on the flow through device DMA registers to device DMA accesses to I/O interconnect to the SMMU, the SMMU cannot check the validity of the top byte so effectively TBI is always used. In such a system, another entity (device, software) must check addresses if required.
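A software-side check of the upper address bits, as recommended in 16.3, might look like the sketch below. It is a simplification and not the architected check: it assumes a single configurable VA size (`va_bits`, 48 by default) and only tests that the bits above the VA range are all zeros or all ones. The function name and parameters are hypothetical.

```python
def upper_bits_valid(addr: int, va_bits: int = 48, tbi: bool = False) -> bool:
    """Return True if the bits above the VA range are all zeros or all ones.

    With tbi=True the top byte (bits [63:56]) is ignored, which is also the
    only check that remains possible once an address has been truncated to
    56 bits or fewer on its way to the SMMU.
    """
    top = 56 if tbi else 64              # first bit position that is NOT checked
    width = top - va_bits                # number of upper bits that are checked
    mask = (1 << width) - 1
    upper = (addr >> va_bits) & mask
    return upper == 0 or upper == mask
```

For example, an address carrying a tag in its top byte fails the full 64-bit check but passes when `tbi=True`, which matches the observation that truncation makes TBI effectively always on.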

16.4 System integration

• The SMMU must be in the same Shareability domain as any other agents that might use DVM with the SMMU.
  – In systems implementing architectures prior to Armv8.4, DVM messages are only broadcast over the Inner Shareable domain.
  – Some systems implementing Armv8.4 may broadcast DVM messages over the Outer Shareable domain.
• In general, Arm does not expect SMMUs to be connected in series.
  Note: This topology needs special software support, particularly when different software modules manage different SMMUs. This must not be used to construct two stages of translation using two SMMU implementations that support only one stage, as it is programmed differently to an SMMU that supports both stages.
• When used with a PCIe subsystem, an SMMU implementation must support at least the full (16-bit) range of PCI RequesterIDs and the system must ensure that a Root Complex generates StreamIDs from PCI RequesterIDs (BDF) in a one-to-one or linear fashion so that StreamID[15:0] == RequesterID[15:0]. A larger StreamID might be constructed by concatenating the RequesterIDs from multiple PCI domains (or “segments” in ACPI terminology), for example:
  – StreamID[17:0] == { pci_rc_id[1:0], pci_bus[7:0], pci_dev[4:0], pci_fn[2:0] }; that is, StreamID[17:0] == { pci_domain[1:0], RequesterID[15:0] }.
  When used with a PCIe system supporting PASIDs, Arm recommends that the SMMU supports the same number of (or fewer) PASID bits supported by client Root Complexes so that software is able to detect end-to-end SubstreamID capabilities through the SMMU.
• If accesses from a device are expected to experience page faults and the Stall model is used, Arm recommends that a system does not depend on other devices on the same SMMU path as the device in order to resolve the faults. Because a stalled transaction occupies an input buffer resource, the SMMU might not guarantee to pass traffic whether faulting or not, and any new request for device DMA might deadlock.
• Streams belonging to PCIe endpoints must not be stalled. The Terminate model is the only useful option. Stalling PCIe transactions risks either timeouts from the PCIe endpoint (which might be difficult to recover from), or deadlock in certain scenarios. A system is permitted to enforce this, for safety reasons. See section 3.12 Fault models, recording and reporting.
• Specifically, PCIe traffic (especially if configured to Terminate, architecturally not stalling) must not be held up waiting for any PE action, including draining the Event queue or restarting stalled transactions. PCIe traffic must always make forward progress without unbounded delays dependent on software. An implementation must ensure that transactions to be terminated are not blocked by any other users of the SMMU which might consume resources or stall transactions for an indefinite time.

16.4.1 System integration for an SMMU with RME DA

16.4.1.1 StreamID

For a device interface that might operate in a trusted or untrusted mode (that is, such that SEC_SID = Non-secure or Realm), the StreamID presented to the SMMU is the same across the two modes.

16.4.1.2 DeviceID

According to the PCIe specification regarding the TDISP feature [1], a TDISP-compliant device might issue MSIs in two manners:

1. If the MSI is configured via the MSI capability in configuration space, it is sent to the host SoC with T = 0 and therefore presented to the SMMU with SEC_SID = Non-secure.
2. If the MSI is configured via the MSI-X capability in the protected MMIO region of the device interface, it is sent to the host SoC with T = 1 and therefore presented to the SMMU with SEC_SID = Realm.

MSIs from a single device interface are presented to the GIC ITS interface with the same DeviceID regardless of which MSI mechanism is used.

Note: The target PA space of an MSI is determined from configuration in translation tables, the DPT and the configuration structures for the programming interface, StreamID and address for the target of the MSI, consistent with the behavior for any client-originated access. See 3.18 Interrupts and notifications, 3.10.1 StreamID Security state (SEC_SID), and 3.10.2 Support for Secure state.
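The StreamID construction given for PCIe systems in 16.4 System integration, StreamID[17:0] == { pci_domain[1:0], RequesterID[15:0] }, can be written out as a short sketch. The function names and the range checks are illustrative assumptions; the 2-bit domain width is only the width used in the example in the text, and the actual mapping is a property of the system.

```python
def requester_id(bus: int, dev: int, fn: int) -> int:
    """Pack a PCI bus/device/function triple into a 16-bit RequesterID."""
    if not (0 <= bus < 256 and 0 <= dev < 32 and 0 <= fn < 8):
        raise ValueError("bus/device/function out of range")
    return (bus << 8) | (dev << 3) | fn


def stream_id(domain: int, bus: int, dev: int, fn: int, domain_bits: int = 2) -> int:
    """Concatenate a PCI domain (segment) number above the RequesterID."""
    if not 0 <= domain < (1 << domain_bits):
        raise ValueError("PCI domain does not fit in the StreamID")
    return (domain << 16) | requester_id(bus, dev, fn)
```

For instance, `stream_id(2, 0x12, 3, 1)` yields `0x21219`, whose low 16 bits equal the RequesterID `0x1219` as the text requires.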

16.5 System software

Note: Software must:

• Not assume that both stage 1 and stage 2 are implemented.
• Support systems in which broadcast TLB invalidation messages are not supported, so do not invalidate SMMU TLB entries; that is, fall back to software TLB invalidation messages.
• Discover StreamID and SubstreamID sizes and capabilities.
• Probe SMMU_IDR1 for PRESET configuration table and queue base pointers, only allocating memory for pointers that require initialization.
• Discover the maximum table sizes of the SMMU rather than using fixed-size tables.
• Not make assumptions about which SMMU Security state or states it is interacting with, and not make assumptions about which Security states are supported in the SMMU.
• Present system-specific StreamIDs as part of firmware descriptions for each device, as the StreamIDs associated with a physical device are system-specific.
• Ensure that when HTTU is not used, descriptors mapping DMA memory are marked Accessed (and not Read-Only, if DMA writes are expected) in order to avoid faults.

16.6 IMPLEMENTATION DEFINED features

16.6.1 Configuration cache locking and TLB locking

Note: The lockdown of configuration cache entries and TLB entries is not a feature directly described by the SMMU architecture because cache structures might vary between implementations and entry lockdown might expose this layout to software.

An implementation might support cache locking in an IMPLEMENTATION DEFINED manner using registers in the IMPLEMENTATION DEFINED memory map.

Note: These registers might expose cache contents and provide insertion, probe and invalidation operations.

If an implementation supports TLB locking, TLB invalidation must be consistent with the Armv8-A rules on locked entries:

• A TLB invalidate-all operation does not invalidate locked entries.
• An implementation might choose to implement TLB invalidate-by-VA or invalidate-by-ASID operations so that they do one of the following:
  – Invalidate locked entries that are explicitly matched by the operation.
  – Do not invalidate locked entries.

A locked TLB entry is not affected by over-invalidation side effects of invalidation operations that do not directly match the entry.

The behavior of CMD_CFGI_* commands with respect to locked configuration cache entries is IMPLEMENTATION DEFINED.

See SMMU_S_INIT.INV_ALL; this initialization invalidate-all operation invalidates locked entries.
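The locked-entry rules in 16.6.1 can be modelled in a few lines. This is only a sketch: the `LockingTlb` class, its fields and the `targeted_ops_hit_locked` flag (standing in for the implementation choice between the two permitted by-VA behaviors) are hypothetical names, and the real locking mechanism is IMPLEMENTATION DEFINED.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TlbEntry:
    va_page: int
    asid: int
    locked: bool = False


class LockingTlb:
    def __init__(self, targeted_ops_hit_locked: bool) -> None:
        # Implementation choice: do by-VA operations remove matching locked entries?
        self.targeted_ops_hit_locked = targeted_ops_hit_locked
        self.entries: List[TlbEntry] = []

    def invalidate_all(self) -> None:
        """Ordinary invalidate-all: locked entries survive."""
        self.entries = [e for e in self.entries if e.locked]

    def invalidate_by_va(self, va_page: int, asid: int) -> None:
        """Targeted invalidate: only explicitly matched entries are candidates."""
        def survives(e: TlbEntry) -> bool:
            if e.va_page != va_page or e.asid != asid:
                return True  # not matched, so never affected
            return e.locked and not self.targeted_ops_hit_locked

        self.entries = [e for e in self.entries if survives(e)]

    def init_invalidate_all(self) -> None:
        """Initialization invalidate-all (as for SMMU_S_INIT.INV_ALL): removes locked entries too."""
        self.entries.clear()
```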

16.7 Interconnect-specific features

16.7.1 Reporting of Unsupported Client Transactions

The SMMU behaves as though a single transaction is associated with one translation. SMMU implementations might define their own input alignment restrictions leading to an unsupported client transaction error. For example, an implementation with an AMBA downstream interconnect is likely to treat an incoming transaction that crosses a 4KB boundary as unsupported, because these would violate the alignment rules of the downstream interconnect.

For AMBA 4 systems the following upstream client transactions are unsupported:

• Far Atomic operations (see section 16.7.6 Far Atomic operations) where not supported by the downstream interconnect or SMMU implementation.

Such transactions will be aborted and an F_UUT event will be recorded if possible.

16.7.2 Non-data transfer transactions

Some interconnect architectures support transactions that do not perform a data transfer, prefetch or translation request action. If the input interconnect can express the following transactions from client devices, the transactions will be terminated silently by the SMMU as SLVERR (or equivalent on non-AMBA interconnect):

• DVM operations (of all sub-types).
• Barriers.

The interconnect architecture of an implementation might support the following non-data operations, also known as Cache Maintenance Operations (CMOs), that perform address-based cache maintenance:

• Clean.
• Invalidate.
• CleanInvalidate.
• CleanToPersistence.
• Destructive hint (DH): Operation that has a hint side effect of invalidate.

Note: In AMBA AXI [8], the equivalent of the above operations are the CleanShared, MakeInvalid, CleanInvalid, CleanSharedPersist and InvalidateHint transactions, respectively.

SMMUv3.0:

• The SMMU does not support CMOs. If an SMMUv3.0 implementation can receive these operations from client devices, they are handled in an IMPLEMENTATION DEFINED way. A system that requires SMMU support for CMOs is required to implement SMMUv3.1 or later.

SMMUv3.1 and later:

• CMOs that are not address-based are not supported and are silently terminated by the SMMU.
• CMOs are permitted to pass into the system, without the transformation described in section 16.7.2.1 Control of Cache Maintenance Operations, when the transaction bypasses all implemented stages of translation. See section 16.7.2.3 Memory types and Shareability for Cache Maintenance Operations on memory types.
  – Note: For a Secure stream, SMMU_S_CR0.SIF still applies. See section 16.7.2.2 Permissions model for Cache Maintenance Operations.
• When one or more stages of translation is applied, the SMMU allows these operations to progress into the system subject to the configuration controls and permission model described in sections 16.7.2.1 Control of Cache Maintenance Operations and 16.7.2.2 Permissions model for Cache Maintenance Operations. Additionally, Invalidate operations might be transformed into Clean and Invalidate operations as part of these checks.

• When the input interconnect can deliver these operations, but the output interconnect does not support them, the transactions are silently terminated.

SMMU implementations supporting AMBA might define an IMPLEMENTATION DEFINED set of unsupported incoming transactions. SMMU implementations supporting other interconnects might define their own set of unsupported incoming transactions.

Note: See section 3.22 Destructive reads and directed cache prefetch transactions. A destructive read (read with invalidate), write with directed prefetch, or standalone directed prefetch transaction is not considered to be a discrete Cache Maintenance Operation and is handled differently.

16.7.2.1 Control of Cache Maintenance Operations

In SMMUv3.1 and later, STE.DRE controls whether an Invalidate operation is transformed as follows:

| Input transaction class | DRE == 0 | DRE == 1 | Notes |
|---|---|---|---|
| Invalidate | Transformed into CleanInvalidate. The operation is treated identically to a CleanInvalidate for permission evaluation. | Eligible for output as Invalidate (if permissions checks allow) | If SMMU_IDR3.MTCOMB is 1, then for a Forced-WB transaction, the value of STE.DRE is treated as 0. |
| Destructive hint (DH) | Transformed into No-op. | Eligible for output as destructive hint (if permissions checks allow) | If SMMU_IDR3.MTCOMB is 1, then for a Forced-WB transaction, the value of STE.DRE is treated as 0. |

The STE.DRE field applies in this manner when one or more stages of translation are applied. This does not include the case where the only stage of translation is skipped due to STE.S1DSS.

16.7.2.2 Permissions model for Cache Maintenance Operations

In SMMUv3.1 and later, the SMMU_S_CR0.SIF permission check applies to Cache Maintenance Operations (CMOs). This applies when either translation or bypass occurs.

In SMMUv3.1 and later, when one or more stages of translation are applied, the following permissions are required for CMOs:

| Maintenance operation type | Required permissions | Behavior if permissions not met |
|---|---|---|
| Clean, CleanInvalidate, CleanToPersistence | Identical to ordinary read: Requires Read or Execute permission (depending on input InD and INSTCFG) at privilege appropriate to PnU input and STE.PRIVCFG. | Identical to ordinary read. |
| Invalidate | To progress as Invalidate, requires both Read-or-Execute permission (depending on input InD and INSTCFG) and Write permission at a privilege appropriate to the PnU input and STE.PRIVCFG. | If no Read/Exec permission is available, behavior is identical to an ordinary read. If Read/Exec permission is available but Write permission is not, an Invalidate is transformed into a CleanInvalidate operation.(1) |
| Destructive hint (DH) | To progress as destructive hint, requires both Read-or-Execute permission (depending on input InD and STE.INSTCFG), and Write permission that does not result in HTTU update of Dirty state, at a privilege appropriate to the PnU input and STE.PRIVCFG. | Invalidate does not occur.(1)(2) |

(1) This includes the case where a GPC does not grant write permission when SMMU_ROOT_IDR0.GDI == 1. See 3.25.10 Granular Data Isolation.
(2) A DH operation is a hint. If the required permissions are not met for a DH at a given stage of translation, the DH operation is treated as a No-op and does not progress into the system.

If HTTU of dirty state is enabled, a DH operation does not mark a page Dirty. If the translation for a DH operation is writable-clean, the SMMU does not perform the hardware update of dirty state and instead the DH operation is treated as a No-op and does not progress into the system.

If a DH operation is permitted to progress through a stage of translation and HTTU of Access flag is enabled for that stage, AF is updated. If the translation conditions permit an AF update, but the DH is not permitted to progress into the system, a coincidental speculative update of AF might occur.

If a Clean, CleanInvalidate, Invalidate or CleanToPersistence operation leads to a fault, it is recorded as a read, that is RnW == 1. The read can be treated as either data or instruction, depending on the input InD/INSTCFG.

On fault, these operations stall in the same way as an ordinary read transaction if the SMMU is configured for stalling fault behavior. Retry and termination behave the same as for an ordinary read or write transaction. If these transactions are stalled and retried, they are retried as the same transaction type.

A DH transaction does not cause faults in the SMMU nor does it cause an abort response to be returned where the interconnect architecture requires a response. An implementation is permitted to downgrade a DH operation as described in this section, for any reason.

Note: The input interconnect might supply all CMOs as Data.

See section 3.13.8 Hardware flag update for Cache Maintenance Operations and Destructive Reads for information on the behavior of HTTU for Invalidate operations.

When HTTU is enabled for Access flag updates and the translation descriptor and AFFD configuration require it, a Clean, CleanInvalidate, CleanToPersistence, Invalidate, or DH operation updates the Access flag. See section 3.13.2 Access flag hardware update.

16.7.2.3 Memory types and Shareability for Cache Maintenance Operations

Cache Maintenance Operations (CMOs) do not have a memory type. If an input shareability is provided, it does not undergo any normalization before entering the attribute determination process described in Chapter 13 Attribute Transformation. If an input shareability is not provided, the default shareability is taken as described in section 13.1.3 Default input attributes. After input, the output Shareability of a CMO is determined in the same way as that of an ordinary transaction.

Note: This means that the input shareability of a CMO is not dependent on the input memory type even if the input bus encodes a memory type, because the SMMU does not consider a memory type to be provided on input for a CMO.

In SMMUv3.1 and later, this rule applies to all such operations in all translation and bypass configurations, including:

• Global bypass (attribute set from GBPA).
• STE bypass (whether STE.Config == 0b100 or STE.S1DSS and STE.Config == 0b101 causes skip of the only stage of translation).
• Translation.
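The combined effect of STE.DRE (16.7.2.1) and the permission model (16.7.2.2) on a CMO that undergoes translation can be summarized as a decision function. This is an illustrative sketch, not a definitive statement of the architecture: the function name, the string labels for outcomes and the boolean inputs are assumptions. `can_write` is taken to mean write permission that does not require an HTTU update of Dirty state, and a caller is assumed to pass `dre=False` for a Forced-WB transaction when SMMU_IDR3.MTCOMB is 1.

```python
READ_LIKE = {"Clean", "CleanInvalidate", "CleanToPersistence"}


def resolve_cmo(op: str, dre: bool, can_read_or_exec: bool, can_write: bool) -> str:
    """Return what a translated CMO becomes: an operation name, "NoOp" or "FaultAsRead"."""
    if op == "DestructiveHint":
        # A DH is a hint and never faults: it progresses or becomes a No-op.
        if dre and can_read_or_exec and can_write:
            return "DestructiveHint"
        return "NoOp"
    if op == "Invalidate":
        if not can_read_or_exec:
            return "FaultAsRead"      # handled like an ordinary read without permission
        if dre and can_write:
            return "Invalidate"
        return "CleanInvalidate"      # DRE == 0, or Write permission missing
    if op in READ_LIKE:
        return op if can_read_or_exec else "FaultAsRead"
    raise ValueError(f"unknown CMO type: {op}")
```

For example, `resolve_cmo("Invalidate", dre=True, can_read_or_exec=True, can_write=False)` returns `"CleanInvalidate"`, reflecting that an Invalidate is never allowed to discard data the stream could not have written.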

Chapter 16. System and Implementation Considerations 16.7. Interconnect-specific features Note: On AMBA AXI5 interfaces [8], it is not permitted to issue CMOs, including the DH operations, with Sys shareability. 16.7.3 Treatment of AMBA Exclusives from client devices The AXI specification does not permit Exclusive accesses to the Shareable domain. Therefore, if the SMMU interface to the system interconnect is AXI, an Exclusive access transaction that is translated into an Inner-shareable or Outer-shareable transaction cannot be marked as Exclusive. Arm recommends that such transactions are transformed into non-Exclusives. For more information about the relationship between AMBA and Armv8 output attributes, see section 16.7.5.2 Conversion of Armv8 attributes to AMBA on output and representation of Shareability. The outcome of such a transformed Exclusive transaction is equivalent to that of an ordinary transaction and depends on whether the transaction experiences a fault and, if it faults, fault configuration. The transaction will experience one of the following: • Translates without fault, returning the non-exclusive transaction’s response to the upstream client device. A response of EXOK is not possible (as the transaction is now non-exclusive). A response of OK will be treated as an exclusive fail by the upstream client device. • Faults on translation and is terminated with abort. These aborts are reported to the upstream client device in the same way for transformed Exclusive transactions as for regular transactions (for example as SLVERR). • Fault on stage 1 translation and be terminated with RAZ/WI semantics because CD.A == 0. This returns a response of OK. 16.7.4 Treatment of downstream aborts Some systems might allow a Completer device to abort transactions, returning status to the Requester. Translated transactions initiated by a client device that are aborted in the memory system are not recorded in the SMMU. 
The abort is returned to the client device, which is responsible for recording and reporting such faults. Aborted transactions that were internally-initiated by the SMMU are recorded by the SMMU if it is possible to do so. The event recorded by the SMMU, on one of its accesses being returned with abort status (whether aborted by the interconnect or Completer), depends on the type of access:

• STE fetch: F_STE_FETCH
• CD fetch: F_CD_FETCH
• VMS fetch: F_VMS_FETCH
• Translation table walk: F_WALK_EABT
• Command queue read entry: GERROR.CMDQ_ERR and Command queue CERROR_ABT
• ECMDQ read entry: GERROR.CMDQP_ERR and Command queue CERROR_ABT
• Event queue access: GERROR.EVENTQ_ABT_ERR
• PRI queue access: GERROR.PRIQ_ABT_ERR
• MSI write: GERROR.MSI_*_ABT_ERR

16.7.5 SMMU and AMBA attribute differences

16.7.5.1 Conversion of AMBA attributes to Armv8 on input and inferring Inner Attributes

Note: Previous revisions of AMBA could distinguish between the Non-shareable (NSH), Inner Shareable (ISH) and Outer Shareable (OSH) domains, for example using the AxDOMAIN field of AXI. In the current version of all AMBA specifications, only the Non-shareable (NSH) and Shareable domains may be expressed. This section adopts the current convention.

AMBA does not explicitly encode separate inner attributes for an upstream client device. Arm recommends that the inner attributes are considered to be the same as the outer attributes, except in the following case for data operations:

• It is IMPLEMENTATION DEFINED whether Non-cacheable with any AxDOMAIN value is treated as iNC-oNC-OSH or whether Non-cacheable with an AxDOMAIN value of NSH/Shareable is treated as an iWB-oNC-{NSH,Shareable} type. In the latter case, it must be considered to be Read-Allocate and Write-Allocate. Arm recommends using AxDOMAIN == Sys for Non-cacheable requests.

Note: Determination of the inner attributes might be used if the downstream interconnect can convey inner attributes.

16.7.5.1.1 Conversion of input attributes from AMBA to Armv8 architectural attributes

Incoming AMBA attributes are converted to SMMU/Armv8 architectural attributes as follows:

| AMBA attribute | Armv8 attribute | Notes |
|---|---|---|
| Device-Sys non-bufferable | Device-nGnRnE | |
| Device-Sys bufferable | Device-nGnRE | |
| Normal-Non-cacheable-Sys (bufferable or non-bufferable) | Normal-iNC-oNC-OSH | |
| Normal-Non-cacheable {NSH,Shareable} (bufferable or non-bufferable)(4) | Normal-iNC-oNC-OSH or Normal-iWB-oNC-{NSH,OSH} | This is an IMPLEMENTATION DEFINED choice. When the input is treated as iNC-oNC-OSH, RA/WA/TR do not exist. Otherwise, RA and WA are 1 and TR is 0 (non-transient). |
| Normal-WriteThrough-{NSH,Shareable}(4) | Normal-iNC-oNC-OSH(1) or Normal-iWT-oWT-{NSH,OSH} | This is an IMPLEMENTATION DEFINED choice. When the input is treated as iNC-oNC-OSH, RA/WA/TR do not exist. Otherwise, RA and WA are from input and TR is 0 (non-transient). |
| Normal-WriteBack-{NSH,Shareable}(2) | Normal-iWB-oWB-{NSH,OSH} | RA, WA from input. TR is 0 (non-transient). |
| Normal-WriteBack Shareable/Snoopable(3) | Normal-iWB-oWB-OSH | RA, WA from input. TR is 0 (non-transient). |
| Normal-WriteBack Non-shareable/Non-snoopable(3) | Normal-iNC-oWB-OSH | RA, WA from input. TR is 0 (non-transient). |

(1) The conversion between architectural and AMBA attributes might consider WriteThrough to be equivalent to a Normal Non-cacheable type on output and an implementation might, for consistency, apply this strategy on input.
(2) Applicable to implementations that do not support AMBA Outer Cacheable Mode.
(3) Applicable to implementations that support AMBA Outer Cacheable Mode.
(4) These encodings are permitted but it is recommended that they are not used.

An ACE-Sys input Shareability is considered to be OSH for the purposes of attribute combining and overriding as described in section 13.1 SMMU handling of attributes.
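The input conversion table in 16.7.5.1.1 can be read as a small decision function. The following Python sketch is illustrative only and is not part of the architecture: the function name, the `Armv8Attr` tuple, the string labels, and the flags `nc_as_iwb`, `wt_as_iwt` and `outer_cacheable_mode` (which stand in for the IMPLEMENTATION DEFINED choices and for Outer Cacheable Mode support) are all assumptions made for this example. It also assumes that the braces {NSH,Shareable} and {NSH,OSH} in the table correspond positionally.

```python
from typing import NamedTuple, Optional


class Armv8Attr(NamedTuple):
    attr: str            # memory type, e.g. "Device-nGnRnE" or "Normal-iWB-oWB"
    sh: Optional[str]    # "NSH" or "OSH"; None for Device types
    ra: Optional[int]    # None where the table says RA/WA/TR "do not exist"
    wa: Optional[int]
    tr: Optional[int]    # 0 means non-transient


def amba_to_armv8(kind, domain, bufferable=True, ra=0, wa=0, *,
                  nc_as_iwb=False, wt_as_iwt=False,
                  outer_cacheable_mode=False):
    """Convert a (hypothetically labelled) AMBA input attribute.

    kind:   "Device", "NC", "WT" or "WB"
    domain: "Sys", "NSH" or "Shareable"
    """
    # Assumed positional mapping: NSH stays NSH, Shareable becomes OSH.
    sh = "NSH" if domain == "NSH" else "OSH"
    nc = Armv8Attr("Normal-iNC-oNC", "OSH", None, None, None)

    if kind == "Device":
        name = "Device-nGnRE" if bufferable else "Device-nGnRnE"
        return Armv8Attr(name, None, None, None, None)
    if kind == "NC":
        # IMPLEMENTATION DEFINED choice for NSH/Shareable Non-cacheable input.
        if domain != "Sys" and nc_as_iwb:
            return Armv8Attr("Normal-iWB-oNC", sh, 1, 1, 0)
        return nc
    if kind == "WT":
        # IMPLEMENTATION DEFINED choice for WriteThrough input.
        if wt_as_iwt:
            return Armv8Attr("Normal-iWT-oWT", sh, ra, wa, 0)
        return nc
    if kind == "WB":
        if outer_cacheable_mode:
            inner = "iWB" if domain == "Shareable" else "iNC"
            return Armv8Attr(f"Normal-{inner}-oWB", "OSH", ra, wa, 0)
        return Armv8Attr("Normal-iWB-oWB", sh, ra, wa, 0)
    raise ValueError(f"unknown AMBA attribute kind: {kind!r}")
```

For example, `amba_to_armv8("WB", "NSH", ra=1, wa=1, outer_cacheable_mode=True)` yields the Normal-iNC-oWB-OSH row of the table, with RA and WA taken from the input.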

16.7.5.2 Conversion of Armv8 attributes to AMBA on output and representation of Shareability

The SMMU specifies the architectural Inner and Outer Cacheability and Shareability attributes. However, in some circumstances there is a non-obvious transformation of these attributes into an AMBA representation:

• The architecture considers any-Device/Normal-iNC-oNC to be OSH, while ACE considers these to be 'Sys'.
• Final attributes of any-Device/Normal-iNC-oNC are presented on AMBA as ACE-Device-Sys/ACE-Normal-Non-cacheable-Sys.
• If the implementation does not transform final attributes of i{WB,WT}-oNC-OSH (inner cacheable of any variety) to Normal-Non-cacheable-Sys as set out in 16.7.5.3 Common interpretation of attribute encoding between SMMU and PE (for example, a different interpretation of attribute mapping is used to that of Arm PE IP) and these attributes are transformed to an ACE cacheable type, the type is represented as ACE-OSH.

16.7.5.2.1 Conversion of Armv8 architectural attributes to AMBA on output

SMMU/Armv8 architectural attributes are converted to AMBA attributes on output as follows:

| Armv8 attribute | AMBA attribute | Notes |
|---|---|---|
| Device-nGnRnE | Device-Sys non-bufferable | |
| Device-(n)G(n)RE | Device-Sys bufferable | |
| Normal-iNC-oNC-OSH | Normal-Non-cacheable-Sys bufferable | Architecturally, a Normal-iNC-oNC-{NSH,ISH} attribute is not possible, only OSH. |
| Normal-iNC-oWT-{NSH,ISH,OSH} | Normal-Non-cacheable-Sys bufferable | (1) |
| Normal-iNC-oWB-{NSH,ISH,OSH}(2) | Normal-Non-cacheable-Sys bufferable | (1) |
| Normal-iNC-oWB-{NSH,ISH,OSH}(3) | Normal-WriteBack Non-shareable/Non-snoopable | (1) |
| Normal-iWT-oNC-{NSH,ISH,OSH} | Normal-Non-cacheable-Sys bufferable | (1) |
| Normal-iWT-oWT-{NSH,ISH,OSH} | Normal-Non-cacheable-Sys bufferable | (1) |
| Normal-iWT-oWB-{NSH,ISH,OSH} | Normal-Non-cacheable-Sys bufferable | (1) |
| Normal-iWB-oNC-{NSH,ISH,OSH} | Normal-Non-cacheable-Sys bufferable | (1) |
| Normal-iWB-oWT-{NSH,ISH,OSH} | Normal-Non-cacheable-Sys bufferable | (1) |
| Normal-iWB-oWB-{NSH,ISH/OSH}(2) | Normal-WriteBack-{NSH,Shareable} | |
| Normal-iWB-oWB-{NSH,ISH,OSH}(3) | Normal-WriteBack Shareable/Snoopable | |

(1) See 16.7.5.3 Common interpretation of attribute encoding between SMMU and PE below: these transformations correspond to the transformations implemented in the PEs in the system. The outputs shown correspond to Arm Cortex IP. For other PE IP, the interpretations of the Armv8 attributes are IMPLEMENTATION DEFINED.
(2) Applicable to implementations that do not support AMBA Outer Cacheable Mode.
(3) Applicable to implementations that support AMBA Outer Cacheable Mode.

When a cacheable type is output, AMBA interconnect RA and WA attributes are generated directly from the RA/WA portion of the Arm architectural attribute.

Section 13.1.7 Ensuring consistent output attributes mandates that the SMMU will not output architecturally-inconsistent attributes or attribute combinations that are illegal for the interconnect. For AMBA, the output AxDOMAIN is made consistent with the final AxCACHE value if it is not already. If required, this is made consistent by choosing the highest (most shareable) value of AxDOMAIN that is legal given AxCACHE. Normal Non-cacheable types are always bufferable. The output AxDOMAIN is ACE-Sys if the final attributes are a Device or a Non-cacheable type.

For example, in the case where:

• The SMMU is configured to bypass, SMMU_CR0.SMMUEN == 0.
• SMMU_GBPA.MTCFG == 1, and the input MemAttr is overridden to 'iWB-oWB' by SMMU_GBPA.MemAttr.
• SMMU_GBPA.SHCFG == "use-incoming".
• An ACE input attribute provides ACE-Device-Sys.

The final output of the SMMU is ACE-WB-OSH.

16.7.5.3 Common interpretation of attribute encoding between SMMU and PE

If interoperation with an Arm A-profile PE is required, then if AMBA Outer Cacheable Mode is not supported, a Normal memory attribute that is not iWB-oWB is transformed to the architectural type iNC-oNC-OSH. See 16.7.5.2 Conversion of Armv8 attributes to AMBA on output and representation of Shareability.
In AMBA-ACE systems this is represented as ACE-NC-Sys.

Note: AMBA Outer Cacheable Mode enables an additional transformation as shown in the table.

For example, a final output attribute of iWT-oNC-NSH is converted to iNC-oNC-OSH and is therefore output as ACE-NC-Sys in an AMBA-ACE system. Access attributes of type any-Device are unaffected by this rule.

Otherwise, for interoperation with other PE IP, the transformations between Normal memory attributes that are not iWB-oWB or iNC-oNC and AMBA attributes are IMPLEMENTATION DEFINED.

16.7.6 Far Atomic operations

If an interconnect and SMMU support client device-initiated Far Atomic operations, as specified in Armv8.1-A [2], these operations are permission-checked as though they perform both a read and a write operation. See section 13.1.1 Attribute definitions for permission checking and fault reporting.

An atomic access is considered to be a write that also performs a read, so is always considered to be Data. The InD attribute and any INSTCFG overrides are ignored for atomic accesses.

Note: For example, a Far Atomic increment to an address in a read-only page must cause a write Permissions Fault (if all other translation requirements are satisfied). If the transaction is configured to stall and is later retried, the entirety of the transaction must be retried atomically. It is prohibited to satisfy the read of data prior to raising a write fault for the update of the data and then use the same read data when the transaction is later retried. The retry must perform the unbroken atomic transaction in one action.

If an upstream interconnect can express this kind of atomic transaction, but the downstream interconnect or system cannot, one of the following occurs:

1. Terminate the transaction in the SMMU with an abort, and record an F_UUT.
2. Support Far Atomic transactions within the SMMU, converting them to local monitor atomic operations using a fully-coherent cache in the SMMU.

In case (1), where Far Atomic operations are not supported at all, Arm recommends that the system ensures that upstream devices are not able to emit these transactions (and that software does not expect to use them).

16.7.7 AMBA DVM messages with respect to CD.ASET == 1 TLB entries

CD.ASET == 1 affects the interaction of TLB entries with DVM messages in the following ways:

Entries created from StreamWorld == NS-EL1 are not required to be invalidated by:
• Guest OS TLB invalidation by ASID.
• Guest OS TLB invalidation by ASID and VA.

Entries created from StreamWorld == Secure are not required to be invalidated by:
• Secure TLB invalidation by ASID.
• Secure TLB invalidation by ASID and VA.

Entries created from StreamWorld == any-EL2-E2H are not required to be invalidated by:
• Hypervisor TLB invalidation by ASID.
• Hypervisor TLB invalidation by ASID and VA.

Entries created from StreamWorld == any-EL2 are not required to be invalidated by:
• Hypervisor TLB invalidation by VA.
• Hypervisor TLB invalidation by ASID.
• Hypervisor TLB invalidation by ASID and VA.

Entries created from StreamWorld == EL3 are not required to be invalidated by:
• EL3 TLB invalidation by VA.
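The CD.ASET == 1 exemptions listed in 16.7.7 can be transcribed into a lookup table. The Python sketch below is an illustration under stated assumptions, not an architectural definition: the scope labels ("VA", "ASID", "ASID+VA", "ALL"), the table name and the function name are hypothetical, and the sketch assumes the DVM message already targets the translation regime that matches the entry's StreamWorld (Guest OS for NS-EL1, Secure for Secure, Hypervisor for the EL2 worlds, EL3 for EL3). Any scope not listed in 16.7.7 is treated here as still requiring invalidation.

```python
# DVM invalidation scopes that an ASET == 1 TLB entry is NOT required to
# honour, keyed by the StreamWorld that created the entry (from 16.7.7).
ASET_EXEMPT_SCOPES = {
    "NS-EL1": {"ASID", "ASID+VA"},
    "Secure": {"ASID", "ASID+VA"},
    "any-EL2-E2H": {"ASID", "ASID+VA"},
    "any-EL2": {"VA", "ASID", "ASID+VA"},
    "EL3": {"VA"},
}


def dvm_invalidation_required(stream_world, scope, aset):
    """Return True if a matching DVM message must invalidate the entry.

    stream_world: StreamWorld the TLB entry was created from
    scope:        hypothetical label for the DVM invalidation scope
    aset:         value of CD.ASET recorded for the entry (0 or 1)
    """
    if aset == 0:
        # The exemptions in 16.7.7 apply only to ASET == 1 entries.
        return True
    return scope not in ASET_EXEMPT_SCOPES.get(stream_world, set())
```

An implementation remains free to invalidate more than the minimum; the function only models what is "not required".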

16.8 Summary of SMMU transactions and their PCIe and AMBA equivalents

Table 16.6: SMMU AMBA PCIe transactions

| SMMU transaction type | PCIe equivalent(1) | AXI/ACE-Lite signal | AXI/ACE-Lite opcode | DTI | LTI transaction type (LATRANS) |
|---|---|---|---|---|---|
| Ordinary read request | Memory read request | ARSNOOP | ReadNoSnoop, ReadOnce | DTI_TBU_TRANS_REQ.PERM == R | R |
| RCI | Not applicable | ARSNOOP | ReadOnceCleanInvalid | DTI_TBU_TRANS_REQ.PERM == R | R-CMO |
| DR | Not applicable | ARSNOOP | ReadOnceMakeInvalid | DTI_TBU_TRANS_REQ.PERM == R | R-DCMO |
| Speculative transaction(2) | Not applicable | Not applicable | Not applicable | Not applicable | Not applicable |
| Far Atomic operations | FetchAdd, Swap, CAS | AWATOP | AtomicStore, AtomicLoad, AtomicSwap, AtomicCompare | DTI_TBU_TRANS_REQ.PERM == RW | RW |
| Ordinary write transaction | Memory write request | AWSNOOP | WriteNoSnoop, WriteUniquePtl, WriteNoSnoopFull, WriteUniqueFull, WriteZero | DTI_TBU_TRANS_REQ.PERM == W | W |
| W-DCP | Memory write request with TLP Processing Hint - with a non-zero Steering Tag (ST) field | AWSNOOP | WriteUniquePtlStash, WriteUniqueFullStash | DTI_TBU_TRANS_REQ.PERM == W | W-DCP |
| NW-DCP | Zero-length Write request with TLP Processing Hint - with a non-zero ST field | AWSNOOP | StashOnceShared, StashOnceUnique | DTI_TBU_TRANS_REQ.PERM == SPEC | DCP |
| DH | Not applicable | AWSNOOP | InvalidateHint | DTI_TBU_TRANS_REQ.PERM == SPEC | DHCMO |
| Clean, CleanInvalidate, CleanToPersistence | Not applicable | ARSNOOP | CleanShared, CleanInvalid, CleanSharedPersist | DTI_TBU_TRANS_REQ.PERM == R | CMO |
| Invalidate | Not applicable | ARSNOOP | MakeInvalid | DTI_TBU_TRANS_REQ.PERM == R | DCMO |
| Ordinary translation request | Not applicable | Not applicable | Not applicable | DTI_TBU_TRANS_REQ.PERM depends on the request type(4) | Not applicable |
| Ordinary speculative translation request | Not applicable | Not applicable(3) | Not applicable | DTI_TBU_TRANS_REQ.PERM == SPEC | Not applicable(3) |
| ATS Translation Request | ATS Translation Request | Not applicable | Not applicable | DTI_ATS_TRANS_REQ.nW depends on the request type(4) | Not applicable |
| ATS PRI | ATS PRI | Not applicable | Not applicable | DTI_ATS_PAGE_REQ.{READ, WRITE} depends on the request type(4) | Not applicable |

(1) All PCIe transactions can be issued as ATS Translated Transactions. When a PCIe Device issues a transaction as ATS TT, that transaction can be issued either over LTI as captured in the table or over AXI/ACE-Lite with AxMMUFLOW (or AxMMUATST, depending on the AXI architecture version) equal to 1.
(2) The SMMU architecture allows speculative transactions to be transmitted in an IMPLEMENTATION DEFINED manner. See 3.14 Speculative accesses.
(3) In the LTI specification, the LATRANS==SPEC channel transaction describes translation prefetch requests, which are not considered speculative translation requests. This is because any SMMU translation request, whether it has been issued on its own or as part of a transaction request, requires a translation response. In the AXI architecture, transactions with AxMMUFLOW==PRI or marked as StashTranslation provide similar functionality.
(4) ATS and non-ATS translation requests, as well as PRI requests, can be issued over the DTI bus protocol. DTI_TBU_TRANS_REQ.PERM, DTI_ATS_TRANS_REQ.nW and DTI_ATS_PAGE_REQ.{READ, WRITE} fields can take any value, depending on the type of transaction the translation will be used for.