Tested Hardware
All development, patching, and validation described on this page was performed on this specific implementation of the ASM1166 — a compact M.2-based 6-port SATA adapter widely available on Amazon and AliExpress. The firmware used as the baseline is the Radxa build (ASM1166_10250005), with patch compatibility also verified for the station-drivers and Silverstone firmware variants.
Symptom
The affected setup is an ASMedia ASM1166 6-port SATA PCIe controller populated with multiple Intel S4500 (or similarly fast-responding enterprise) SSDs. Symptoms scale with SSD count.
The problem presents as a hard hang during POST: the BIOS never loads, and the controller appears alive but unresponsive. The symptom is highly reproducible and depends on the drive configuration.
Standard remediation (different firmware builds including Silverstone and Radxa distributions, BIOS ASPM / Option ROM settings) had no effect. The problem is inside the firmware at a level that BIOS settings cannot reach.
Root Cause
Simultaneous FIS Storm
COMRESET is a SATA out-of-band signal that resets a link and triggers a device to send back a D2H Register FIS (Frame Information Structure) containing its link speed and status. The ASM1166's embedded 8051 microcontroller sends COMRESET to each of its six ports sequentially and processes them one at a time.
This loop was written with spinning hard drives in mind. An HDD takes anywhere from 100 ms to 2 s to spin up and send its response. By the time the 8051 finishes processing port 0 and moves to port 1, only port 0's drive has responded. Natural mechanical latency provides free temporal separation.
An enterprise SSD has no platters. It responds to COMRESET in microseconds. All six simultaneously.
The Corruption Path
With all six SSDs responding at Gen3 simultaneously, a specific code path at 8051 address 0x0300 produces a fatal address calculation:
    ; firmware computes DPTR from AHCI port status value in reg[0x3D]
    ; normally, reg[0x3D] reflects a gradually-changing per-port state
    ; with all 6 SSDs at Gen3 simultaneously, reg[0x3D] = 0x03
    MOV  A, 0x3D        ; reg[0x3D] = 0x03 (all-Gen3 state)
    ADD  A, #0xD6       ; A = 0xD9
    MOV  DPH, #0x03     ; DPTR high byte = 0x03
    MOV  DPL, A         ; DPTR = 0x03D9 ← PROBLEM
    ; 0x03A0–0x03D6 = live SATA/PCIe control registers
    ; 0x03D9 falls INSIDE that space
    MOVX A, @DPTR       ; reads live control register at 0x03CF
    ; ... masks and writes back, corrupting PCIe link negotiation result
    ; register 0x03CF = PCIe link negotiation result, now corrupted
    ; PCIe state machine enters infinite retry loop → card stops responding
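The address arithmetic in that listing can be reproduced off-target. The following is a minimal Python sketch, assuming only what the listing states: an 8-bit accumulator add of 0xD6 and a fixed high byte of 0x03. The function name `compute_dptr` is illustrative and not part of the firmware.

```python
def compute_dptr(reg_3d, offset=0xD6, dph=0x03):
    """Mirror MOV A,0x3D / ADD A,#0xD6 / MOV DPH,#0x03 / MOV DPL,A."""
    # The 8051 accumulator is 8 bits wide, so the addition wraps at 0x100.
    dpl = (reg_3d + offset) & 0xFF
    # DPTR is the 16-bit pair DPH:DPL.
    return (dph << 8) | dpl


if __name__ == "__main__":
    for value in range(4):
        print(f"reg[0x3D]=0x{value:02X} -> DPTR=0x{compute_dptr(value):04X}")
```

With `reg[0x3D] = 0x03` the sketch yields `0x03D9`, the DPTR value shown in the listing.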
This explains the entire symptom matrix. Five SSDs create near-simultaneous responses that hit the boundary condition intermittently, which shows up as slowness. Three SSDs rarely trigger the exact register value that causes the corruption. Six HDDs never trigger it because their responses arrive well separated in time.
The Fix
Inject a ~21 ms delay between each port's COMRESET, serialising the FIS exchange sequence. The 8051 processes each SSD's response fully before triggering the next COMRESET. The corruption path can never be reached.
Hook Mechanism
The port initialisation loop ends at 8051 address 0x04B3 with a jump back to the loop condition. The three bytes there are 02 00 57 — this is an 8051 LJMP 0x0057 instruction (opcode 02 + 16-bit target in big-endian).
The code cave — over 10 KB of zero-fill — starts at 8051 address 0x5681. The delay routine was placed at 0x5700. Its encoding as an LJMP target: 02 57 00.
    ; ROM offset 0x01C0B3 (the hook)
    ; BEFORE:  02 00 57   ; LJMP 0x0057 (original loop-back)
    ; AFTER:   02 57 00   ; LJMP 0x5700 (redirect to delay routine)
    ;             ↑↑ ↑↑
    ; only bytes 2 and 3 are swapped. opcode unchanged.
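The hook encoding can be checked with a few lines of Python. This is a sketch under the assumption stated above (opcode 0x02 followed by a big-endian 16-bit target); the helper names `encode_ljmp`, `decode_ljmp`, and `diff_offsets` are illustrative and do not come from any tool used for the patch.

```python
LJMP_OPCODE = 0x02  # 8051 LJMP addr16


def encode_ljmp(target):
    """Encode LJMP as opcode + high byte + low byte of the 16-bit target."""
    if not 0 <= target <= 0xFFFF:
        raise ValueError("LJMP target must fit in 16 bits")
    return bytes([LJMP_OPCODE, target >> 8, target & 0xFF])


def decode_ljmp(code):
    """Return the jump target of a 3-byte LJMP instruction."""
    if len(code) != 3 or code[0] != LJMP_OPCODE:
        raise ValueError("not an LJMP instruction")
    return (code[1] << 8) | code[2]


def diff_offsets(old, new):
    """List the byte positions (0-based) at which two encodings differ."""
    return [i for i, (a, b) in enumerate(zip(old, new)) if a != b]


if __name__ == "__main__":
    before = encode_ljmp(0x0057)
    after = encode_ljmp(0x5700)
    print(before.hex(" "), "->", after.hex(" "))
    print("changed positions:", diff_offsets(before, after))
```

Only positions 1 and 2 differ (0-based), which matches the two operand bytes that the hook swaps while leaving the opcode in place.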
Delay Routine (37 bytes at 8051:0x5700)
A triple-nested DJNZ (Decrement and Jump if Not Zero) loop. Calibrated for the ASM1166's single-cycle 8051 core running at 25 MHz. R2=0x04 (4) produces approximately 21 ms per call.
    ; ROM offset 0x021300, 8051 address 0x5700
    delay_and_resume:
        PUSH ACC            ; preserve registers
        PUSH B
        MOV  R2, #0x04      ; outer counter: 4 iterations ≈ 21ms
    outer:
        MOV  R1, #0xFF      ; middle counter: 255
    middle:
        MOV  R0, #0xFF      ; inner counter: 255
    inner:
        DJNZ R0, inner      ; 2 cycles each
        DJNZ R1, middle
        DJNZ R2, outer
        POP  B
        POP  ACC
        LJMP 0x0057         ; return to original loop condition check
    ; Timing: 4 × 255 × 255 × 2 cycles × 40ns/cycle ≈ 21ms per port
    ; Verified by Python cycle-accurate 8051 emulator before flashing
Timing Verification
The 8051 architecture has two variants: single-cycle (1 clock = 1 machine cycle, common in modern cores) and classic (1 machine cycle = 12 clocks). The factor-of-12 difference would have been fatal to untested timing. A Python cycle-accurate simulator was built and run before any bytes were committed to hardware. Result: confirmed single-cycle at 25 MHz. R2=0x04 → ~21 ms per delay call.
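The estimate above can be reproduced with simple arithmetic. The sketch below is not the cycle-accurate simulator mentioned in the text; it is a simplified model that, like the timing comment in the delay routine, counts only the inner DJNZ at 2 machine cycles per execution and ignores the middle and outer loop overhead. The function names are illustrative.

```python
CLOCK_HZ = 25_000_000  # 25 MHz core clock, as stated in the text


def inner_djnz_count(r2=0x04, r1=0xFF, r0=0xFF):
    """Number of times the innermost DJNZ executes.

    A DJNZ loop whose counter is loaded with N runs N times.
    """
    return r2 * r1 * r0


def delay_ms(clocks_per_machine_cycle, r2=0x04, cycles_per_djnz=2,
             clock_hz=CLOCK_HZ):
    """Approximate delay in milliseconds for one call of the routine."""
    machine_cycles = inner_djnz_count(r2) * cycles_per_djnz
    seconds = machine_cycles * clocks_per_machine_cycle / clock_hz
    return seconds * 1000.0


if __name__ == "__main__":
    print(f"single-cycle core:  {delay_ms(1):8.3f} ms")
    print(f"classic 12-clock:   {delay_ms(12):8.3f} ms")
```

Under this model the single-cycle case comes out at about 20.8 ms per port, while a classic 12-clock core would stretch the same routine to roughly 250 ms. That gap is the factor-of-12 risk the simulator was built to rule out.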
Patch Bytes
Total bytes changed from the original ROM: 43 out of 262,144.
| ROM Offset | Old | New | Purpose |
|---|---|---|---|
| 0x01C0B4–0x01C0B5 | 00 57 | 57 00 | Hook: LJMP 0x0057 → LJMP 0x5700 (2 bytes swapped) |
| 0x021300–0x021324 | 00 × 37 | payload | 37-byte delay routine placed in code cave |
| 0x000248–0x00024B | B1 25 0E 10 | BB 3A 43 03 | 8051 region CRC32 (LE) — updated to match patched MCU code |
| 0x0002FF | 48 | 8F | Descriptor block byte checksum — covers bytes 0x200–0x2FE including CRC entries |
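The last two rows of the table can be cross-checked against each other. The sketch below assumes that the four CRC32 bytes are the only bytes inside ROM[0x200–0x2FE] that the patch changes, which is what the table shows; the helper name `updated_byte_sum` is illustrative.

```python
def updated_byte_sum(old_checksum, old_bytes, new_bytes):
    """Adjust a mod-256 byte-sum checksum after a span of bytes is replaced."""
    return (old_checksum - sum(old_bytes) + sum(new_bytes)) % 256


OLD_CRC = bytes.fromhex("B1250E10")  # bytes at 0x248 before the patch
NEW_CRC = bytes.fromhex("BB3A4303")  # bytes at 0x248 after the patch

if __name__ == "__main__":
    # The CRC32 is stored little-endian, so decode it that way for display.
    print(f"old CRC32: 0x{int.from_bytes(OLD_CRC, 'little'):08X}")
    print(f"new CRC32: 0x{int.from_bytes(NEW_CRC, 'little'):08X}")
    new_sum = updated_byte_sum(0x48, OLD_CRC, NEW_CRC)
    print(f"descriptor checksum: 0x48 -> 0x{new_sum:02X}")
```

Starting from the old checksum 0x48, the result is 0x8F, the value listed for ROM offset 0x0002FF.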
ROM Integrity Architecture
The ASM1166 ROM has five independent integrity layers. All must be consistent or the flashing utility (RomUpdWin.exe) will reject the file with "Not a valid ROM file." — with no further explanation.
| # | Location | What It Covers | Status After Patch |
|---|---|---|---|
| L1 | ROM[0x220–0x24B] | Three CRC32 region entries (UEFI, config block, 8051 MCU). 8051 CRC at 0x0248 (not 0x0244, a common documentation error). | Updated ✓ |
| L2 | ROM[0x2FF] | Byte sum of ROM[0x200–0x2FE] mod 256. Covers the entire descriptor block including all CRC entries. This is the "checksum of the checksum table" — undocumented, found only by disassembling RomUpdWin.exe. | Updated ✓ |
| L3 | ROM[0x238] | CRC32 of config block ROM[0x100–0x1FF]. | Unchanged |
| L4 | ROM[0x12F] | Byte sum of config header ROM[0x100–0x12E]. | Unchanged |
| L5 | x86 BIOS ROM | Standard PCI Option ROM byte-sum checksum. | Unchanged |
Layer 2 (ROM[0x2FF]) is not documented in any public ASMedia resources or community firmware notes. It is discoverable only by reading RomUpdWin.exe's machine code. Two patch attempts failed because of this. If you write your own patch, you must recompute this byte after changing any CRC32 entry.
RomUpdWin.exe — All 9 Validation Checks
| Check | Code VA | Condition | Result |
|---|---|---|---|
| 1 | 0x402200 | ROM[0x200..0x20B] == 12 fixed magic bytes | PASS |
| 2 | 0x402284 | sum(ROM[0x200..0x2FE]) mod 256 == ROM[0x2FF] | PASS |
| 3 | 0x402293 | ROM[0x2F4] (LE32) == file size (262144) | PASS |
| 4 | 0x4022C2 | CRC32 of UEFI region vs ROM[0x228] | PASS |
| 5 | 0x402309 | CRC32 of ROM[0x100..0x1FF] vs ROM[0x238] | PASS |
| 6 | 0x40232E | "ASMT" magic at ROM[0x100] | PASS |
| 7 | 0x402355 | ROM[0x104..0x10F] == 12 magic bytes | PASS |
| 8 | 0x4023D4 | sum(ROM[0x100..0x12E]) mod 256 == ROM[0x12F] | PASS |
| 9 | 0x4024BC | CRC32 of 8051 region vs ROM[0x248] | PASS |
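Some of these checks can be replicated before handing a file to the flashing utility. The sketch below covers only checks 2, 3, 6, and 8, because the table gives enough detail for those; the magic-byte values and the CRC32 region boundaries are not listed here, so checks 1, 4, 5, 7, and 9 are left out. It assumes that "ASMT" is stored as ASCII and that the check 8 sum is taken mod 256. All function names are illustrative, and the synthetic ROM exists only to exercise the code.

```python
ROM_SIZE = 262144  # 256 KB, the file size named in check 3


def check_descriptor_sum(rom):
    """Check 2: byte sum of ROM[0x200..0x2FE] mod 256 equals ROM[0x2FF]."""
    return sum(rom[0x200:0x2FF]) % 256 == rom[0x2FF]


def check_file_size(rom):
    """Check 3: little-endian 32-bit value at ROM[0x2F4] equals the file size."""
    return int.from_bytes(rom[0x2F4:0x2F8], "little") == len(rom)


def check_asmt_magic(rom):
    """Check 6: "ASMT" at ROM[0x100] (ASCII encoding is an assumption)."""
    return bytes(rom[0x100:0x104]) == b"ASMT"


def check_config_header_sum(rom):
    """Check 8: byte sum of ROM[0x100..0x12E] equals ROM[0x12F] (mod 256 assumed)."""
    return sum(rom[0x100:0x12F]) % 256 == rom[0x12F]


CHECKS = {
    2: check_descriptor_sum,
    3: check_file_size,
    6: check_asmt_magic,
    8: check_config_header_sum,
}


def run_checks(rom):
    """Return {check number: passed} for the subset implemented here."""
    return {number: check(rom) for number, check in CHECKS.items()}


def make_synthetic_rom():
    """Build a zero-filled test image that satisfies the four checks above."""
    rom = bytearray(ROM_SIZE)
    rom[0x100:0x104] = b"ASMT"
    rom[0x12F] = sum(rom[0x100:0x12F]) % 256
    rom[0x2F4:0x2F8] = ROM_SIZE.to_bytes(4, "little")
    rom[0x2FF] = sum(rom[0x200:0x2FF]) % 256  # computed last: it covers 0x2F4
    return rom


if __name__ == "__main__":
    rom = make_synthetic_rom()
    print(run_checks(rom))
    rom[0x248] ^= 0xFF  # simulate editing a CRC entry without fixing 0x2FF
    print(run_checks(rom))
```

In the second run only check 2 fails, which is the failure mode described for patches that update a CRC32 entry but leave ROM[0x2FF] untouched.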
Firmware Compatibility
The patch hook and both surrounding code blocks are byte-for-byte identical in all three analysed firmware versions. The delay routine ends with LJMP 0x0057 — this address contains the same code in all three ROMs, making the patch universally compatible.
| ROM | Date | Size | Hook Offset | Payload Offset | Patch |
|---|---|---|---|---|---|
| radxa / stock (baseline) | 2024-10-25 | 256 KB | 0x01C0B3 | 0x021300 | ✓ Patched |
| station-drivers (latest) | 2024-12-24 | 256 KB | 0x01C0B3 | 0x021300 | ✓ Patched |
| Silverstone (2021) | 2021-11-08 | 128 KB | 0x0168B3 | 0x01BB00 | ✓ Patched |
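The offsets in the table imply a fixed distance between an 8051 address and its ROM file offset in each image. The sketch below derives that distance from the hook address (0x04B3) and the payload address (0x5700) and confirms that both give the same result. Treating the MCU code as one flat, linearly mapped region is an inference from these numbers, not something the firmware documents; the dictionary keys and the `mcu_base` helper are illustrative.

```python
HOOK_ADDR = 0x04B3     # 8051 address of the loop-back LJMP
PAYLOAD_ADDR = 0x5700  # 8051 address of the delay routine

# (hook ROM offset, payload ROM offset) taken from the table above
ROMS = {
    "radxa": (0x01C0B3, 0x021300),
    "station-drivers": (0x01C0B3, 0x021300),
    "silverstone-2021": (0x0168B3, 0x01BB00),
}


def mcu_base(hook_offset, payload_offset):
    """ROM offset that corresponds to 8051 address 0x0000."""
    base_from_hook = hook_offset - HOOK_ADDR
    base_from_payload = payload_offset - PAYLOAD_ADDR
    if base_from_hook != base_from_payload:
        raise ValueError("hook and payload imply different base offsets")
    return base_from_hook


if __name__ == "__main__":
    for name, (hook, payload) in ROMS.items():
        print(f"{name:18s} 8051 code base = 0x{mcu_base(hook, payload):06X}")
```

Both 2024 images map 8051 address 0 to ROM offset 0x01BC00, and the 2021 Silverstone image maps it to 0x016400, so the same 8051-side patch lands at different file offsets in each ROM.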
Notable Differences Between Firmware Versions
The Dec 2024 station-drivers firmware adds new PCIe lane steering logic (registers 0x1E80, 0x1E83, 0x1E86) absent in the Oct 2024 build. The 2021 Silverstone firmware is half the ROM size and performs 9 COMRESET sequences vs 3–4 in the 2024 builds. None of these differences affect the patch's injection point or behaviour — the port init loop body is byte-identical across all three.