The Engine Room: AMD's Detours-Based Hook Library (amdihk64)

Part 2 of the Anti-Lag+ teardown. Part 1 covered what Anti-Lag+ patched in the game and why VAC flagged it. This one tears down a production inline-hook engine that ships in the very same driver package: amdihk64.dll (“AMD inline-hook”).

Caveat up front. I could not prove amdihk64 is on the Anti-Lag+ code path. Part 1’s Delag detour uses amdxc64’s own embedded Detours; amdihk64 is a separate, standalone hooker - a sibling engine, and an excellent specimen of how this class of trampoline hooker is built. Most of this post is that teardown, which stands on its own. Where I reach past what I proved (e.g. “is this Anti-Lag+’s input hook?”), I flag it explicitly.

In Part 1 we watched amdxc64.dll walk the game’s call stack and inline-patch a call site with Microsoft Detours. That driver carries its own statically-linked copy of Detours. But the package ships a second, standalone hooking component too - amdihk64.dll, ~210 KB, exporting just Init / Terminate and internally advertising AMDSetHookE, AMDSetXHookE, AMDRemoveHookM, AMDRemoveXHookM. Its import table is a hooker’s shopping list:

VirtualAlloc / VirtualProtect / VirtualQuery / FlushInstructionCache   ; patch primitive
SuspendThread / Get/SetThreadContext / ResumeThread                    ; safe live patching
SetWindowsHookExA / CallNextHookEx / UnhookWindowsHookEx / EnumWindows  ; injection / triggers
OpenMutexA / CreateEventA / WaitForMultipleObjects                     ; cross-process control
GetProcAddress / LoadLibraryA                                          ; resolution
Detoured                                                              ; <- Microsoft Detours

That last import, plus a detoured.dll shipped alongside, is the tell: amdihk64 is Microsoft Detours wearing an AMD hat. We’ll prove it from the bytes.

A Detours transaction, in three acts

Detours installs hooks inside a transaction: DetourTransactionBegin → DetourAttach (one or more) → DetourTransactionCommit. amdihk64 implements exactly that shape with three internal functions.

Act 1 - Begin (`0x180001334`)

if (DAT_18002dde4 == 0) {                       // no transaction open?
    tid = GetCurrentThreadId();
    InterlockedClaim(&g_ownerTid, tid);         // this thread now owns the transaction
    g_descriptorList = 0;  g_threadList = 0;    // reset pending hooks + suspended threads
    g_error = 0;
    for (pool = g_trampolinePools; pool; pool = pool->next)
        VirtualProtect(pool, 0x10000, PAGE_EXECUTE_READWRITE, &old);  // open for writing
    return 0;
}
return 0x10dd;                                   // ERROR_INVALID_OPERATION - a txn is already open

Two details worth noting. The trampolines live in pre-allocated 64 KB pools that are kept PAGE_EXECUTE_READ (0x20) at rest and only flipped to RWX (0x40) for the duration of a transaction - minimizing the window where AMD’s own code pages are writable. And 0x10dd is ERROR_INVALID_OPERATION - the exact code Microsoft Detours returns for a nested transaction.
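
The Begin guard boils down to "claim the owner slot or fail with 0x10dd". Here is a minimal portable sketch of that shape, assuming a single global owner slot. The names txn_begin, txn_end and g_owner are mine, not symbols from the binary, and the thread id is passed in as a plain integer instead of coming from GetCurrentThreadId. It folds the check and the claim into one compare-exchange, which may not be how the binary orders them.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define ERR_INVALID_OPERATION 0x10ddu   /* 4317, the value Begin returns */

/* Hypothetical stand-in for the owner-thread global; 0 means no open transaction. */
static _Atomic uint32_t g_owner = 0;

/* Claim the transaction for `tid` (must be nonzero).
 * Returns 0 on success, 0x10dd if a transaction is already open. */
uint32_t txn_begin(uint32_t tid) {
    assert(tid != 0);
    uint32_t expected = 0;
    if (!atomic_compare_exchange_strong(&g_owner, &expected, tid))
        return ERR_INVALID_OPERATION;
    return 0;
}

/* Release the transaction. Only the thread that owns it may end it. */
uint32_t txn_end(uint32_t tid) {
    uint32_t expected = tid;
    if (!atomic_compare_exchange_strong(&g_owner, &expected, 0))
        return ERR_INVALID_OPERATION;
    return 0;
}
```

A second txn_begin from any thread fails until the owner calls txn_end, which is the nested-transaction behaviour described above. The pool re-protection step is left out because it needs the Windows memory APIs.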

Act 2 - Attach (`0x180001898`): building a trampoline

This is the heart of it. Given a pointer-to-target and a hook function, it builds the trampoline. Cleaned up:

// 1. follow jump thunks to the *real* target (IAT stubs, EB->E9 chains)
if (target[0]==0xFF && target[1]==0x25)        // jmp qword [rip+disp]
    target = *(byte**)(target + 6 + *(int32*)(target+2));   // read the pointer slot once
else if (target[0]==0xEB && (target += 2 + (int8)target[1], target[0]==0xE9))
    target += 5 + *(int32*)(target+1);

// 2. steal whole instructions off the prologue until we have >= 5 bytes for the E9
uint stolen = 0;
while (stolen < 5) {
    next = LengthDecode(target + stolen);       // FUN_180002318 - instruction-length disassembler
    byte op = target[stolen];
    if (op==0xE9 || op==0xC3 || op==0xC2 || op==0xCC ||   // jmp/ret/ret n/int3 - not relocatable
        (op==0xFF && target[stolen+1]==0x25))             // jmp [rip] - bail
        { error = (stolen < 5) ? 9 : 0; break; }
    stolen = next - target;
}
if (stolen > 0x1a) { error = 6; goto fail; }    // prologue too hairy (>26 bytes)

// 3. trampoline = [stolen original bytes][ FF 25 -> jmp back to target+stolen ]
memcpy(tramp, target, stolen);
tramp[stolen+0] = 0xFF; tramp[stolen+1] = 0x25; // jmp qword [rip+disp32]
*(int32*)(tramp+stolen+2) = (int32)(&tramp_ptr - (tramp+stolen+6));
fill_with_0xCC(...);                            // pad the slack
tramp->originalLen = stolen;                    // saved so Commit/Abort can restore

// 4. mark the module as "detoured" exactly once (Detours' module signature)
VirtualProtect(target, stolen, PAGE_EXECUTE_READWRITE, &old);
short *hdr = ModuleHeaderSlot(target);
if (hdr && hdr[0xe] != 0x6544 /* "De" */) {     // not already stamped?
    hdr[0xe]=0x6544; hdr[0xf]=0x6f74; hdr[0x10]=0x7572; hdr[0x11]=0x2173;  // little-endian bytes spell "Detours!"
}

// 5. queue the descriptor onto the transaction list (NOT patched yet)
desc->target = target; desc->originalProt = old; desc->tramp = tramp;
desc->next = g_descriptorList; g_descriptorList = desc;

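Step 1 (follow jumps) is Detours’ DetourCodeFromPointer. Step 2 is the classic “relocate enough whole instructions to fit a 5-byte E9” - with a denylist of instructions that can’t be moved: a ret/jmp/int3 inside the first 5 bytes means there’s no safe spot, so it gives up with code 9 (ERROR_INVALID_BLOCK in Win32 terms), while a prologue that needs more than 26 bytes fails with code 6 (ERROR_INVALID_HANDLE). Step 3 builds the trampoline itself: the stolen bytes followed by an indirect jump back into the original function. Step 4 is Detours stamping its per-module signature so it never double-marks a module. Step 5 only queues the descriptor, so nothing is patched yet - Attach only stages the work.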
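
To make step 1 concrete, here is a small portable sketch of the thunk-following logic that works on any byte buffer. It assumes a little-endian 64-bit host; skip_jump_thunk and read_i32 are my names, not symbols from the binary. It follows at most one thunk, the same as the cleaned-up listing.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read a signed 32-bit displacement without alignment assumptions. */
static int32_t read_i32(const uint8_t *p) {
    int32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Follow one jump thunk, mirroring step 1 of Attach.
 * Returns where the thunk forwards to, or `code` itself if it is not a thunk. */
const uint8_t *skip_jump_thunk(const uint8_t *code) {
    assert(code != NULL);
    if (code[0] == 0xFF && code[1] == 0x25) {
        /* jmp qword [rip+disp32]: the slot sits at next-instruction + disp32. */
        const uint8_t *slot = code + 6 + read_i32(code + 2);
        const uint8_t *dest;
        memcpy(&dest, slot, sizeof dest);   /* one dereference: the slot holds the target */
        return dest;
    }
    if (code[0] == 0xEB) {
        /* jmp rel8: a short hop, possibly landing on a long jmp. */
        const uint8_t *hop = code + 2 + (int8_t)code[1];
        if (hop[0] == 0xE9)                 /* jmp rel32: follow it to the real body */
            return hop + 5 + read_i32(hop + 1);
        return hop;
    }
    return code;
}
```

The FF 25 case reads the pointer slot exactly once: the slot already contains the destination address, so a second dereference would read code bytes as if they were a pointer.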

Act 3 - Commit (`0x1800014dc`) and the thread-safety trick

Commit is where the prologue actually gets overwritten - and it’s the part most hand-rolled hookers get wrong:

for (desc in g_descriptorList) {
    desc->target[0] = 0xE9;                                  // jmp rel32 -> trampoline
    *(int32*)(desc->target+1) = (desc->tramp+0x30) - (desc->target+5);
    fill_remaining_prologue_with_0xCC();
}
// THE important bit: every thread on g_threadList was suspended earlier in the
// transaction, when it was enlisted (the DetourUpdateThread step), not here.
for (thr in g_threadList) {
    GetThreadContext(thr, &ctx);
    if (ctx.Rip inside a patched prologue)  ctx.Rip = relocate(original -> trampoline);
    if (ctx.Rip inside a trampoline)        ctx.Rip = relocate(trampoline -> original);
    SetThreadContext(thr, &ctx);                             // no thread resumes mid-instruction
}
VirtualProtect(target, len, oldProt, &tmp);                  // relock the page
FlushInstructionCache(GetCurrentProcess(), target, len);
ResumeThread(thr);                                           // for every suspended thread

Suspending the process’s other threads and then rewriting any RIP that was caught inside the region being patched is what makes it safe to hook hot, running code without crashing. If the commit fails partway, a fourth function (0x1800013c8, Abort) restores the original protections, frees the trampolines, re-locks the pools to PAGE_EXECUTE_READ, and resumes the threads - a clean rollback.
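
Two pieces of arithmetic carry Commit: the E9 displacement and the RIP fix-up. Here is a minimal sketch of both over plain buffers and integer addresses, so it runs anywhere. The function names are mine, and the same-offset mapping in relocate_ip assumes the stolen instructions were copied verbatim with unchanged lengths, which I have not verified for every case in the binary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Write `E9 rel32` into `site` (runtime address site_addr) so it jumps to
 * dest_addr, then pad the rest of the `stolen` bytes with int3 (0xCC).
 * Returns false, touching nothing, if dest is outside signed 32-bit reach. */
bool write_jmp_rel32(uint8_t *site, uint64_t site_addr, uint64_t dest_addr, size_t stolen) {
    assert(stolen >= 5);
    /* rel32 is measured from the end of the 5-byte jmp instruction. */
    int64_t delta = (int64_t)(dest_addr - (site_addr + 5));
    if (delta < INT32_MIN || delta > INT32_MAX)
        return false;
    int32_t rel = (int32_t)delta;
    site[0] = 0xE9;
    memcpy(site + 1, &rel, sizeof rel);
    memset(site + 5, 0xCC, stolen - 5);
    return true;
}

/* If a suspended thread's instruction pointer sits inside the stolen prologue,
 * move it to the same offset in the trampoline's copy; otherwise leave it. */
uint64_t relocate_ip(uint64_t ip, uint64_t target, uint64_t tramp_code, size_t stolen) {
    if (ip >= target && ip < target + stolen)
        return tramp_code + (ip - target);
    return ip;
}
```

The range check is the reason a rel32 hooker needs its trampolines within 2 GB of the target. A thread parked at target+4 resumes at tramp_code+4 and runs the copied instruction there rather than the middle of the freshly written jmp.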

So what does it hook?

A worker thread (0x180005270), gated by a named mutex AMDRemoveHookM (so install/remove is coordinated, even across processes), resolves and hooks two functions out of user32.dll:

g_pGetRawInputData   = GetProcAddress(user32, "GetRawInputData");
g_pGetRawInputBuffer = GetProcAddress(user32, "GetRawInputBuffer");
SetInlineHook(&g_pGetRawInputData,   hook_GetRawInputData);
SetInlineHook(&g_pGetRawInputBuffer, hook_GetRawInputBuffer);
...
// plus a thread-scoped Windows hook on the foreground window's thread:
tid = GetWindowThreadProcessId(GetForegroundWindow(), NULL);
g_kbHook = SetWindowsHookExA(WH_KEYBOARD /*2*/, kbProc, NULL, tid);

GetRawInputData / GetRawInputBuffer are how a game reads raw mouse and keyboard input. Hooking them is a way to timestamp the exact moment the game samples input - the front edge of the input-to-photon pipeline, and exactly the kind of thing a latency feature would want. It’s tempting to draw the line straight to Anti-Lag+: Part 1’s Delag hooked the frame/present path (the back edge), this would be the front edge, and you’d get a tidy “instrument both ends” story.

But I didn’t prove that link, so I won’t assert it. amdihk64 is a general-purpose hook library, and these raw-input hooks could just as plausibly belong to the Adrenalin overlay, the performance/metrics OSD, or AFMF - all of which live in this same package and all of which have their own reasons to watch input. I never found a call from the Delag gate into amdihk64, nor evidence the raw-input hook fires only under a whitelisted (DlgNxt_WListed) profile. So treat the “both edges” framing as a hypothesis, not a finding - it’s the softest joint in the series and I’d rather hand you the ambiguity than a clean story I can’t back.
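
For reference, this is the general shape a "timestamp and call through" hook takes. It is a portable sketch and not code recovered from amdihk64: fake_read_input stands in for the hooked API with a made-up signature, and g_original stands in for the trampoline pointer the engine would hand back.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the hooked API; the real GetRawInputData
 * has a different signature. */
typedef unsigned (*read_input_fn)(void *buf, unsigned cap);

static unsigned fake_read_input(void *buf, unsigned cap) {
    (void)buf;
    return cap / 2;                 /* pretend half the buffer was filled */
}

/* In a real engine this would point at the trampoline
 * (stolen prologue bytes plus the jump back into the original). */
read_input_fn g_original = fake_read_input;

uint64_t g_last_sample_ns = 0;      /* when the caller last sampled input */
unsigned g_sample_count = 0;        /* how many times it has sampled */

/* The hook: record the time, then forward the call unchanged.
 * It never touches the caller's buffer or the return value. */
unsigned hook_read_input(void *buf, unsigned cap) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    g_last_sample_ns = (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
    g_sample_count++;
    return g_original(buf, cap);
}
```

The caller sees the original function's return value and the only side effect is the recorded timestamp. Whether the real hook bodies are this passive is something I did not establish.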

Why this is the part anti-cheat hates - whoever owns it

Set aside which AMD feature these hooks belong to, because the anti-cheat doesn’t care either. Intercepting GetRawInputData is a classic aimbot / trigger-bot technique - it’s exactly where input is observed and could be altered. Whatever the intent (here, plausibly just reading a timestamp and calling through), an anti-cheat scanning user32 for inline hooks cannot tell a benign 0xE9 from a malicious one from the patch bytes alone. Same bytes, same verdict. It is, quite literally, Microsoft Detours doing exactly what it says on the tin - inside a process Valve is trying to keep pristine.
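
To see why the bytes alone can’t carry intent, consider the simplest possible integrity check: compare a function’s first bytes in memory with the same bytes from the file on disk. This is a hypothetical sketch of that idea, not a description of how VAC or any other anti-cheat actually scans; classify_prologue is my name, and the check ignores relocations.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef enum { PROLOGUE_CLEAN, PROLOGUE_JMP_PATCHED, PROLOGUE_MODIFIED } prologue_state;

/* Compare the first n bytes of a function as mapped in memory with the
 * same bytes from the on-disk image. */
prologue_state classify_prologue(const uint8_t *in_memory, const uint8_t *on_disk, size_t n) {
    assert(n >= 1);
    if (memcmp(in_memory, on_disk, n) == 0)
        return PROLOGUE_CLEAN;
    if (in_memory[0] == 0xE9 && on_disk[0] != 0xE9)
        return PROLOGUE_JMP_PATCHED;    /* an inline jmp rel32 replaced the prologue */
    return PROLOGUE_MODIFIED;
}
```

Two patches that differ only in their rel32 destination get the same classification. Telling them apart means resolving where the jump lands and deciding whether to trust that module, which is a policy question rather than a byte comparison.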

Takeaways for anyone writing a hook engine

amdihk64 is, honestly, a good implementation - worth studying as a reference:

Transactions (Begin/Attach/Commit/Abort) so a batch of hooks applies atomically and rolls back cleanly on failure.
Trampoline pools kept non-writable except during a transaction.
Length-aware prologue relocation with a denylist for non-relocatable instructions.
Jump-thunk following so you hook the real function, not an IAT stub.
Thread RIP relocation under suspension - the difference between “hooks running code” and “random crashes.”

It’s careful, production-grade engineering. It’s also why, for one October, a graphics driver and a cheat looked identical to an anti-cheat - because at the instruction level, they were.

All static analysis - Ghidra over MCP, no execution, no modified hooks. Function addresses are from amdihk64.dll build B396516 (Adrenalin 23.10.1). Companion to Part 1.