The Engine Room: AMD's Detours-Based Hook Library (amdihk64)
Part 2 of the Anti-Lag+ teardown. Part 1 covered
what Anti-Lag+ patched in the game and why VAC flagged it. This one tears down a production
inline-hook engine that ships in the very same driver package: amdihk64.dll (“AMD inline-hook”).
Caveat up front. I could not prove
amdihk64 is on the Anti-Lag+ code path. Part 1’s Delag detour uses amdxc64’s own embedded Detours; amdihk64 is a separate, standalone hooker - a sibling engine, and an excellent specimen of how this class of trampoline hooker is built. Most of this post is that teardown, which stands on its own. Where I reach past what I proved (e.g. “is this Anti-Lag+’s input hook?”), I flag it explicitly.
In Part 1 we watched amdxc64.dll walk the game’s call stack and inline-patch a call site
with Microsoft Detours. That driver carries its own statically-linked copy of Detours. But
the package ships a second, standalone hooking component too - amdihk64.dll, ~210 KB,
exporting just Init / Terminate and internally advertising AMDSetHookE, AMDSetXHookE,
AMDRemoveHookM, AMDRemoveXHookM. Its import table is a hooker’s shopping list:
    VirtualAlloc / VirtualProtect / VirtualQuery / FlushInstructionCache      ; patch primitive
    SuspendThread / GetThreadContext / SetThreadContext / ResumeThread        ; safe live patching
    SetWindowsHookExA / CallNextHookEx / UnhookWindowsHookEx / EnumWindows    ; injection / triggers
    OpenMutexA / CreateEventA / WaitForMultipleObjects                        ; cross-process control
    GetProcAddress / LoadLibraryA                                             ; resolution
    Detoured                                                                  ; <- Microsoft Detours
That last import, plus a detoured.dll shipped alongside, is the tell: amdihk64 is Microsoft
Detours wearing an AMD hat. We’ll prove it from the bytes.
A Detours transaction, in three acts
Detours installs hooks inside a transaction: DetourTransactionBegin → DetourAttach (one or
more) → DetourTransactionCommit. amdihk64 implements exactly that shape with three internal
functions.
Act 1 - Begin (0x180001334)
    if (DAT_18002dde4 == 0) {                     // no transaction open?
        tid = GetCurrentThreadId();
        InterlockedClaim(&g_ownerTid, tid);       // this thread now owns the transaction
        g_descriptorList = 0; g_threadList = 0;   // reset pending hooks + suspended threads
        g_error = 0;
        for (pool = g_trampolinePools; pool; pool = pool->next)
            VirtualProtect(pool, 0x10000, PAGE_EXECUTE_READWRITE, &old);   // open for writing
        return 0;
    }
    return 0x10dd;                                // ERROR_INVALID_OPERATION - a txn is already open
Two details worth noting. The trampolines live in pre-allocated 64 KB pools that are kept
PAGE_EXECUTE_READ (0x20) at rest and only flipped to RWX (0x40) for the duration of a
transaction - minimizing the window where AMD’s own code pages are writable. And 0x10dd is
ERROR_INVALID_OPERATION - the exact code Microsoft Detours returns for a nested transaction.
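The ownership guard is easy to model away from Windows. The following is a minimal portable sketch, not amdihk64's code: txn_begin, txn_end and g_owner are hypothetical names, C11 atomics stand in for the interlocked claim, and the sketch folds the "is a transaction open?" flag and the owner id into one variable, which the real binary keeps separate. The pool re-protection is left out entirely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Numeric value of Win32 ERROR_INVALID_OPERATION, as quoted in the post. */
#define ERR_INVALID_OPERATION 0x10ddu

/* Hypothetical state: 0 means "no transaction open", otherwise the owner's id. */
static _Atomic uint32_t g_owner = 0;

/* Open a transaction for thread `tid`. Fails if any transaction is open,
   including one owned by the same thread (no nesting). */
uint32_t txn_begin(uint32_t tid) {
    assert(tid != 0);                 /* 0 is reserved for "unowned" */
    uint32_t expected = 0;
    /* Claim ownership only if nobody holds it (compare-and-swap 0 -> tid). */
    if (!atomic_compare_exchange_strong(&g_owner, &expected, tid))
        return ERR_INVALID_OPERATION;
    return 0;
}

/* Close the transaction. Only the owning thread may do so. */
uint32_t txn_end(uint32_t tid) {
    assert(tid != 0);
    uint32_t expected = tid;
    if (!atomic_compare_exchange_strong(&g_owner, &expected, 0))
        return ERR_INVALID_OPERATION;
    return 0;
}
```

Calling txn_begin twice in a row, from the same or a different thread id, yields 0x10dd on the second call, which is the nested-transaction behaviour described above.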
Act 2 - Attach (0x180001898): building a trampoline
This is the heart of it. Given a pointer-to-target and a hook function, it builds the trampoline. Cleaned up:
    // 1. follow jump thunks to the *real* target (IAT stubs, EB->E9 chains)
    if (target[0]==0xFF && target[1]==0x25)                   // jmp qword [rip+disp]
        target = *(uint8**)(target + 6 + *(int32*)(target+2)); // load the pointer the thunk jumps through
    else if (target[0]==0xEB) {                               // jmp rel8
        target += 2 + (int8)target[1];
        if (target[0]==0xE9)                                  // ...landing on a jmp rel32
            target += 5 + *(int32*)(target+1);
    }

    // 2. steal whole instructions off the prologue until we have >= 5 bytes for the E9
    uint stolen = 0;
    while (stolen < 5) {
        next = LengthDecode(target + stolen);   // FUN_180002318 - instruction-length disassembler
        byte op = target[stolen];
        if (op==0xE9 || op==0xC3 || op==0xC2 || op==0xCC ||   // jmp/ret/ret n/int3 - not relocatable
            (op==0xFF && target[stolen+1]==0x25))             // jmp [rip] - bail
        { error = (stolen < 5) ? 9 : 0; break; }
        stolen = next - target;
    }
    if (stolen > 0x1a) { error = 6; goto fail; }              // prologue too hairy (>26 bytes)

    // 3. trampoline = [stolen original bytes][ FF 25 -> jmp back to target+stolen ]
    memcpy(tramp, target, stolen);
    tramp[stolen+0] = 0xFF; tramp[stolen+1] = 0x25;           // jmp qword [rip+disp]
    *(int32*)(tramp+stolen+2) = (int32)((uint8*)&tramp_ptr - (tramp+stolen+6));
    fill_with_0xCC(...);                                      // pad the slack
    tramp->originalLen = stolen;                              // saved so Commit/Abort can restore

    // 4. mark the module as "detoured" exactly once (Detours' module signature)
    VirtualProtect(target, stolen, PAGE_EXECUTE_READWRITE, &old);
    short *hdr = ModuleHeaderSlot(target);
    if (hdr && hdr[0xe] != 0x6544 /* "De" */) {               // not already stamped?
        // the four little-endian words spell the ASCII tag "Detours!"
        hdr[0xe]=0x6544; hdr[0xf]=0x6f74; hdr[0x10]=0x7572; hdr[0x11]=0x2173;
    }

    // 5. queue the descriptor onto the transaction list (NOT patched yet)
    desc->target = target; desc->originalProt = old; desc->tramp = tramp;
    desc->next = g_descriptorList; g_descriptorList = desc;
Step 1 (follow jumps) is Detours’ DetourCodeFromPointer. Step 2 is the classic “relocate enough
whole instructions to fit a 5-byte E9” - with a denylist of instructions that can’t be
moved (a ret/jmp/int3 in the first 5 bytes means there’s no safe spot, so it gives up with
code 9). Step 4 is Detours stamping its per-module signature so it never double-marks a module.
Nothing is patched yet - Attach only stages the work.
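To make the step-2 loop concrete, here is a small portable sketch that runs over a plain byte buffer. It is an illustration under loud assumptions, not the binary's logic: toy_length is a hypothetical stand-in for FUN_180002318 that only understands a few register-direct prologue instructions (push/pop reg, nop, mov reg,reg, and add/sub reg,imm), ERR_UNKNOWN is an invented code for "the toy decoder gave up", and the 26-byte upper bound is not modelled. Only the denylist and the code 9 come from the post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define JMP_REL32_LEN 5    /* bytes an E9 rel32 needs */
#define ERR_NO_ROOM   9    /* code the post reports for an unrelocatable prologue */
#define ERR_UNKNOWN   (-1) /* hypothetical: toy decoder does not know this opcode */

/* Toy length decoder. Returns the instruction length, or 0 if the opcode is
   not one of the few forms handled here or would run past the buffer. */
static size_t toy_length(const uint8_t *p, size_t avail) {
    size_t i = 0;
    if (avail == 0) return 0;
    if ((p[i] & 0xF0) == 0x40) i++;                    /* optional REX prefix */
    if (i >= avail) return 0;
    uint8_t op = p[i++];
    if ((op & 0xF0) == 0x50 || op == 0x90) return i;   /* push/pop reg, nop */
    if (i >= avail || (p[i] & 0xC0) != 0xC0) return 0; /* ModRM must be mod == 3 */
    size_t len;
    switch (op) {
    case 0x89: case 0x8B: len = i + 1; break;          /* mov reg, reg        */
    case 0x83:            len = i + 2; break;          /* group 1 op, imm8    */
    case 0x81:            len = i + 5; break;          /* group 1 op, imm32   */
    default:              return 0;
    }
    return len <= avail ? len : 0;
}

/* Denylist from the post: jmp rel32, ret, ret n, int3, jmp [rip+disp]. */
static int is_blocker(const uint8_t *p, size_t avail) {
    uint8_t op = p[0];
    if (op == 0xE9 || op == 0xC3 || op == 0xC2 || op == 0xCC) return 1;
    return op == 0xFF && avail > 1 && p[1] == 0x25;
}

/* Count how many whole-instruction bytes must be moved to free 5 bytes. */
int steal_prologue(const uint8_t *code, size_t avail, size_t *stolen_out) {
    assert(code != NULL && stolen_out != NULL);
    size_t stolen = 0;
    while (stolen < JMP_REL32_LEN) {
        if (stolen >= avail) return ERR_UNKNOWN;
        if (is_blocker(code + stolen, avail - stolen)) return ERR_NO_ROOM;
        size_t n = toy_length(code + stolen, avail - stolen);
        if (n == 0) return ERR_UNKNOWN;
        stolen += n;                                   /* never split an instruction */
    }
    *stolen_out = stolen;
    return 0;
}
```

Fed the common prologue push rbp / mov rbp,rsp / sub rsp,0x20 (55, 48 89 E5, 48 83 EC 20), the loop stops at 8 bytes rather than 5, because cutting at 5 would land in the middle of the sub. A function that is just push rbp / ret fails with 9, since only one byte can be moved before the ret.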
Act 3 - Commit (0x1800014dc) and the thread-safety trick
Commit is where the prologue actually gets overwritten - and it’s the part most hand-rolled hookers get wrong:
    for (desc in g_descriptorList) {
        desc->target[0] = 0xE9;                               // jmp rel32 -> trampoline
        *(int32*)(desc->target+1) = (desc->tramp+0x30) - (desc->target+5);
        fill_remaining_prologue_with_0xCC();
    }

    // THE important bit: every thread was suspended at Begin/UpdateThread time.
    for (thr in g_threadList) {
        GetThreadContext(thr, &ctx);
        if (ctx.Rip inside a patched prologue) ctx.Rip = relocate(original -> trampoline);
        if (ctx.Rip inside a trampoline)       ctx.Rip = relocate(trampoline -> original);
        SetThreadContext(thr, &ctx);                          // no thread resumes mid-instruction
    }

    for (desc in g_descriptorList) {
        VirtualProtect(desc->target, len, desc->originalProt, &tmp);   // relock the page
        FlushInstructionCache(GetCurrentProcess(), desc->target, len);
    }
    for (thr in g_threadList)
        ResumeThread(thr);                                    // for every suspended thread
Suspending the process’s other threads and then rewriting any RIP that was caught inside the
region being patched is what makes it safe to hook hot, running code without crashing. If the
commit fails partway, a fourth function (0x1800013c8, Abort) restores the original protections,
frees the trampolines, re-locks the pools to PAGE_EXECUTE_READ, and resumes the threads - a
clean rollback.
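The two pieces of arithmetic Commit relies on, the E9 displacement and the RIP remap, can be sketched in portable C with addresses modelled as plain integers. The names encode_jmp_rel32 and remap_ip are hypothetical, and the remap assumes the stolen bytes sit at the same offsets in the trampoline as in the original prologue (true for the verbatim memcpy shown in Act 2, not necessarily for every relocation scheme).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write "E9 rel32" for a jump placed at address `from` that should land on `to`.
   rel32 is relative to the end of the 5-byte instruction. Returns -1 if the
   distance does not fit in a signed 32-bit displacement. */
int encode_jmp_rel32(uint8_t out[5], uint64_t from, uint64_t to) {
    assert(out != NULL);
    int64_t rel = (int64_t)(to - (from + 5));
    if (rel < INT32_MIN || rel > INT32_MAX) return -1;
    uint32_t u = (uint32_t)(int32_t)rel;
    out[0] = 0xE9;
    for (int i = 0; i < 4; i++)
        out[1 + i] = (uint8_t)(u >> (8 * i));   /* little-endian */
    return 0;
}

/* If `ip` lies inside [from_base, from_base + len), move it to the same offset
   inside the region starting at `to_base`; otherwise leave it alone.
   Commit would call this with (prologue -> trampoline); a rollback or removal
   would call it with the arguments swapped. */
uint64_t remap_ip(uint64_t ip, uint64_t from_base, uint64_t to_base, uint64_t len) {
    if (ip >= from_base && ip - from_base < len)
        return to_base + (ip - from_base);
    return ip;
}
```

For example, a thread parked 3 bytes into an 8-byte stolen prologue at 0x1000 is moved to offset 3 of its trampoline, so when it resumes it executes the relocated copy of the instruction it was about to run rather than the middle of a freshly written jmp.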
So what does it hook?
A worker thread (0x180005270), gated by a named mutex AMDRemoveHookM (so install/remove is
coordinated, even across processes), resolves and hooks two functions out of user32.dll:
g_pGetRawInputData = GetProcAddress(user32, "GetRawInputData");
g_pGetRawInputBuffer = GetProcAddress(user32, "GetRawInputBuffer");
SetInlineHook(&g_pGetRawInputData, hook_GetRawInputData);
SetInlineHook(&g_pGetRawInputBuffer, hook_GetRawInputBuffer);
...
// plus a thread-local Windows hook on the foreground window's thread:
tid = GetWindowThreadProcessId(GetForegroundWindow(), NULL);
g_kbHook = SetWindowsHookExA(WH_KEYBOARD /*2*/, kbProc, NULL, tid);
GetRawInputData / GetRawInputBuffer are how a game reads raw mouse and keyboard input.
Hooking them is a way to timestamp the exact moment the game samples input - the front edge of
the input-to-photon pipeline, and exactly the kind of thing a latency feature would want. It’s
tempting to draw the line straight to Anti-Lag+: Part 1’s Delag hooked the frame/present path
(the back edge), this would be the front edge, and you’d get a tidy “instrument both ends” story.
But I didn’t prove that link, so I won’t assert it. amdihk64 is a general-purpose hook
library, and these raw-input hooks could just as plausibly belong to the Adrenalin overlay, the
performance/metrics OSD, or AFMF - all of which live in this same package and all of which have
their own reasons to watch input. I never found a call from the Delag gate into amdihk64, nor
evidence the raw-input hook fires only under a whitelisted (DlgNxt_WListed) profile. So treat the
“both edges” framing as a hypothesis, not a finding - it’s the softest joint in the series and I’d
rather hand you the ambiguity than a clean story I can’t back.
Why this is the part anti-cheat hates - whoever owns it
Set aside which AMD feature these hooks belong to, because the anti-cheat doesn’t care either.
GetRawInputData interception is the canonical aimbot / trigger-bot technique - it’s exactly where
input is observed and could be altered. Whatever the intent (here, plausibly just reading a
timestamp and calling through), an anti-cheat scanning user32 for inline hooks cannot tell
a benign 0xE9 from a malicious one. Same bytes, same verdict. It is, quite literally,
Microsoft Detours doing exactly what it says on the tin -
inside a process Valve is trying to keep pristine.
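To see why intent is invisible at this level, consider the simplest possible integrity check. This is a hypothetical sketch, not a description of how VAC or any real anti-cheat works: classify_prologue and its enum are invented names, and the check just compares the live bytes of a function against a pristine copy (say, from the DLL on disk) and looks at the first opcode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical verdicts for a naive prologue integrity check. */
enum prologue_state {
    PROLOGUE_CLEAN = 0,       /* live bytes match the pristine copy      */
    PROLOGUE_MODIFIED = 1,    /* bytes differ, but not an obvious jump   */
    PROLOGUE_JMP_PATCHED = 2  /* bytes differ and now start with E9      */
};

/* Compare the first `n` live bytes of a function against a pristine copy.
   Nothing here knows, or can know, who wrote the patch or why. */
enum prologue_state classify_prologue(const uint8_t *live,
                                      const uint8_t *pristine, size_t n) {
    assert(live != NULL && pristine != NULL);
    if (memcmp(live, pristine, n) == 0)
        return PROLOGUE_CLEAN;
    if (n >= 5 && live[0] == 0xE9)
        return PROLOGUE_JMP_PATCHED;
    return PROLOGUE_MODIFIED;
}
```

The function takes two byte buffers and a length. A driver's latency probe and a cheat's input rewriter would both produce PROLOGUE_JMP_PATCHED, and telling them apart requires information that is not in the bytes, such as who owns the jump destination.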
Takeaways for anyone writing a hook engine
amdihk64 is, honestly, a good implementation - worth studying as a reference:
- Transactions (Begin/Attach/Commit/Abort) so a batch of hooks applies atomically and rolls back cleanly on failure.
- Trampoline pools kept non-writable except during a transaction.
- Length-aware prologue relocation with a denylist for non-relocatable instructions.
- Jump-thunk following so you hook the real function, not an IAT stub.
- Thread RIP relocation under suspension - the difference between “hooks running code” and “random crashes.”
It’s careful, production-grade engineering. It’s also why, for one October, a graphics driver and a cheat looked identical to an anti-cheat - because at the instruction level, they were.
All static analysis - Ghidra over MCP, no execution, no modified hooks. Function addresses are
from amdihk64.dll build B396516 (Adrenalin 23.10.1). Companion to
Part 1.