# Vision & Architecture Direction
This document captures the intended goals, flows, and architectural direction for VMSifter.
## What VMSifter Is
VMSifter is a CPU instruction testing framework. It injects x86 instructions into controlled execution environments, collects execution results (VMEXIT reason, registers, performance counters), and outputs structured CSV data.
## Core Goals
- Instruction discovery: Enumerate the full x86 instruction space on real hardware, identifying valid encodings, their lengths, and behavior
- Differential testing: Execute the same instruction set across multiple backends (hypervisors, emulators, bare metal) and compare results to surface discrepancies — undocumented behavior, errata, emulator bugs, spec violations
- Microarchitectural fingerprinting: Use performance counter deltas to characterize instruction behavior at the microarchitectural level
## Two Axes, Three Phases
VMSifter’s workloads are organized along two independent axes, producing three distinct phases.
### Discovery axis: per CPU model
The Tunnel fuzzer explores the x86 instruction space byte-by-byte, using EPT execute faults to distinguish complete from incomplete instructions. This requires real-time feedback from an execution backend — the next probe depends on the previous result.
- Requires an EPT-capable backend (Xen VM forking, or KVM)
- Tightly coupled fuzzer ↔ injector loop
- Output: CSV of all discovered valid instruction encodings
The relevant variable is CPU model, not hypervisor. The same physical CPU produces the same instruction set regardless of whether Xen or KVM intercepts the VMEXIT — the silicon decodes the same way. Running Tunnel on Xen on an Intel Ice Lake and on KVM on the same Ice Lake discovers the same instructions; the hypervisor only affects how results are reported, not what the CPU accepts.
Therefore: run Tunnel once per CPU model (Intel SKL, Intel ICL, AMD Zen3, …), on whichever hypervisor is available. Running Tunnel on a second hypervisor on the same CPU adds no discovery value.
Emulators have their own software decoders, so in theory Tunnel could discover what each emulator considers valid — including “false positive” instructions the emulator wrongly accepts. In practice, this is not worth the cost: emulators are orders of magnitude slower than real hardware with VM forking, and CSV replay already catches the important direction (does the emulator handle all real instructions correctly?). The marginal value of discovering emulator-only bogus instructions rarely justifies the runtime.
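As a sketch of that loop, the tunneling search can be written as a depth-first walk over byte strings. `needs_more_bytes` is a hypothetical callback standing in for the injector's EPT execute-fault signal; on real hardware it would come from the Xen or KVM backend:

```python
def tunnel(needs_more_bytes, max_len=15):
    """Enumerate instruction encodings depth-first.

    needs_more_bytes(candidate) returns True when the backend reports
    an execute fault on the page following the candidate bytes, i.e.
    the CPU tried to fetch beyond them (incomplete encoding).
    """
    found = []
    candidate = bytearray([0x00])
    while True:
        if len(candidate) < max_len and needs_more_bytes(bytes(candidate)):
            candidate.append(0x00)      # incomplete: extend with a fresh byte
            continue
        found.append(bytes(candidate))  # complete encoding observed
        # Advance to the next candidate: increment the last byte,
        # carrying into shorter prefixes when a byte wraps past 0xFF.
        while candidate and candidate[-1] == 0xFF:
            candidate.pop()
        if not candidate:
            return found
        candidate[-1] += 1
```

Any oracle with that shape works, which is also why the loop cannot be decoupled from a live backend: each probe's result decides the next probe.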
### Testing axis: per backend
Once a master instruction catalog exists (from Tunnel discovery, manual curation, or external sources like Intel XED’s instruction tables), the remaining work is pure replay: inject each instruction, collect results, dump CSV. The next instruction does not depend on the previous result.
- Any backend works: hypervisors (Xen, KVM), emulators (QEMU, Unicorn, Simics, Bochs), decoders (XED, bddisasm, Zydis), bare metal
- No fuzzer ↔ injector coupling needed — just iterate over the input CSV
- Output: per-backend CSV result files
- Run across as many backends as desired
This is what enables differential testing: replay the same instruction catalog on every backend and diff the outputs.
### Phase 3: Analysis (external)
Comparing CSV result files across backends, CPUs, and configurations. This is not VMSifter’s job — it should happen in external tooling better suited for data analysis (pandas, DuckDB, Jupyter notebooks, custom diffing tools, visualization).
VMSifter’s responsibility ends at producing clean, well-structured CSV output.
## Scope Boundary: What Lives Inside VMSifter
The guiding principle: VMSifter handles backends that require privileged, stateful VM lifecycle management. Everything that is a userspace function-call-per-instruction belongs in separate tooling. The CSV file format is the interface between VMSifter and external tools.
### Inside VMSifter: Hypervisor Injectors
Hypervisor-based execution requires privileged, stateful VM lifecycle management — exactly what VMSifter’s scaffolding exists to handle: VM creation/forking, CPU pinning, VMEXIT interception, register canary management, EPT/NPT permission control, performance counter MSR access, and cleanup.
| Backend | Status | EPT/NPT | VM Forking | Notes |
|---|---|---|---|---|
| Xen | Current | Yes (EPT) | Yes (via `xc_memshr_fork()`) | Primary backend, requires custom Xen patches |
| KVM | Planned | Yes (EPT + NPT) | See below | Mainline Linux, no patches needed |
Why KVM belongs inside VMSifter:
- Same class of problem as Xen: privileged VM lifecycle, VMEXIT interception, register/MSR management
- Shares VMSifter infrastructure: PCPU allocation, CPU pinning, worker parallelism, CSV output
- Primary motivation: enables Tunnel discovery on AMD CPUs (NPT), which Xen’s current C injector doesn’t support
- Secondary benefit: no custom hypervisor patches — ships with mainline Linux, simpler deployment
KVM implementation challenges:
The current C injector is deeply Xen-specific (`xc_memshr_fork()`, Xen domctl for MSRs, XTF for guest setup). A KVM injector would need:
- VM creation via `ioctl` on `/dev/kvm` instead of the Xen toolstack
- Guest setup without XTF (minimal flat binary or direct register/memory setup via the KVM API)
- A state reset mechanism — this is the key challenge:
| Approach | Speed | Complexity | Notes |
|---|---|---|---|
| Dirty page tracking + register reset | Fast | Moderate | Restore only the pages the previous instruction dirtied, plus registers |
| Full snapshot/restore | Slow | Simple | Save all state, restore after each instruction. Correct but expensive. |
| KVM + userfaultfd | Fast | High | COW-like behavior via userspace fault handling. Closest to Xen fork semantics. |
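A toy sketch of the dirty-page approach, with all names hypothetical; a real KVM injector would presumably track dirtied pages via KVM's dirty logging (`KVM_MEM_LOG_DIRTY_PAGES` / `KVM_GET_DIRTY_LOG`) rather than this in-process bookkeeping:

```python
PAGE = 0x1000

class ToyGuest:
    """Stand-in for guest state: page-granular memory plus registers."""
    def __init__(self, mem_pages, regs):
        self.mem = dict(mem_pages)   # page number -> page contents
        self.regs = dict(regs)
        self.dirty = set()           # pages written since last reset

    def write_page(self, page, data):
        self.mem[page] = data
        self.dirty.add(page)

def run_with_reset(guest, instructions, execute):
    """Execute each instruction from identical state, restoring only
    what the previous instruction touched."""
    baseline_regs = dict(guest.regs)
    baseline_pages = dict(guest.mem)
    results = []
    for insn in instructions:
        guest.dirty.clear()
        results.append(execute(guest, insn))
        # Reset: copy back only the pages this instruction dirtied,
        # drop pages it created, and restore all registers.
        for page in guest.dirty:
            if page in baseline_pages:
                guest.mem[page] = baseline_pages[page]
            else:
                del guest.mem[page]
        guest.regs = dict(baseline_regs)
    return results
```

The payoff is that reset cost scales with how much state one instruction touches (usually a page or two), not with total guest memory.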
Hyper-V: Low priority. Windows-only, different ecosystem. Could be a future injector if there’s demand.
### Outside VMSifter: Disassemblers / Decoders
Disassemblers are pure functions: bytes → metadata. They don’t execute instructions, don’t need VMs, sockets, or privileges. A standalone Python script iterating over CSV rows processes millions of instructions per second. VMSifter’s scaffolding would be pure overhead.
These belong in a separate tool that reads VMSifter’s instruction CSV and produces decoder result CSVs.
| Backend | Source | Output |
|---|---|---|
| XED (Intel) | Intel’s reference x86 encoder/decoder | Validity, instruction length, operand registers (read/written), EFLAGS modified, category, ISA set, operand width |
| bddisasm (Bitdefender) | Independent x86 decoder | Validity, length, operand details, CPUID feature flags required |
| Zydis (Zyantific) | Fast open-source decoder | Validity, length, operand details, accessed flags |
XED is particularly interesting because it is Intel’s own reference decoder. When VMSifter executes an instruction on real Intel hardware and XED decodes the same bytes, any discrepancy is a strong signal:
- Length mismatch: CPU decoded N bytes, XED says M → potential undocumented encoding or decoder bug
- Register side effects: CPU modified registers that XED doesn’t list as outputs (or vice versa) → undocumented behavior or errata
- EFLAGS delta: CPU modified flags that XED says shouldn’t be affected → microarchitectural leak or spec gap
- Validity disagreement: CPU executes bytes that XED calls invalid (or CPU faults on bytes XED calls valid) → undocumented instruction or decoder gap
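A hardware-vs-XED diff along these lines could be as small as the following sketch; the row field names are illustrative, not VMSifter's actual CSV schema:

```python
def classify_discrepancies(hw, xed):
    """Classify disagreements between a hardware result row and the
    XED row for the same bytes. Field names are illustrative only."""
    issues = []
    if hw["valid"] != xed["valid"]:
        issues.append("validity-disagreement")
    elif hw["valid"]:  # both decoded it: compare the details
        if hw["length"] != xed["length"]:
            issues.append("length-mismatch")
        if set(hw["regs_written"]) != set(xed["regs_written"]):
            issues.append("register-side-effects")
        if hw["eflags_modified"] != xed["eflags_modified"]:
            issues.append("eflags-delta")
    return issues
```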
### Outside VMSifter: Emulators
Emulators are userspace-only and don’t benefit from VMSifter’s scaffolding. For CSV replay (the primary mode for emulators), the per-instruction loop is trivial — a thin wrapper around each emulator’s API:
```python
# Unicorn example — the entire core loop
# (csv_reader, csv_writer and the register helpers are elided)
from unicorn import Uc, UcError, UC_ARCH_X86, UC_MODE_64

CODE_ADDR = 0x1000

uc = Uc(UC_ARCH_X86, UC_MODE_64)
uc.mem_map(CODE_ADDR, 0x1000)
for insn_bytes in csv_reader:
    uc.mem_write(CODE_ADDR, insn_bytes)
    set_canary_registers(uc)
    fault_type = None  # reset so a clean run doesn't keep the last fault
    try:
        uc.emu_start(CODE_ADDR, CODE_ADDR + len(insn_bytes), count=1)
    except UcError as e:
        fault_type = e.errno
    regs = read_all_registers(uc)
    csv_writer.write(insn_bytes, fault_type, reg_delta(regs))
```
What VMSifter would add: ProcessPoolExecutor parallelism — but a simple multiprocessing.Pool does the same without needing CPU pinning. What VMSifter would impose: Unix socket protocol, C subprocess, Xen-centric configuration — all overhead for something that’s a Python function call.
These belong in a separate tool that reads VMSifter’s instruction CSV and produces emulator result CSVs.
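As a flavor of how thin that separate tool can be, here is a hypothetical parallel replay skeleton; `emulate_one` is a placeholder for whichever emulator wrapper is used:

```python
from multiprocessing import Pool

def emulate_one(insn_hex):
    # Placeholder worker: a real one would hold a per-process emulator
    # instance (e.g. Unicorn) and return the observed execution result.
    insn = bytes.fromhex(insn_hex)
    return insn_hex, len(insn)

def replay_parallel(insn_hexes, workers=4):
    # chunksize batches tasks so IPC overhead stays low across
    # millions of tiny per-instruction jobs
    with Pool(processes=workers) as pool:
        return pool.map(emulate_one, insn_hexes, chunksize=256)
```

`Pool.map` preserves input order, so the output CSV rows line up with the input catalog without any bookkeeping.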
| Backend | Fidelity | Speed | Notes |
|---|---|---|---|
| QEMU (TCG) | Instruction-level | Moderate | Full system emulation, well-understood gaps vs real HW |
| Unicorn | Instruction-level | Fast | Lightweight, API-friendly, based on QEMU TCG |
| Bochs | Cycle-approximate | Slow | Most faithful x86 emulator, good reference |
| Simics | Cycle-accurate | Slow | Commercial, high fidelity, scriptable |
### Outside VMSifter: Bare Metal / Ring 0
| Backend | Notes |
|---|---|
| Ring 0 (Linux kernel module) | Original sandsifter approach, no hypervisor needed |
| SMM | System Management Mode execution |
These are execution backends but don’t involve VM lifecycle management. Whether they belong inside or outside VMSifter depends on how much shared infrastructure they’d use. This is an open question for later.
### Outside VMSifter: Analysis
Comparing CSV result files across backends, CPUs, and configurations. This is explicitly not VMSifter’s job. It should happen in external tooling better suited for data analysis (pandas, DuckDB, Jupyter notebooks, custom diffing tools, visualization).
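For illustration, a minimal stdlib diff over two backend result files might look like this (column names are hypothetical, not VMSifter's actual schema):

```python
import csv

def diff_backends(path_a, path_b, key="insn_bytes", fields=("vmexit_reason",)):
    """Report instructions where two backends disagree on the given
    result fields. Column names here are illustrative only."""
    def load(path):
        with open(path, newline="") as f:
            return {row[key]: row for row in csv.DictReader(f)}
    a, b = load(path_a), load(path_b)
    mismatches = []
    for k in sorted(a.keys() & b.keys()):
        bad = [f for f in fields if a[k][f] != b[k][f]]
        if bad:
            mismatches.append((k, bad))
    return mismatches
```

Anything heavier (joins across many backends, statistics, plots) is exactly where pandas or DuckDB take over.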
## CSV as the Universal Interface
All backends — whether inside or outside VMSifter — produce and consume CSV files. The CSV schema must:
- Handle both execution backends and decoder backends (optional columns)
- Include backend metadata (which backend, which CPU model, which configuration)
- Support performance counters (optional, only real hardware)
- Support decoder-specific fields: operand details, EFLAGS, ISA set (optional, only decoders)
- Be stable across VMSifter versions to allow cross-version comparison
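One way to satisfy those requirements is a superset schema where non-applicable columns are simply left empty. A sketch with hypothetical column names:

```python
import csv, io

# Superset schema: execution-only and decoder-only columns are optional
FIELDS = [
    "backend", "cpu_model", "config",     # metadata, always present
    "insn_bytes", "valid", "length",      # common result columns
    "vmexit_reason", "perf_counters",     # execution backends only
    "operands", "eflags", "isa_set",      # decoder backends only
]

def write_rows(rows, out):
    w = csv.DictWriter(out, fieldnames=FIELDS, restval="")
    w.writeheader()
    for row in rows:
        w.writerow(row)  # missing keys become empty cells via restval
```

Because every backend emits the same header, cross-backend and cross-version diffs reduce to column-wise comparisons.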
## Fuzzer × Injector Matrix (within VMSifter)
| Fuzzer | Requires Real-Time Feedback | Compatible Injectors |
|---|---|---|
| Tunnel | Yes (EPT/NPT faults) | Xen, KVM |
| Random | No | Xen, KVM |
| CSV (replay) | No | Xen, KVM |
| Drizzler | No (but needs memory region setup) | Xen, KVM |
All fuzzer × injector combinations within VMSifter are valid. The Tunnel fuzzer requires EPT/NPT feedback, which all VMSifter injectors provide by definition (they’re hypervisors on real hardware).
## End-to-End Differential Testing Flow
The decoder path is especially powerful: XED can process the full instruction catalog in seconds and provide Intel’s reference answer for length, operands, flags, and validity — a fast, authoritative baseline to diff all execution results against.
## Architectural Decisions
### VMSifter’s scope: fuzzers + hypervisor injectors + CSV
VMSifter keeps instruction generation (fuzzers) and hypervisor-based injection. It does not grow to include emulator backends, disassembler backends, or analysis tooling. Its output boundary is CSV files.
- Fuzzers are instruction sources (Tunnel for discovery, Random/CSV/Drizzler for testing)
- Injectors are hypervisor backends requiring privileged VM lifecycle management (Xen, KVM)
- CSV is the interface to everything else
### Tunnel stays coupled to hypervisor injectors
Tunnel’s algorithm depends on EPT/NPT execute faults for real-time feedback. This cannot be replaced by CSV replay. However, Tunnel only needs to run once per CPU model — the instruction set is a property of the silicon, not the hypervisor.
The only reason to add a hypervisor injector is to reach CPU models the existing injectors can’t (e.g., KVM for AMD with NPT). Adding emulator or decoder backends does not require Tunnel support — only CSV replay, which lives outside VMSifter.
### The C injector needs refactoring for multi-hypervisor support
The current C injector (`src/main.c`) is deeply Xen-specific. To support KVM, the injector layer needs to be split:
- Common: instruction injection logic, register canary management, result message format, socket protocol
- Xen-specific: `xc_memshr_fork()`, Xen domctl MSR access, XTF guest setup
- KVM-specific: `ioctl`-based VM creation, `KVM_GET/SET_REGS`, dirty page tracking for state reset
This refactoring is a prerequisite for KVM support and would also clean up the current codebase.
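As an interface sketch of that split (Python used here for brevity; the real refactoring is in C, and all names are hypothetical):

```python
from abc import ABC, abstractmethod

class Injector(ABC):
    """Backend-specific half: VM lifecycle and register access."""
    @abstractmethod
    def create_vm(self): ...
    @abstractmethod
    def reset_state(self): ...
    @abstractmethod
    def set_registers(self, regs): ...
    @abstractmethod
    def run_instruction(self, insn): ...  # -> (exit_reason, registers)

def inject(injector, insn, canaries):
    """Common half: canary setup, execution, result packaging.
    Identical across Xen and KVM backends."""
    injector.reset_state()
    injector.set_registers(canaries)
    exit_reason, regs = injector.run_instruction(insn)
    delta = {r: v for r, v in regs.items() if canaries.get(r) != v}
    return {"insn": insn.hex(), "exit": exit_reason, "reg_delta": delta}
```

In C this would likely be a struct of function pointers that `main.c` dispatches through, with the Xen and KVM implementations in separate translation units.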