Vision & Architecture Direction#

This document captures the intended goals, flows, and architectural direction for VMSifter.

What VMSifter Is#

VMSifter is a CPU instruction testing framework. It injects x86 instructions into controlled execution environments, collects execution results (VMEXIT reason, registers, performance counters), and outputs structured CSV data.

Core Goals#

Instruction discovery: Enumerate the full x86 instruction space on real hardware, identifying valid encodings, their lengths, and behavior
Differential testing: Execute the same instruction set across multiple backends (hypervisors, emulators, bare metal) and compare results to surface discrepancies — undocumented behavior, errata, emulator bugs, spec violations
Microarchitectural fingerprinting: Use performance counter deltas to characterize instruction behavior at the microarchitectural level

Two Axes, Three Phases#

VMSifter’s workloads are organized along two independent axes, producing three distinct phases.

Discovery axis: per CPU model#

The Tunnel fuzzer explores the x86 instruction space byte-by-byte, using EPT execute faults to distinguish complete from incomplete instructions. This requires real-time feedback from an execution backend — the next probe depends on the previous result.

Requires an EPT-capable backend (Xen VM forking, or KVM)
Tightly coupled fuzzer ↔ injector loop
Output: CSV of all discovered valid instruction encodings

The relevant variable is CPU model, not hypervisor. The same physical CPU produces the same instruction set regardless of whether Xen or KVM intercepts the VMEXIT — the silicon decodes the same way. Running Tunnel on Xen on an Intel Ice Lake and on KVM on the same Ice Lake discovers the same instructions; the hypervisor only affects how results are reported, not what the CPU accepts.

Therefore: run Tunnel once per CPU model (Intel SKL, Intel ICL, AMD Zen3, …), on whichever hypervisor is available. Running Tunnel on a second hypervisor on the same CPU adds no discovery value.

Emulators have their own software decoders, so in theory Tunnel could discover what each emulator considers valid — including “false positive” instructions the emulator wrongly accepts. In practice, this is not worth the cost: emulators are orders of magnitude slower than real hardware with VM forking, and CSV replay already catches the important direction (does the emulator handle all real instructions correctly?). The marginal value of discovering emulator-only bogus instructions rarely justifies the runtime.

flowchart LR T1["Tunnel + Xen (Intel ICL)"] --> C1["catalog_intel_icl.csv"] T2["Tunnel + KVM (AMD Zen3)"] --> C2["catalog_amd_zen3.csv"] T3["Tunnel + Xen (Intel SKL)"] --> C3["catalog_intel_skl.csv"] C1 & C2 & C3 --> U["master_instruction_catalog.csv"]

Testing axis: per backend#

Once a master instruction catalog exists (from Tunnel discovery, manual curation, or external sources like Intel XED’s instruction tables), the remaining work is pure replay: inject each instruction, collect results, dump CSV. The next instruction does not depend on the previous result.

Any backend works: hypervisors (Xen, KVM), emulators (QEMU, Unicorn, Simics, Bochs), decoders (XED, bddisasm, Zydis), bare metal
No fuzzer ↔ injector coupling needed — just iterate over the input CSV
Output: per-backend CSV result files
Run across as many backends as desired

This is what enables differential testing: replay the same instruction catalog on every backend and diff the outputs.

flowchart LR CAT["master_instruction_catalog.csv"] --> X["Xen (Intel ICL)"] CAT --> K["KVM (AMD Zen3)"] CAT --> Q["QEMU TCG"] CAT --> B["Bochs"] CAT --> XED["XED"] CAT --> BD["bddisasm"]

Phase 3: Analysis (external)#

Comparing CSV result files across backends, CPUs, and configurations. This is not VMSifter’s job — it should happen in external tooling better suited for data analysis (pandas, DuckDB, Jupyter notebooks, custom diffing tools, visualization).

VMSifter’s responsibility ends at producing clean, well-structured CSV output.

Scope Boundary: What Lives Inside VMSifter#

The guiding principle: VMSifter handles backends that require privileged, stateful VM lifecycle management. Everything that is a userspace function-call-per-instruction belongs in separate tooling. The CSV file format is the interface between VMSifter and external tools.

flowchart TD subgraph VMSifter ["VMSifter (privileged VM lifecycle)"] FZ["Fuzzers: Tunnel, Random, CSV, Drizzler"] IJ["Injectors: Xen, KVM"] end VMSifter -->|"CSV files"| D["Decoders (XED, bddisasm, Zydis)"] VMSifter -->|"CSV files"| EM["Emulators (Unicorn, QEMU, Bochs, Simics)"] VMSifter -->|"CSV files"| AN["Analysis (pandas, DuckDB, Jupyter)"]

Inside VMSifter: Hypervisor Injectors#

Hypervisor-based execution requires privileged, stateful VM lifecycle management — exactly what VMSifter’s scaffolding exists to handle: VM creation/forking, CPU pinning, VMEXIT interception, register canary management, EPT/NPT permission control, performance counter MSR access, and cleanup.

Backend	Status	EPT/NPT	VM Forking	Notes
Xen	Current	Yes (EPT)	Yes (`xc_memshr_fork`)	Primary backend, requires custom Xen patches
KVM	Planned	Yes (EPT + NPT)	See below	Mainline Linux, no patches needed

Why KVM belongs inside VMSifter:

Same class of problem as Xen: privileged VM lifecycle, VMEXIT interception, register/MSR management
Shares VMSifter infrastructure: PCPU allocation, CPU pinning, worker parallelism, CSV output
Primary motivation: enables Tunnel discovery on AMD CPUs (NPT), which Xen’s current C injector doesn’t support
Secondary benefit: no custom hypervisor patches — ships with mainline Linux, simpler deployment

KVM implementation challenges:

The current C injector is deeply Xen-specific (xc_memshr_fork(), Xen domctl for MSRs, XTF for guest setup). A KVM injector would need:

VM creation via ioctl on /dev/kvm instead of Xen toolstack
Guest setup without XTF (minimal flat binary or direct register/memory setup via KVM API)
State reset mechanism — this is the key challenge:

Approach	Speed	Complexity	Notes
Dirty page tracking + register reset	Fast	Moderate	`KVM_GET_DIRTY_LOG` to find modified pages, restore only those. Approximate fork-reset.
Full snapshot/restore	Slow	Simple	Save all state, restore after each instruction. Correct but expensive.
KVM + userfaultfd	Fast	High	COW-like behavior via userspace fault handling. Closest to Xen fork semantics.

Hyper-V: Low priority. Windows-only, different ecosystem. Could be a future injector if there’s demand.

Outside VMSifter: Disassemblers / Decoders#

Disassemblers are pure functions: bytes → metadata. They don’t execute instructions, don’t need VMs, sockets, or privileges. A standalone Python script iterating over CSV rows processes millions of instructions per second. VMSifter’s scaffolding would be pure overhead.

These belong in a separate tool that reads VMSifter’s instruction CSV and produces decoder result CSVs.

Backend	Source	Output
XED (Intel)	Intel’s reference x86 encoder/decoder	Validity, instruction length, operand registers (read/written), EFLAGS modified, category, ISA set, operand width
bddisasm (Bitdefender)	Independent x86 decoder	Validity, length, operand details, CPUID feature flags required
Zydis (Zyantific)	Fast open-source decoder	Validity, length, operand details, accessed flags

XED is particularly interesting because it is Intel’s own reference decoder. When VMSifter executes an instruction on real Intel hardware and XED decodes the same bytes, any discrepancy is a strong signal:

Length mismatch: CPU decoded N bytes, XED says M → potential undocumented encoding or decoder bug
Register side effects: CPU modified registers that XED doesn’t list as outputs (or vice versa) → undocumented behavior or errata
EFLAGS delta: CPU modified flags that XED says shouldn’t be affected → microarchitectural leak or spec gap
Validity disagreement: CPU executes bytes that XED calls invalid (or CPU faults on bytes XED calls valid) → undocumented instruction or decoder gap

Outside VMSifter: Emulators#

Emulators are userspace-only and don’t benefit from VMSifter’s scaffolding. For CSV replay (the primary mode for emulators), the per-instruction loop is trivial — a thin wrapper around each emulator’s API:

# Unicorn example — the entire core loop
uc = Uc(UC_ARCH_X86, UC_MODE_64)
uc.mem_map(CODE_ADDR, 0x1000)
for insn_bytes in csv_reader:
    uc.mem_write(CODE_ADDR, insn_bytes)
    set_canary_registers(uc)
    try:
        uc.emu_start(CODE_ADDR, CODE_ADDR + len(insn_bytes), count=1)
    except UcError as e:
        fault_type = e.errno
    regs = read_all_registers(uc)
    csv_writer.write(insn_bytes, fault_type, reg_delta(regs))

What VMSifter would add: ProcessPoolExecutor parallelism — but a simple multiprocessing.Pool does the same without needing CPU pinning. What VMSifter would impose: Unix socket protocol, C subprocess, Xen-centric configuration — all overhead for something that’s a Python function call.

These belong in a separate tool that reads VMSifter’s instruction CSV and produces emulator result CSVs.

Backend	Fidelity	Speed	Notes
QEMU (TCG)	Instruction-level	Moderate	Full system emulation, well-understood gaps vs real HW
Unicorn	Instruction-level	Fast	Lightweight, API-friendly, based on QEMU TCG
Bochs	Cycle-approximate	Slow	Most faithful x86 emulator, good reference
Simics	Cycle-accurate	Slow	Commercial, high fidelity, scriptable

Outside VMSifter: Bare Metal / Ring 0#

Backend	Notes
Ring 0 (Linux kernel module)	Original sandsifter approach, no hypervisor needed
SMM	System Management Mode execution

These are execution backends but don’t involve VM lifecycle management. Whether they belong inside or outside VMSifter depends on how much shared infrastructure they’d use. This is an open question for later.

Outside VMSifter: Analysis#

Comparing CSV result files across backends, CPUs, and configurations. This is explicitly not VMSifter’s job. It should happen in external tooling better suited for data analysis (pandas, DuckDB, Jupyter notebooks, custom diffing tools, visualization).

CSV as the Universal Interface#

All backends — whether inside or outside VMSifter — produce and consume CSV files. The CSV schema must:

Handle both execution backends and decoder backends (optional columns)
Include backend metadata (which backend, which CPU model, which configuration)
Support performance counters (optional, only real hardware)
Support decoder-specific fields: operand details, EFLAGS, ISA set (optional, only decoders)
Be stable across VMSifter versions to allow cross-version comparison

Fuzzer × Injector Matrix (within VMSifter)#

Fuzzer	Requires Real-Time Feedback	Compatible Injectors
Tunnel	Yes (EPT/NPT faults)	Xen, KVM
Random	No	Xen, KVM
CSV (replay)	No	Xen, KVM
Drizzler	No (but needs memory region setup)	Xen, KVM

All fuzzer × injector combinations within VMSifter are valid. The Tunnel fuzzer requires EPT/NPT feedback, which all VMSifter injectors provide by definition (they’re hypervisors on real hardware).

End-to-End Differential Testing Flow#

flowchart LR subgraph Discovery ["Discovery (VMSifter)"] T1["Tunnel + Xen (Intel ICL)"] T2["Tunnel + KVM (AMD Zen3)"] T3["Tunnel + Xen (Intel SKL)"] end T1 & T2 & T3 -->|union| CAT["catalog.csv"] subgraph Testing CAT --> X["Xen (ICL)"] CAT --> K["KVM (Zen3)"] CAT --> Q["QEMU TCG"] CAT --> B["Bochs"] CAT --> XED["XED"] CAT --> BD["bddisasm"] end X & K & Q & B & XED & BD --> DIFF["External diff / analysis"]

The decoder path is especially powerful: XED can process the full instruction catalog in seconds and provide Intel’s reference answer for length, operands, flags, and validity — a fast, authoritative baseline to diff all execution results against.

Architectural Decisions#

VMSifter’s scope: fuzzers + hypervisor injectors + CSV#

VMSifter keeps instruction generation (fuzzers) and hypervisor-based injection. It does not grow to include emulator backends, disassembler backends, or analysis tooling. Its output boundary is CSV files.

Fuzzers are instruction sources (Tunnel for discovery, Random/CSV/Drizzler for testing)
Injectors are hypervisor backends requiring privileged VM lifecycle management (Xen, KVM)
CSV is the interface to everything else

Tunnel stays coupled to hypervisor injectors#

Tunnel’s algorithm depends on EPT/NPT execute faults for real-time feedback. This cannot be replaced by CSV replay. However, Tunnel only needs to run once per CPU model — the instruction set is a property of the silicon, not the hypervisor.

The only reason to add a hypervisor injector is to reach CPU models the existing injectors can’t (e.g., KVM for AMD with NPT). Adding emulator or decoder backends does not require Tunnel support — only CSV replay, which lives outside VMSifter.

The C injector needs refactoring for multi-hypervisor support#

The current C injector (src/main.c) is deeply Xen-specific. To support KVM, the injector layer needs to be split:

Common: instruction injection logic, register canary management, result message format, socket protocol
Xen-specific: xc_memshr_fork(), Xen domctl MSR access, XTF guest setup
KVM-specific: ioctl-based VM creation, KVM_GET/SET_REGS, dirty page tracking for state reset

This refactoring is a prerequisite for KVM support and would also clean up the current codebase.