engine.consensus

ConsensusVector - per-byte variance analysis across multiple dumps.

Core of the ‘Elimination via Variance’ approach: bytes that are identical across all runs are structural; bytes with high variance are key candidates.

The output is a 1D variance vector (one float per byte offset), computed via Welford’s online recurrence or a chunked two-pass estimator. The implicit N×d observation matrix (N dumps by d byte offsets) is never materialized.
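The Welford recurrence can be sketched per byte offset as follows. This is a minimal illustration, not the module's implementation: it assumes equally sized dumps given as bytes objects and computes the population variance (M2 divided by n). Whether ConsensusVector uses population or sample variance is not stated here, and welford_update and variance_vector are hypothetical names.

```python
import numpy as np


# Hypothetical helpers illustrating Welford's recurrence; not part of engine.consensus.
def welford_update(mean, m2, n, dump):
    """Fold one dump (bytes-like) into the per-byte Welford state (mean, m2, n)."""
    x = np.frombuffer(dump, dtype=np.uint8).astype(np.float64)
    n += 1
    delta = x - mean              # deviation from the previous mean
    mean = mean + delta / n       # updated running mean
    m2 = m2 + delta * (x - mean)  # previous-mean deviation times new-mean deviation
    return mean, m2, n


def variance_vector(dumps):
    """Population variance per byte offset across equally sized dumps (assumption: ddof=0)."""
    size = len(dumps[0])
    mean, m2, n = np.zeros(size), np.zeros(size), 0
    for dump in dumps:
        mean, m2, n = welford_update(mean, m2, n, dump)
    return m2 / n


dumps = [b"\x00\x10\xaa", b"\x00\x20\xbb", b"\x00\x30\xcc"]
print(variance_vector(dumps))  # offset 0 never changes, so its variance is 0
```

Only one pass over the dumps is needed and the state is three per-offset quantities, which is what makes the incremental build possible.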

class ConsensusVector

Bases: object

Per-byte variance vector across N dumps at the same phase.

__init__()

property classifications: array: Per-byte classification codes (ByteClass IntEnum values).

build(dump_paths)

Compute per-byte variance across all dump files.

Uses a chunked two-pass estimator (core.variance.compute_variance), which is numerically stable and keeps peak memory bounded by CHUNK_BYTES regardless of dump size.

Parameters: dump_paths (List[Path])
Return type: None

build_from_sources(sources, normalize=False)

Build consensus from DumpSource objects.

Native MSL sources use ASLR-aware region alignment when normalize is True. Raw, mixed, or imported-MSL sources fall back to flat byte comparison without region alignment.

Parameters:

sources (List)
normalize (bool)

Return type:

None

build_incremental(size)

Begin an incremental consensus build for dumps of size bytes each.

Call add_source() to fold dumps in one at a time, then finalize() to materialize the variance vector and classifications.

Parameters: size (int)
Return type: None

add_source(source)

Fold one dump into an incremental build. Returns live stats.

Accepts a raw bytes buffer or any DumpSource-like object exposing read_all(). The first dump seen is cached as reference_bytes for downstream consumers. Returns (num_dumps, mean_variance, max_variance).

Return type: Tuple[int, float, float]

get_live_variance()

Return the current variance vector.

During an incremental build, this reflects the Welford state at the moment of the call; after finalize(), it returns the materialized vector. It is a public accessor, so API/UI layers do not need to reach into the private Welford accumulator.

Return type: numpy.ndarray

welford_state()[source]: Return (mean, m2, n) — must be called BEFORE finalize().

finalize()[source]

Materialize variance and classifications from the Welford accumulator and release the incremental state.

Return type:: None
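The incremental lifecycle (build_incremental, add_source, get_live_variance, welford_state, finalize) can be illustrated with a toy class. ToyIncrementalConsensus is hypothetical: it mirrors only the behaviour described above, assumes population variance, and omits classification entirely.

```python
from typing import Optional, Tuple

import numpy as np


class ToyIncrementalConsensus:
    """Hypothetical stand-in mirroring the incremental part of ConsensusVector."""

    def __init__(self) -> None:
        self.variance: Optional[np.ndarray] = None
        self.reference_bytes: Optional[bytes] = None
        self._state: Optional[Tuple[np.ndarray, np.ndarray, int]] = None  # (mean, m2, n)

    def build_incremental(self, size: int) -> None:
        self._state = (np.zeros(size), np.zeros(size), 0)

    def add_source(self, source) -> Tuple[int, float, float]:
        if self._state is None:
            raise RuntimeError("call build_incremental() first")
        # Raw buffers are used directly; anything else must expose read_all().
        data = source if isinstance(source, (bytes, bytearray, memoryview)) else source.read_all()
        mean, m2, n = self._state
        x = np.frombuffer(data, dtype=np.uint8).astype(np.float64)
        if x.shape != mean.shape:
            raise ValueError("dump size does not match build_incremental(size)")
        if self.reference_bytes is None:
            self.reference_bytes = bytes(data)  # first dump seen becomes the reference
        n += 1
        delta = x - mean
        mean = mean + delta / n
        m2 = m2 + delta * (x - mean)
        self._state = (mean, m2, n)
        live = self.get_live_variance()
        return n, float(live.mean()), float(live.max())

    def get_live_variance(self) -> Optional[np.ndarray]:
        if self._state is None:
            return self.variance  # materialized vector after finalize()
        mean, m2, n = self._state
        return m2 / n if n else np.zeros_like(m2)

    def welford_state(self) -> Tuple[np.ndarray, np.ndarray, int]:
        if self._state is None:
            raise RuntimeError("welford_state() must be called before finalize()")
        return self._state

    def finalize(self) -> None:
        self.variance = self.get_live_variance()
        self._state = None  # release the incremental accumulator


cv = ToyIncrementalConsensus()
cv.build_incremental(3)
for dump in (b"\x00\x10\xaa", b"\x00\x20\xbb"):
    print(cv.add_source(dump))  # (num_dumps, mean_variance, max_variance)
mean, m2, n = cv.welford_state()  # must happen before finalize()
cv.finalize()
```

Because finalize() drops the accumulator, welford_state() has to be read first; raising RuntimeError afterwards is this toy's assumption, not documented behaviour of the real method.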

get_static_regions(min_length=32)

Find contiguous static (invariant) byte regions.

Parameters: min_length (int)
Return type: List[StaticRegion]

get_volatile_regions(min_length=16)

Find contiguous high-variance (key_candidate) regions.

Parameters: min_length (int)
Return type: List[StaticRegion]
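Both region finders reduce to locating runs in a per-byte boolean mask. The sketch below assumes "static" means ByteClass.INVARIANT and "volatile" means ByteClass.KEY_CANDIDATE, and returns hypothetical (start, length) tuples instead of StaticRegion objects, whose fields are not documented here. find_runs is a hypothetical helper.

```python
from enum import IntEnum
from typing import List, Tuple

import numpy as np


class ByteClass(IntEnum):  # values as documented for ByteClass in this module
    INVARIANT = 0
    STRUCTURAL = 1
    POINTER = 2
    KEY_CANDIDATE = 3


# Hypothetical helper; not part of engine.consensus.
def find_runs(mask: np.ndarray, min_length: int) -> List[Tuple[int, int]]:
    """Return (start, length) for every run of True in mask with length >= min_length."""
    # Pad with False so every run has both a rising and a falling edge.
    padded = np.concatenate(([False], mask.astype(bool), [False]))
    edges = np.flatnonzero(np.diff(padded.astype(np.int8)))
    starts, ends = edges[0::2], edges[1::2]  # ends are exclusive
    return [(int(s), int(e - s)) for s, e in zip(starts, ends) if e - s >= min_length]


classes = np.array([0, 0, 0, 3, 3, 1, 0, 0, 0, 0], dtype=np.uint8)
static = find_runs(classes == ByteClass.INVARIANT, min_length=3)
volatile = find_runs(classes == ByteClass.KEY_CANDIDATE, min_length=2)
```

Working on edges of the mask keeps the search vectorized, so the cost is a few array passes regardless of how many regions exist.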

get_aligned_candidates(block_size=32, alignment=16, density_threshold=0.75, min_length=16)

Find alignment-filtered KEY_CANDIDATE regions.

Like get_volatile_regions, but with an additional alignment filter: it keeps only candidate blocks that are both dense and aligned, as controlled by block_size, alignment, and density_threshold.

Parameters:

block_size (int)
alignment (int)
density_threshold (float)
min_length (int)

Return type:

List[StaticRegion]
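One plausible reading of the alignment filter is sketched below: slide a block_size window over offsets that are multiples of alignment, keep windows whose KEY_CANDIDATE fraction is at least density_threshold, merge overlapping kept windows, and drop merged regions shorter than min_length. These semantics are an assumption; aligned_candidates is a hypothetical helper, not the method itself.

```python
from typing import List, Tuple

import numpy as np

KEY_CANDIDATE = 3  # ByteClass.KEY_CANDIDATE value documented in this module


# Hypothetical helper; the real filtering rules may differ.
def aligned_candidates(classes: np.ndarray, block_size: int = 32, alignment: int = 16,
                       density_threshold: float = 0.75,
                       min_length: int = 16) -> List[Tuple[int, int]]:
    """Return merged (start, length) regions of dense, aligned KEY_CANDIDATE blocks."""
    is_key = classes == KEY_CANDIDATE
    kept: List[Tuple[int, int]] = []  # (start, exclusive end)
    for start in range(0, len(classes) - block_size + 1, alignment):
        density = is_key[start:start + block_size].mean()  # fraction of candidate bytes
        if density >= density_threshold:
            end = start + block_size
            if kept and start <= kept[-1][1]:
                kept[-1] = (kept[-1][0], end)  # merge with the previous overlapping block
            else:
                kept.append((start, end))
    return [(s, e - s) for s, e in kept if e - s >= min_length]


classes = np.zeros(96, dtype=np.uint8)
classes[16:48] = KEY_CANDIDATE
print(aligned_candidates(classes))
```

Restricting window starts to multiples of alignment reflects the idea that key material tends to sit at aligned addresses, while the density check tolerates a few low-variance bytes inside a key.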

classification_counts()

Count bytes in each classification category.

Return type: Dict[str, int]
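A counting sketch using numpy.bincount over the per-byte classification codes. The dictionary keys are assumed to be ByteClass member names; the real method may use different labels, and count_classes is a hypothetical name.

```python
from enum import IntEnum
from typing import Dict

import numpy as np


class ByteClass(IntEnum):  # values as documented for ByteClass in this module
    INVARIANT = 0
    STRUCTURAL = 1
    POINTER = 2
    KEY_CANDIDATE = 3


# Hypothetical helper; key naming is an assumption.
def count_classes(classes: np.ndarray) -> Dict[str, int]:
    """Count bytes per ByteClass, keyed by enum member name."""
    # minlength ensures classes with zero bytes still get a count of 0.
    counts = np.bincount(classes.astype(np.intp), minlength=len(ByteClass))
    return {member.name: int(counts[member.value]) for member in ByteClass}


print(count_classes(np.array([0, 0, 3, 1, 3, 3], dtype=np.uint8)))
```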

ConsensusMatrix: alias of ConsensusVector

class ByteClass

Bases: IntEnum

Byte-position classification based on cross-run variance.

INVARIANT = 0

STRUCTURAL = 1

POINTER = 2

KEY_CANDIDATE = 3
