harvester

DumpIngestor - load dump files with optional metadata.

class DumpIngestor

Bases: object

Load and organize dump files for analysis.

Provides structured access to the dataset tree with optional sidecar metadata enrichment.

__init__(root, keylog_filename='keylog.csv')
Parameters:
  • root (Path)

  • keylog_filename (str)

scan()

Perform a fast scan of the dataset tree.

Return type:

DatasetInfo
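This reference does not document how `scan()` walks the tree. As a sketch only, assume a hypothetical layout `root/<tls_version>/<scenario>/<library>/<run>/`, which fits the arguments taken by `list_scenarios`, `list_libraries` and `load_library_runs`. A scan that only lists directories and never opens a dump stays cheap no matter how large the dumps are. `TreeSummary` and `scan_tree` are illustrative names, not the real `DatasetInfo` API.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, List


@dataclass
class TreeSummary:
    """Hypothetical stand-in for DatasetInfo; field names are assumptions."""
    tls_versions: List[str] = field(default_factory=list)
    # "<tls_version>/<scenario>/<library>" -> number of run directories
    run_counts: Dict[str, int] = field(default_factory=dict)


def _subdirs(path: Path) -> List[Path]:
    # Only directories count; stray files (notes, keylogs) are skipped.
    return sorted(p for p in path.iterdir() if p.is_dir())


def scan_tree(root: Path) -> TreeSummary:
    """Walk the hypothetical root/<ver>/<scenario>/<library>/<run>/ layout.

    Only directory entries are listed; no dump file is opened.
    """
    summary = TreeSummary()
    for ver in _subdirs(root):
        summary.tls_versions.append(ver.name)
        for scen in _subdirs(ver):
            for lib in _subdirs(scen):
                key = f"{ver.name}/{scen.name}/{lib.name}"
                summary.run_counts[key] = len(_subdirs(lib))
    return summary


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for run in ("run_000", "run_001"):
            (root / "13" / "handshake" / "openssl" / run).mkdir(parents=True)
        print(scan_tree(root).run_counts)  # {'13/handshake/openssl': 2}
```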

property dataset_info: DatasetInfo | None

load_library_runs(tls_version, scenario, library, max_runs=10, template=None)

Load run directories for a specific library.

Parameters:
  • tls_version (str)

  • scenario (str)

  • library (str)

  • max_runs (int)

  • template

Return type:

List[RunDirectory]
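The selection step can be sketched as follows, under the same hypothetical `root/<tls_version>/<scenario>/<library>/<run>/` layout. The sketch returns plain `Path` objects instead of `RunDirectory` and does not model `template`, whose meaning is not documented here. `select_runs` is a hypothetical name.

```python
from pathlib import Path
from typing import List


def select_runs(root: Path, tls_version: str, scenario: str, library: str,
                max_runs: int = 10) -> List[Path]:
    """Return at most max_runs run directories in sorted, deterministic order."""
    # Hypothetical layout: root/<tls_version>/<scenario>/<library>/<run>/
    lib_dir = root / tls_version / scenario / library
    if not lib_dir.is_dir():
        return []  # assumption: unknown library yields an empty list, not an error
    runs = sorted(p for p in lib_dir.iterdir() if p.is_dir())
    return runs[:max_runs]
```

Sorting matters because `Path.iterdir()` yields entries in arbitrary order; without it, `max_runs` could select a different subset of runs on different filesystems.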

load_dump_data(dump_path)

Load raw dump data from a file.

Parameters:

dump_path (Path)

Return type:

bytes
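Assuming dumps are stored uncompressed (the source does not say), loading reduces to a whole-file read. The windowed variant is an illustrative addition, not part of the documented API, for dumps too large to hold in memory at once. Both function names are hypothetical.

```python
from pathlib import Path


def read_dump(dump_path: Path) -> bytes:
    """Whole-file read; assumes the dump needs no decompression or framing."""
    return dump_path.read_bytes()


def read_dump_window(dump_path: Path, offset: int, length: int) -> bytes:
    """Read only bytes [offset, offset + length) so large dumps need not fit in memory."""
    with open(dump_path, "rb") as f:
        f.seek(offset)
        return f.read(length)  # shorter than length if the file ends early
```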

get_dump_paths_for_phase(runs, phase)

Collect all dump file paths for a given phase across runs.

Parameters:
  • runs

  • phase

Return type:

List[Path]
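Collecting paths for one phase is a flatten over the runs. This sketch assumes, hypothetically, that each run records its dump files in a mapping keyed by phase name; `RunStub` stands in for `RunDirectory`, whose actual fields are not documented here, and `dump_paths_for_phase` is a hypothetical name.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, List


@dataclass
class RunStub:
    """Hypothetical stand-in for RunDirectory: dump files grouped by phase name."""
    path: Path
    dumps: Dict[str, List[Path]] = field(default_factory=dict)


def dump_paths_for_phase(runs: List[RunStub], phase: str) -> List[Path]:
    """Flatten the dumps recorded for `phase` across runs, preserving run order."""
    paths: List[Path] = []
    for run in runs:
        # A run without dumps for this phase simply contributes nothing.
        paths.extend(run.dumps.get(phase, []))
    return paths
```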

list_libraries(tls_version, scenario)

List available libraries for a version/scenario.

Parameters:
  • tls_version (str)

  • scenario (str)

Return type:

List[str]

list_scenarios(tls_version)

List available scenarios for a TLS version.

Parameters:

tls_version (str)

Return type:

List[str]
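Under the hypothetical `root/<tls_version>/<scenario>/<library>/` layout, both listing methods amount to "sorted names of the subdirectories one level down". The free functions `scenarios_under` and `libraries_under` below are hypothetical stand-ins for the real methods; sorting and the empty result for a missing directory are assumptions.

```python
from pathlib import Path
from typing import List


def _child_dir_names(path: Path) -> List[str]:
    """Sorted names of immediate subdirectories; empty if `path` is not a directory."""
    if not path.is_dir():
        return []
    return sorted(p.name for p in path.iterdir() if p.is_dir())


def scenarios_under(root: Path, tls_version: str) -> List[str]:
    # Hypothetical layout: root/<tls_version>/<scenario>/
    return _child_dir_names(root / tls_version)


def libraries_under(root: Path, tls_version: str, scenario: str) -> List[str]:
    # Hypothetical layout: root/<tls_version>/<scenario>/<library>/
    return _child_dir_names(root / tls_version / scenario)
```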

SidecarParser - parse JSON and plain-text (.meta) metadata sidecars.

class SidecarParser

Bases: object

Parse sidecar metadata files associated with dump directories.

Supports JSON (.json) and plain-text (.meta) sidecar formats. Sidecars provide extra context such as library versions, build flags, the capture environment, and analysis notes.

SIDECAR_EXTENSIONS = ['.json', '.meta']

static find_sidecar(run_dir)

Find a sidecar metadata file in a run directory.

Parameters:

run_dir (Path)

Return type:

Path | None
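The lookup rule is not documented. One plausible reading, sketched here as an assumption, is to try the extensions in `SIDECAR_EXTENSIONS` order, so that a `.json` sidecar wins over a `.meta` one, and to break ties by file name.

```python
from pathlib import Path
from typing import List, Optional

SIDECAR_EXTENSIONS: List[str] = [".json", ".meta"]  # value as documented above


def find_sidecar(run_dir: Path) -> Optional[Path]:
    """Return the first sidecar in run_dir; extension priority is an assumption."""
    files = sorted(p for p in run_dir.iterdir() if p.is_file())
    for ext in SIDECAR_EXTENSIONS:  # earlier extensions take precedence
        for p in files:
            if p.suffix == ext:
                return p
    return None
```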

static parse(sidecar_path)

Parse a sidecar file and return its contents as a dict.

Parameters:

sidecar_path (Path)

Return type:

Dict[str, Any]
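The JSON branch is ordinary `json` decoding. The plain-text `.meta` format is not specified, so this sketch assumes one `key: value` pair per line with `#` comment lines; `parse_sidecar` is a hypothetical name.

```python
import json
from pathlib import Path
from typing import Any, Dict


def parse_sidecar(sidecar_path: Path) -> Dict[str, Any]:
    """Parse .json as a JSON object, anything else as assumed 'key: value' lines."""
    text = sidecar_path.read_text(encoding="utf-8")
    if sidecar_path.suffix == ".json":
        data = json.loads(text)
        if not isinstance(data, dict):
            raise ValueError(f"{sidecar_path}: top-level JSON value must be an object")
        return data
    result: Dict[str, Any] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and (assumed) comment lines
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            result[key.strip()] = value.strip()
    return result
```

Splitting on the first colon only keeps values that themselves contain colons, such as timestamps, intact; lines without a colon are ignored.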

MetadataStore - aggregate metadata across runs using Polars.

class MetadataStore

Bases: object

Aggregate and query metadata across multiple runs.

Uses Polars DataFrames for efficient filtering and aggregation of run metadata, sidecar data, and analysis results.

__init__()

add_run(run, sidecar=None)

Register a run directory with optional sidecar metadata.

Parameters:
  • run

  • sidecar

Return type:

None
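One way to register a run is to flatten it and its optional sidecar into a single row, which maps directly onto DataFrame columns. The column names and the `sidecar.` prefix below are assumptions for illustration, and `make_record` is a hypothetical helper; the real record schema is not documented here.

```python
from pathlib import Path
from typing import Any, Dict, Optional


def make_record(tls_version: str, scenario: str, library: str, run_path: Path,
                sidecar: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Flatten one run plus its optional sidecar into a row-like dict (assumed schema)."""
    record: Dict[str, Any] = {
        "tls_version": tls_version,
        "scenario": scenario,
        "library": library,
        "run": run_path.name,
        "has_sidecar": sidecar is not None,
    }
    # Prefix sidecar keys so they cannot overwrite the core columns above.
    for key, value in (sidecar or {}).items():
        record[f"sidecar.{key}"] = value
    return record
```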

get_runs_for_library(library)

Get all run records for a specific library.

Parameters:

library (str)

Return type:

List[Dict]

summary()

Return summary statistics across all registered runs.

Return type:

Dict[str, Any]

filter_by(**kwargs)

Filter records by field values.

Return type:

List[Dict]
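The query methods can be approximated in plain Python to show their likely semantics. This is a stand-in, not the documented implementation: the real class uses Polars, where the same filter would be written with `DataFrame.filter` and `pl.col(name) == value` expressions. That `filter_by` combines its keyword arguments as an equality AND, and the keys returned by `summary`, are assumptions; `RecordStoreSketch` is a hypothetical name.

```python
from collections import Counter
from typing import Any, Dict, List


class RecordStoreSketch:
    """Plain-Python stand-in for MetadataStore's query methods (the real class uses Polars)."""

    def __init__(self) -> None:
        self._records: List[Dict[str, Any]] = []

    def add(self, record: Dict[str, Any]) -> None:
        self._records.append(record)

    def filter_by(self, **kwargs: Any) -> List[Dict[str, Any]]:
        # Assumed semantics: every given field must be present and equal (logical AND).
        return [r for r in self._records
                if all(k in r and r[k] == v for k, v in kwargs.items())]

    def get_runs_for_library(self, library: str) -> List[Dict[str, Any]]:
        # A special case of filter_by on the assumed "library" column.
        return self.filter_by(library=library)

    def summary(self) -> Dict[str, Any]:
        # Assumed summary keys: total count and runs per library.
        return {
            "total_runs": len(self._records),
            "runs_per_library": dict(Counter(r.get("library") for r in self._records)),
        }
```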