Training
Training in HyperBench is orchestrated via MultiModelTrainer (Lightning under the hood).
This page outlines the typical training pipeline. For complete runnable scripts, see:
- examples/gcn.py
- examples/early_stopping.py
Typical pipeline
- Load a dataset (built-in or from HIF).
- Split it (train/val/test).
- Add negative samples.
- Enrich node features.
- Create dataloaders.
- Configure one or more models.
- Train and evaluate.
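The split step is often done as two successive two-way splits: hold out the test set first, then carve the validation set out of the remainder, which is how the skeleton on this page proceeds. The sketch below is plain Python and independent of HyperBench; `nested_split_ratios` is a hypothetical helper, not part of the library. It converts target overall proportions into the ratios needed at each stage.

```python
import math


def nested_split_ratios(train: float, val: float, test: float):
    """Convert overall train/val/test proportions into two two-way splits.

    Hypothetical helper for illustration only; not part of HyperBench.
    Returns (first, second), where `first` splits the full dataset into
    (train + val, test) and `second` splits that remainder into (train, val).
    """
    if min(train, val, test) <= 0:
        raise ValueError("all proportions must be positive")
    if not math.isclose(train + val + test, 1.0):
        raise ValueError("proportions must sum to 1")
    remainder = train + val
    first = [remainder, test]
    second = [train / remainder, val / remainder]
    return first, second


first, second = nested_split_ratios(0.7, 0.1, 0.2)
print([round(r, 3) for r in first])   # [0.8, 0.2]
print([round(r, 3) for r in second])  # [0.875, 0.125]
```

A 70/10/20 target therefore corresponds to a 0.8/0.2 split followed by a 0.875/0.125 split of the training portion.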
Minimal end-to-end skeleton
from hyperbench.data import AlgebraDataset, DataLoader, SamplingStrategy
from hyperbench.nn import LaplacianPositionalEncodingEnricher
from hyperbench.train import MultiModelTrainer, RandomNegativeSampler
from hyperbench.types import ModelConfig
from hyperbench.hlp import MLPHlpModule
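
dataset = AlgebraDataset(sampling_strategy=SamplingStrategy.HYPEREDGE)

# Hold out 20% for testing, then split the remaining 80% into train/val.
# Effective proportions: 0.8 * 0.875 = 0.70 train, 0.8 * 0.125 = 0.10 val, 0.20 test.
train_ds, test_ds = dataset.split(
    ratios=[0.8, 0.2],
    shuffle=True,
    seed=42,
    node_space_setting="transductive",
)
train_ds, val_ds = train_ds.split(
    ratios=[0.875, 0.125],
    shuffle=True,
    seed=42,
    node_space_setting="transductive",
)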

# Add negatives (example strategy; tune per use-case)
neg = RandomNegativeSampler(
    num_negative_samples=train_ds.hdata.num_hyperedges,
    num_nodes_per_sample=int(train_ds.stats()["avg_degree_hyperedge"]),
)
train_ds = train_ds.add_negative_samples(neg, seed=42)
val_ds = val_ds.add_negative_samples(neg, seed=42)
test_ds = test_ds.add_negative_samples(neg, seed=42)
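
# Enrich node features
# With enrichment_mode="replace", the 32 encoding features become the node
# features, matching in_channels=32 in the model configuration below.
train_ds.enrich_node_features(
    enricher=LaplacianPositionalEncodingEnricher(
        num_features=32,
        num_nodes=train_ds.hdata.num_nodes,
    ),
    enrichment_mode="replace",
)
val_ds.enrich_node_features_from(train_ds)
test_ds.enrich_node_features_from(train_ds)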
# Dataloaders
train_loader = DataLoader(train_ds, batch_size=128, shuffle=False)
val_loader = DataLoader(val_ds, sample_full_hypergraph=True, shuffle=False)
test_loader = DataLoader(test_ds, sample_full_hypergraph=True, shuffle=False)
# Model(s)
model = MLPHlpModule(
    encoder_config={
        "in_channels": 32,
        "out_channels": 32,
        "hidden_channels": 64,
        "num_layers": 3,
        "drop_rate": 0.3,
    },
    aggregation="mean",
)
configs = [ModelConfig(name="mlp", version="mean", model=model)]

with MultiModelTrainer(
    model_configs=configs,
    max_epochs=50,
    accelerator="auto",
    enable_checkpointing=False,
) as trainer:
    trainer.fit_all(train_dataloader=train_loader, val_dataloader=val_loader)
    trainer.test_all(dataloader=test_loader)
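
To make the negative-sampling step more concrete, the sketch below shows one simple way random negative hyperedges can be drawn: pick a fixed number of distinct nodes uniformly at random and reject any candidate that coincides with a known positive hyperedge. It is plain Python and illustrates the general technique under stated assumptions; it is not a description of how RandomNegativeSampler is implemented. The function `sample_negative_hyperedges` and its parameters are hypothetical names that only mirror the roles of num_negative_samples and num_nodes_per_sample in the skeleton above.

```python
import random


def sample_negative_hyperedges(positives, num_nodes, num_samples,
                               nodes_per_sample, seed=None, max_tries=10_000):
    """Draw random node sets that are not existing (positive) hyperedges.

    Hypothetical illustration only; not HyperBench's implementation.
    """
    if nodes_per_sample > num_nodes:
        raise ValueError("nodes_per_sample cannot exceed num_nodes")
    rng = random.Random(seed)
    known = {frozenset(edge) for edge in positives}
    seen = set()
    negatives = []
    tries = 0
    while len(negatives) < num_samples:
        if tries >= max_tries:
            raise RuntimeError("could not find enough distinct negatives")
        tries += 1
        # Sample distinct node ids uniformly without replacement.
        candidate = frozenset(rng.sample(range(num_nodes), nodes_per_sample))
        # Reject true hyperedges and duplicates of earlier negatives.
        if candidate in known or candidate in seen:
            continue
        seen.add(candidate)
        negatives.append(sorted(candidate))
    return negatives


# Toy hypergraph: 6 nodes, 3 positive hyperedges.
positives = [[0, 1, 2], [2, 3], [1, 3, 4]]
# Mirror the skeleton: one negative per positive, sized by the truncated
# average hyperedge size.
avg_size = int(sum(len(e) for e in positives) / len(positives))
negatives = sample_negative_hyperedges(
    positives,
    num_nodes=6,
    num_samples=len(positives),
    nodes_per_sample=avg_size,
    seed=42,
)
print(negatives)
```

Drawing as many negatives as there are positives gives a balanced binary classification problem, which is one reason the skeleton ties the negative count to the number of hyperedges; other ratios may suit other use cases.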
Next steps
- Comparing multiple models consistently: Benchmarking.
- Outputs and logging: Loggers.
- Visualizing runs: TensorBoard.