Overview
Aegis BMD is Elston Industries' flagship ballistic missile defense simulation platform. It models a complete intercept engagement chain — from initial radar detection through threat discrimination, intercept solution computation, and kinetic engagement — using a fully autonomous AI agent trained via reinforcement learning.
The system is designed as both a research platform for exploring RL-based autonomous weapons control and a demonstration of low-latency, high-confidence engagement decision-making.
Reinforcement Learning Engine
Algorithm: GRPO
The interceptor control agent is trained using Group Relative Policy Optimization (GRPO), a variant of PPO that normalizes advantages across a sampled group of trajectories rather than using a value function baseline.
| Parameter | Value |
|---|---|
| Policy Architecture | Multi-layer Perceptron (MLP) |
| Hidden Layers | 3 × 256 units |
| Activation | ReLU |
| Group Size | 16 trajectories |
| Clip Epsilon | 0.2 |
| Entropy Coefficient | 0.01 |
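The group-relative baseline that GRPO substitutes for a learned value function can be sketched in a few lines. The function name and sample returns below are illustrative, not the platform's API:

```python
from statistics import mean, stdev

def group_relative_advantages(group_rewards, eps=1e-8):
    """Normalize each trajectory's return against the group's mean and
    standard deviation, replacing a learned value-function baseline."""
    mu = mean(group_rewards)
    sigma = stdev(group_rewards) if len(group_rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in group_rewards]

# With group size 16, each update samples 16 trajectories and ranks them
# against one another; above-average returns receive positive advantage.
advantages = group_relative_advantages([1.0, 0.0, 2.0, 1.0])
```

The normalized advantages then feed the standard PPO clipped surrogate objective, using the clip epsilon and entropy coefficient from the table above.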
Reward Structure
The reward function is shaped to incentivize:
- Intercept success — large positive reward on kill
- Proximity at closest approach — continuous shaping reward
- Fuel efficiency — penalty on excessive thrust usage
- Time-to-intercept — bonus for fast engagements
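One way the four terms above could combine into a scalar reward — every coefficient here is an assumption for illustration, not the platform's tuned values:

```python
def shaped_reward(killed, miss_distance_m, impulse_ns, elapsed_s,
                  kill_bonus=100.0, prox_scale=0.01,
                  fuel_penalty=1e-4, time_scale=5.0):
    """Sum the four shaping terms; all coefficients are illustrative."""
    r = -prox_scale * miss_distance_m      # closest-approach shaping
    r -= fuel_penalty * impulse_ns         # thrust-usage penalty (N*s)
    if killed:
        r += kill_bonus                    # large terminal reward on kill
        r += time_scale / max(elapsed_s, 1.0)  # faster intercepts pay more
    return r
```

Dense shaping on miss distance keeps the gradient informative even on failed episodes, which matters when terminal kills are sparse early in training.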
Training Environment
The simulation runs at 60Hz physics with stochastic threat injection. Threats vary in:
- Launch azimuth and elevation
- Ballistic coefficient
- Terminal maneuver profiles (ballistic, boost-glide, hypersonic skip)
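A minimal sketch of a stochastic threat sampler over these variation axes; the ranges and field names are assumptions, not the actual threat library:

```python
import random

PROFILES = ("ballistic", "boost-glide", "hypersonic-skip")

def sample_threat(rng=random):
    """Draw one randomized threat; all ranges are illustrative."""
    return {
        "azimuth_deg":   rng.uniform(0.0, 360.0),
        "elevation_deg": rng.uniform(15.0, 85.0),
        "beta_kg_m2":    rng.uniform(1_000.0, 10_000.0),  # ballistic coefficient
        "profile":       rng.choice(PROFILES),
    }
```

Passing a seeded generator (`random.Random(seed)`) makes Monte Carlo runs reproducible.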
Threat Discrimination
The platform includes a multi-hypothesis tracker that maintains a probability distribution over threat class (ballistic, maneuvering, decoy) using a Kalman filter bank with interacting multiple model (IMM) estimation.
```
track_update(obs):
    for each model m in {ballistic, maneuver, decoy}:
        predict(m)                  # model-conditioned state prediction
        update_likelihood(m, obs)   # per-filter measurement update
    mix_probabilities()             # IMM mixing of mode probabilities
    return fused_state_estimate     # probability-weighted combination
```
Discrimination confidence gates engagement authority — the interceptor will not commit until P(threat) > 0.92.
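The mode-probability update behind that gate can be sketched as a Bayes step over the filter bank's per-model likelihoods. This is a simplified view of IMM with hypothetical names, not the platform's tracker:

```python
def update_mode_probs(prior, likelihoods):
    """Bayes update of per-model probabilities from each filter's
    likelihood for the current observation."""
    post = {m: prior[m] * likelihoods[m] for m in prior}
    z = sum(post.values())
    return {m: p / z for m, p in post.items()}

def engagement_authorized(mode_probs, threshold=0.92):
    # P(threat): probability mass on anything that is not a decoy
    return (1.0 - mode_probs["decoy"]) > threshold

probs = {"ballistic": 1/3, "maneuver": 1/3, "decoy": 1/3}
probs = update_mode_probs(probs, {"ballistic": 0.9, "maneuver": 0.5, "decoy": 0.05})
```

A full IMM also mixes state estimates across models via the mode transition matrix before each predict step; the sketch shows only the mode-probability update that feeds the gate.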
Intercept Geometry
The engagement planner solves a proportional navigation intercept problem in 3D space using a predicted intercept point (PIP) algorithm:
- Estimate threat state via tracker
- Propagate threat trajectory forward using current model
- Solve intercept geometry for interceptor launch angle and time-of-flight
- Validate flyout kinematics against interceptor energy budget
- Commit launch
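Steps 2 and 3 can be sketched as a fixed-point iteration on time-of-flight under a constant-velocity threat model. This is a toy solver for illustration; real propagation would use the tracker's current dynamics model:

```python
import math

def predicted_intercept_point(threat_pos, threat_vel,
                              interceptor_pos, interceptor_speed, iters=20):
    """Iterate: propagate the threat to time t, then recompute the
    interceptor's time-of-flight to that point until t converges."""
    t = 0.0
    pip = list(threat_pos)
    for _ in range(iters):
        pip = [p + v * t for p, v in zip(threat_pos, threat_vel)]
        t = math.dist(pip, interceptor_pos) / interceptor_speed
    return pip, t
```

The iteration converges quickly when the interceptor is substantially faster than the threat, a condition the kinematic validation in step 4 is meant to check.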
The guidance law transitions from proportional navigation to augmented PN in the terminal phase to compensate for late-breaking threat maneuvers.
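As a sketch, the augmented PN command adds a feed-forward maneuver term to the classic PN command; the navigation constant and names below are illustrative:

```python
def apn_accel(los_rate, closing_speed, target_accel_normal=0.0, N=4.0):
    """Augmented proportional navigation: classic PN plus a feed-forward
    term proportional to the estimated target maneuver. N is an
    illustrative navigation constant, typically 3-5."""
    pn = N * closing_speed * los_rate         # classic PN: a = N * Vc * LOS-rate
    augment = 0.5 * N * target_accel_normal   # APN maneuver compensation
    return pn + augment
```

With `target_accel_normal = 0` this reduces to plain PN, matching the midcourse law; the augmentation term only becomes significant once the tracker estimates a terminal maneuver.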
Performance
All figures are simulation-derived and represent median performance across 10,000 Monte Carlo engagements in the standard threat library.
- Single-shot kill probability (Pk): 99.7%
- Mean time to intercept decision: 38ms
- Worst-case intercept latency: 49ms
- False commit rate: < 0.1%
- Decoy discrimination accuracy: 97.3%
Stack
- Physics engine: C++ core (Box2D-derived, extended for 3D ballistics)
- RL training framework: Python (PyTorch), exported to ONNX for inference
- Frontend / visualization: JavaScript + WebGL
- Inference runtime: ONNX Runtime (C++ binding), < 2ms per forward pass