When we started building Deepcomet AI, we quickly realized that existing programming languages weren’t designed for the AI-native era. They treat tensors as library constructs, automatic differentiation as an afterthought, and hardware acceleration as an opaque optimization.
We needed something different. So we built Aurelia.
## The Problem with the Status Quo
In Python, tensors are PyTorch or TensorFlow objects. In C++, you might use Eigen or custom CUDA kernels. Either way, the language itself has no awareness of what a tensor is; it's just another library construct.
This leads to several problems:
- **No compile-time shape checking:** runtime errors when matrix dimensions don't match (see the PyTorch sketch after this list)
- **Opaque performance:** the compiler can't optimize across library boundaries
- **Difficult hardware targeting:** writing GPU kernels requires learning an entirely new programming model
- **Ad-hoc differentiation:** `autograd` works, but it's a layer on top of the language, not part of it
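To make the first point concrete, here is a minimal PyTorch sketch: the mismatch is invisible to any static tooling and only surfaces when the line actually executes.

```python
import torch

a = torch.randn(3, 4)
b = torch.randn(3, 4)

# Nothing catches this before execution; at runtime PyTorch raises:
#   RuntimeError: mat1 and mat2 shapes cannot be multiplied (3x4 and 3x4)
c = a @ b
```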
## First-Class Tensors

In Aurelia, tensors are primitive types, just like `int` or `float`:
```
// Tensor is a native type with shape information
let tensor = Tensor<f32>[3, 3]
let image = Tensor<u8>[224, 224, 3]

// Shape mismatch is a compile-time error
let a = Tensor<f32>[3, 4]
let b = Tensor<f32>[4, 5]
let c = a @ b // OK: result is Tensor<f32>[3, 5]
let d = a @ a // Compile error: incompatible shapes
```
The type system tracks tensor shapes statically, catching dimension mismatches before your program ever runs.
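For contrast, the best a library-level approach can do is assert the contract at runtime, as in this small PyTorch helper (an illustrative function of ours, not a PyTorch API):

```python
import torch

def matmul_checked(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Matrix multiply with an explicit runtime shape check."""
    # The check runs on every call, and only once the program is already
    # running; Aurelia's type system performs the same check at compile time.
    if a.shape[-1] != b.shape[0]:
        raise ValueError(f"incompatible shapes: {tuple(a.shape)} @ {tuple(b.shape)}")
    return a @ b
```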
## Automatic Differentiation
Differentiation is built into the language, not bolted on:
```
// Define a differentiable function
fn loss_fn(model: Model, data: Batch) -> Tensor<f32> {
    let prediction = model.forward(data.input)
    CrossEntropy(prediction, data.label)
}

// Compute gradients automatically; no manual backward pass
let gradients = autodiff(loss_fn)(model, data)

// Apply gradients
optimizer.step(gradients)
```
No tape tracking, no `requires_grad` flags, no manual backward passes. The compiler handles everything.
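For comparison, the tape-based status quo in PyTorch looks like this: an explicit `requires_grad` flag, a forward pass recorded on a tape, and a manual `backward()` call.

```python
import torch
import torch.nn.functional as F

w = torch.randn(4, 3, requires_grad=True)  # opt in to gradient tracking
x = torch.randn(8, 4)
target = torch.randint(0, 3, (8,))

loss = F.cross_entropy(x @ w, target)      # forward pass is recorded on the tape
loss.backward()                            # manual backward pass replays the tape
print(w.grad.shape)                        # gradients accumulate on .grad -> (4, 3)
```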
## Memory Management
Aurelia uses a hybrid ownership model:
- Zero-cost borrow checking for CPU memory
- Region-based allocation for predictable GPU memory patterns
- NPU-aware memory mapping for direct neural processing unit access
```
// Ownership transfer to GPU
let gpu_tensor = tensor.to_device(Device.GPU)

// Region-based allocation for training loops
region TrainingMemory {
    let batch = load_batch()
    let gradients = autodiff(loss_fn)(model, batch)
    optimizer.step(gradients)
} // All temporary allocations are freed here
```
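Python has no direct analogue of `region` blocks; the closest idiom is confining temporaries to a function scope so CPython's reference counting releases them at a predictable point. A rough sketch (the function and its signature are ours, purely illustrative):

```python
import torch

def train_step(model: torch.nn.Module,
               optimizer: torch.optim.Optimizer,
               batch: torch.Tensor) -> float:
    # Every intermediate tensor lives only inside this frame. When the
    # function returns, reference counting frees them all, which is roughly
    # the guarantee Aurelia's `region` block makes explicit in the language.
    optimizer.zero_grad()
    loss = model(batch).mean()
    loss.backward()
    optimizer.step()
    return loss.item()  # hand back a plain float so no tensor outlives the "region"
```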
## Compilation Targets
Aurelia compiles to multiple backends from a single source:
| Target | Backend | Use Case |
|---|---|---|
| CPU | LLVM | General computation |
| GPU | CUDA/ROCm | Parallel tensor ops |
| NPU | Custom bytecode | Neural inference |
| Web | WebAssembly | Browser deployment |
## What’s Next?
We’re currently working on:
- **The Forge:** a transpiler that converts Python/PyTorch code to Aurelia
- **Interactive Playground:** a browser-based Aurelia editor with instant feedback
- **VS Code Extension:** syntax highlighting, type checking, and debugging
Aurelia is still early, but we believe it’s the right foundation for AI-native systems programming.
Interested in Aurelia? Read the full language documentation or contribute on GitHub.