Designing Aurelia for AI-Native Systems

How we built a systems programming language with first-class tensor primitives and automatic differentiation.

When we started building Deepcomet AI, we quickly realized that existing programming languages weren’t designed for the AI-native era. They treat tensors as library constructs, automatic differentiation as an afterthought, and hardware acceleration as an opaque optimization.

We needed something different. So we built Aurelia.

The Problem with the Status Quo

In Python, tensors are PyTorch or TensorFlow objects. In C++, you might use Eigen or custom CUDA kernels. The language has no awareness of what a tensor is — it’s just another library.

This leads to several problems:

  1. No compile-time shape checking — Runtime errors when matrix dimensions don’t match
  2. Opaque performance — The compiler can’t optimize across library boundaries
  3. Difficult hardware targeting — Writing GPU kernels requires learning entirely new programming models
  4. Ad-hoc differentiation — autograd works, but it’s a layer on top, not part of the language
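To make the first problem concrete: in a library-based approach, a dimension mismatch only surfaces when the operation actually executes. A minimal sketch with NumPy (the same applies to PyTorch and TensorFlow tensors):

```python
import numpy as np

# Shapes exist only at runtime; nothing checks them before execution.
a = np.zeros((3, 4), dtype=np.float32)
b = np.zeros((4, 5), dtype=np.float32)

c = a @ b          # fine: (3, 4) @ (4, 5) -> (3, 5)
print(c.shape)     # (3, 5)

try:
    a @ a          # (3, 4) @ (3, 4): invalid, but nothing stops us until now
except ValueError as e:
    print("runtime error:", e)
```

The error above is exactly the class of bug a shape-aware type system can reject at compile time.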

First-Class Tensors

In Aurelia, tensors are primitive types, just like int or float:

// Tensor is a native type with shape information
let tensor = Tensor<f32>[3, 3]
let image = Tensor<u8>[224, 224, 3]

// Shape mismatch is a compile-time error
let a = Tensor<f32>[3, 4]
let b = Tensor<f32>[4, 5]
let c = a @ b  // OK: result is Tensor<f32>[3, 5]
let d = a @ a  // Compile error: incompatible shapes

The type system tracks tensor shapes statically, catching dimension mismatches before your program ever runs.

Automatic Differentiation

Differentiation is built into the language, not bolted on:

// Define a differentiable function
fn loss_fn(model: Model, data: Batch) -> Tensor<f32> {
    let prediction = model.forward(data.input)
    CrossEntropy(prediction, data.label)
}

// Compute gradients automatically — no explicit backward pass
let gradients = autodiff(loss_fn)(model, data)

// Apply gradients
optimizer.step(gradients)

No tape tracking, no requires_grad flags, no manual backward passes. The compiler handles everything.
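For contrast, here is a minimal sketch of the tape tracking that library-level autograd systems do behind the scenes — the bookkeeping a compiler-level approach removes. This is illustrative Python for scalar values, not a description of any particular library's internals:

```python
class Var:
    """A scalar that records the operations applied to it (a 'tape')."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # (parent_var, local_gradient) pairs
        self.grad = 0.0

    def __mul__(self, other):
        return Var(self.value * other.value,
                   parents=((self, other.value), (other, self.value)))

    def __add__(self, other):
        return Var(self.value + other.value,
                   parents=((self, 1.0), (other, 1.0)))

    def backward(self):
        # Topologically order the recorded tape, then sweep gradients in reverse.
        order, seen = [], set()
        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)
        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local_grad in node.parents:
                parent.grad += node.grad * local_grad

x = Var(3.0)
y = Var(4.0)
z = x * y + x          # z = xy + x, so dz/dx = y + 1 = 5, dz/dy = x = 3
z.backward()
print(x.grad, y.grad)  # 5.0 3.0
```

Every `Var` carries its history at runtime; a compiler that differentiates functions statically needs none of this per-value state.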

Memory Management

Aurelia uses a hybrid ownership model:

  • Zero-cost borrow checking for CPU memory
  • Region-based allocation for predictable GPU memory patterns
  • NPU-aware memory mapping for direct neural processing unit access

// Ownership transfer to GPU
let gpu_tensor = tensor.to_device(Device.GPU)

// Region-based allocation for training loops
region TrainingMemory {
    let batch = load_batch()
    let loss = model.forward(batch)
    optimizer.step(loss.backward())
} // All temporary allocations freed here
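The `region` construct behaves much like a scoped arena allocator. As a rough analogy — illustrative Python, not Aurelia semantics — a context manager can release every allocation made inside a scope in one step:

```python
from contextlib import contextmanager

class Arena:
    """Toy arena: tracks buffers allocated in a scope and frees them together."""
    def __init__(self):
        self.buffers = []

    def alloc(self, nbytes):
        buf = bytearray(nbytes)  # stand-in for a device allocation
        self.buffers.append(buf)
        return buf

@contextmanager
def region():
    arena = Arena()
    try:
        yield arena
    finally:
        arena.buffers.clear()    # all temporary allocations released here

with region() as mem:
    batch = mem.alloc(1024)      # temporaries live only for this scope
    scratch = mem.alloc(4096)
    print(len(mem.buffers))      # 2
```

Freeing a whole region at once makes deallocation cost predictable per training iteration, instead of depending on how many temporaries the loop body happened to create.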

Compilation Targets

Aurelia compiles to multiple backends from a single source:

| Target | Backend | Use Case |
| --- | --- | --- |
| CPU | LLVM | General computation |
| GPU | CUDA/ROCm | Parallel tensor ops |
| NPU | Custom bytecode | Neural inference |
| Web | WebAssembly | Browser deployment |

What’s Next?

We’re currently working on:

  • The Forge — A transpiler that converts Python/PyTorch code to Aurelia
  • Interactive Playground — Browser-based Aurelia editor with instant feedback
  • VS Code Extension — Syntax highlighting, type checking, and debugging

Aurelia is still early, but we believe it’s the right foundation for AI-native systems programming.


Interested in Aurelia? Read the full language documentation or contribute on GitHub.

Nehal Aditya
Founder & CEO at Deepcomet AI. Building autonomous systems and AI-native computing platforms.