When we started building Deepcomet AI, we quickly realized that existing programming languages weren’t designed for the AI-native era. They treat tensors as library constructs, automatic differentiation as an afterthought, and hardware acceleration as an opaque optimization.
We needed something different. So we built Aurelia.
## The Problem with the Status Quo
In Python, tensors are PyTorch or TensorFlow objects. In C++, you might use Eigen or custom CUDA kernels. Either way, the language itself has no awareness of what a tensor is; it's just another library construct.
This leads to several problems:
- **No compile-time shape checking:** runtime errors when matrix dimensions don't match (see the PyTorch sketch after this list)
- **Opaque performance:** the compiler can't optimize across library boundaries
- **Difficult hardware targeting:** writing GPU kernels requires learning an entirely new programming model
- **Ad-hoc differentiation:** `autograd` works, but it's a layer on top of the language, not part of it
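To make the first point concrete, here is a minimal PyTorch sketch: the mismatch is invisible to any static tooling and only surfaces when the line actually executes.

```python
import torch

a = torch.randn(3, 4)
b = torch.randn(3, 4)

# Nothing catches this before execution; at runtime PyTorch raises:
#   RuntimeError: mat1 and mat2 shapes cannot be multiplied (3x4 and 3x4)
c = a @ b
```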
## First-Class Tensors

In Aurelia, tensors are primitive types, just like `int` or `float`:
```
// Tensor is a native type with shape information
let tensor = Tensor<f32>[3, 3]
let image = Tensor<u8>[224, 224, 3]

// Shape mismatch is a compile-time error
let a = Tensor<f32>[3, 4]
let b = Tensor<f32>[4, 5]
let c = a @ b // OK: result is Tensor<f32>[3, 5]
let d = a @ a // Compile error: incompatible shapes
```
The type system tracks tensor shapes statically, catching dimension mismatches before your program ever runs.
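For contrast, the best a library-level approach can do is assert the contract at runtime, as in this small PyTorch helper (an illustrative function of ours, not a PyTorch API):

```python
import torch

def matmul_checked(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Matrix multiply with an explicit runtime shape check."""
    # The check runs on every call, and only once the program is already
    # running; Aurelia's type system performs the same check at compile time.
    if a.shape[-1] != b.shape[0]:
        raise ValueError(f"incompatible shapes: {tuple(a.shape)} @ {tuple(b.shape)}")
    return a @ b
```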
## Automatic Differentiation
Differentiation is built into the language, not bolted on:
```
// Define a differentiable function
fn loss_fn(model: Model, data: Batch) -> Tensor<f32> {
    let prediction = model.forward(data.input)
    CrossEntropy(prediction, data.label)
}

// Compute gradients automatically; no manual backward pass
let gradients = autodiff(loss_fn)(model, data)

// Apply gradients
optimizer.step(gradients)
```
No tape tracking, no `requires_grad` flags, no manual backward passes. The compiler handles everything.
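For comparison, the tape-based status quo in PyTorch looks like this: an explicit `requires_grad` flag, a forward pass recorded on a tape, and a manual `backward()` call.

```python
import torch
import torch.nn.functional as F

w = torch.randn(4, 3, requires_grad=True)  # opt in to gradient tracking
x = torch.randn(8, 4)
target = torch.randint(0, 3, (8,))

loss = F.cross_entropy(x @ w, target)      # forward pass is recorded on the tape
loss.backward()                            # manual backward pass replays the tape
print(w.grad.shape)                        # gradients accumulate on .grad -> (4, 3)
```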
## Memory Management
Aurelia uses a hybrid ownership model:
- Zero-cost borrow checking for CPU memory
- Region-based allocation for predictable GPU memory patterns
- NPU-aware memory mapping for direct neural processing unit access
```
// Ownership transfer to GPU
let gpu_tensor = tensor.to_device(Device.GPU)

// Region-based allocation for training loops
region TrainingMemory {
    let batch = load_batch()
    let gradients = autodiff(loss_fn)(model, batch)
    optimizer.step(gradients)
} // All temporary allocations are freed here
```
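Python has no direct analogue of `region` blocks; the closest idiom is confining temporaries to a function scope so CPython's reference counting releases them at a predictable point. A rough sketch (the function and its signature are ours, purely illustrative):

```python
import torch

def train_step(model: torch.nn.Module,
               optimizer: torch.optim.Optimizer,
               batch: torch.Tensor) -> float:
    # Every intermediate tensor lives only inside this frame. When the
    # function returns, reference counting frees them all, which is roughly
    # the guarantee Aurelia's `region` block makes explicit in the language.
    optimizer.zero_grad()
    loss = model(batch).mean()
    loss.backward()
    optimizer.step()
    return loss.item()  # hand back a plain float so no tensor outlives the "region"
```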
## Compilation Targets
Aurelia compiles to multiple backends from a single source:
| Target | Backend | Use Case |
|---|---|---|
| CPU | LLVM | General computation |
| GPU | CUDA/ROCm | Parallel tensor ops |
| NPU | Custom bytecode | Neural inference |
| Web | WebAssembly | Browser deployment |
## What’s Next?
We’re currently working on:
- **The Forge:** a transpiler that converts Python/PyTorch code to Aurelia
- **Interactive Playground:** a browser-based Aurelia editor with instant feedback
- **VS Code Extension:** syntax highlighting, type checking, and debugging
Aurelia is still early, but we believe it’s the right foundation for AI-native systems programming.
Interested in Aurelia? Read the full language documentation or contribute on GitHub.