SparseFlow™

High-performance compiler and runtime infrastructure for sparse and structured GPU compute

SparseFlow is a systems-level compiler and runtime platform developed by Maple Silicon, focused on executing structured sparsity efficiently across modern GPU architectures.

The platform is built around correctness-first transformations, memory-aware execution, and reproducible performance, so that sparse compute can run close to hardware limits without depending on heuristic optimizations.

Recent work demonstrates sustained, high-efficiency GPU kernel execution on Ada Tensor Cores (RTX 4090) through low-level optimization. See benchmarks.

ACTIVE DEVELOPMENT

📌 Proof at a glance

Sustained kernel performance: 82.93 TFLOPS (RTX 4090 / Ada Tensor Cores)
Method: PTX-level tuning + memory hierarchy optimization
Reproducibility: Open-source code + benchmark command
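For readers unfamiliar with how a throughput figure of this kind is usually derived, the sketch below shows the common arithmetic for a matrix multiply: count 2*M*N*K floating-point operations and divide by the measured time. The function name, the shape, and the timing are hypothetical and chosen only to illustrate the calculation; this page does not state the shape or timing behind the 82.93 TFLOPS figure, and the 2*M*N*K convention is an assumption.

```python
def gemm_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Throughput of one M x N x K matrix multiply, in TFLOPS.

    Assumes the usual count of 2*M*N*K operations (one multiply and one
    add per inner-product term). Hypothetical helper, not project code.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (2.0 * m * n * k) / seconds / 1e12


# Illustrative numbers only: a 4096^3 multiply measured at 2 ms.
example = gemm_tflops(4096, 4096, 4096, 2.0e-3)
print(f"{example:.2f} TFLOPS")
```

A sustained figure would normally be taken over many repeated launches rather than a single run.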

Current Status

Active Systems Development & Validation

🔬
Systems Development & Validation

SparseFlow is in active systems development, with a focus on validating correctness, performance, and architectural soundness across sparse and structured compute workloads.

Recent milestones include sustained high-efficiency GPU kernel execution, reproducible performance measurement, and end-to-end compiler-to-runtime validation.

Current work focuses on:

Designing structured sparsity-aware compiler passes with correctness guarantees
Validating GPU execution paths through low-level, memory-aware kernel design
Measuring performance using reproducible, architecture-grounded benchmarks
Building a foundation for automating performance-critical decisions in sparse compute pipelines

The Problem

⚠️
Fragmented Sparsity Support

Structured sparsity is increasingly supported by modern hardware, but its practical adoption remains fragmented across frameworks, compilers, and runtimes.

As a result, sparse execution often relies on fragile, hardware-specific implementations that are difficult to verify, reproduce, or maintain at scale.

Structurally sparse models are frequently executed as dense workloads, leaving potential efficiency gains unrealized.

Why SparseFlow

SparseFlow approaches sparsity as a systems problem, not a collection of isolated optimizations.

By integrating compiler transformations, runtime execution, and performance validation into a single pipeline, SparseFlow aims to make sparse and structured compute predictable, reproducible, and scalable across hardware generations.

The SparseFlow Approach

🎯
Compiler-First Philosophy

SparseFlow treats structured sparsity as a first-class compiler concern.

The platform is designed around the following principles:

📐

Explicit Representation

Structured N:M sparsity patterns are represented explicitly in the compiler IR, making sparsity intent visible throughout the compilation pipeline.
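To make the idea of explicit representation concrete, here is a minimal sketch in Python of a tensor type that carries its sparsity pattern as data. It is not SparseFlow's IR (which is MLIR-based); the names `NMPattern` and `TensorType` are hypothetical, and the sketch assumes the common convention that N:M means at most N nonzero values in every group of M consecutive elements.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class NMPattern:
    """N:M structured sparsity: at most n nonzeros per group of m (assumed convention)."""
    n: int
    m: int

    def __post_init__(self):
        if not (0 < self.n <= self.m):
            raise ValueError("require 0 < n <= m")

    @property
    def density(self) -> float:
        # Upper bound on the fraction of nonzero elements.
        return self.n / self.m


@dataclass(frozen=True)
class TensorType:
    """A tensor type whose sparsity intent is part of the type itself."""
    shape: Tuple[int, ...]
    sparsity: Optional[NMPattern] = None  # None means dense


weights = TensorType(shape=(128, 256), sparsity=NMPattern(2, 4))
activations = TensorType(shape=(256, 64))  # dense
```

Because the pattern travels with the type, any later pass can inspect `weights.sparsity` instead of re-deriving it from the data.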

✓

Constraint Verification

All required constraints are verified before any sparse transformation, ensuring optimizations are only applied when safety can be guaranteed.
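One constraint that such a verification step would plausibly include is the N:M pattern itself. The sketch below checks it on plain Python lists. The function names are hypothetical, the check covers only this one constraint, and rejecting rows whose length is not a multiple of M (rather than padding them) is an assumption.

```python
def satisfies_n_m(row, n=2, m=4):
    """Return True if every consecutive group of m values has at most n nonzeros."""
    if len(row) % m != 0:
        return False  # assumption: ragged rows are rejected, not padded
    for start in range(0, len(row), m):
        group = row[start:start + m]
        if sum(1 for v in group if v != 0) > n:
            return False
    return True


def verify_matrix(matrix, n=2, m=4):
    """Return True only if every row of the matrix satisfies the N:M pattern."""
    return all(satisfies_n_m(row, n, m) for row in matrix)


ok = verify_matrix([[1, 0, 2, 0], [0, 0, 3, 4]])   # every group has <= 2 nonzeros
bad = verify_matrix([[1, 2, 3, 0]])                # three nonzeros in one group
```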

🔄

Controlled Rewriting

Dense operations are rewritten to sparse equivalents only when verification passes, maintaining correctness as the primary goal.

🛡️

Guaranteed Fallback

A fallback path to dense execution is always available when verification fails, ensuring no correctness compromises.
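The verify, rewrite, and fall-back behaviour described in these principles can be sketched as a small dispatcher. This is an illustration in plain Python, not SparseFlow's implementation: all function names are hypothetical, the "sparse kernel" is a stand-in that simply skips zeros, and the only constraint checked is the 2:4 pattern.

```python
def is_2_4(row):
    """True if each group of 4 consecutive values has at most 2 nonzeros."""
    return len(row) % 4 == 0 and all(
        sum(v != 0 for v in row[i:i + 4]) <= 2 for i in range(0, len(row), 4)
    )


def dense_dot(row, x):
    return sum(a * b for a, b in zip(row, x))


def sparse_dot(row, x):
    # Stand-in for a sparse kernel: only nonzero weights are touched.
    return sum(a * x[i] for i, a in enumerate(row) if a != 0)


def matvec(matrix, x):
    """Return (result, path), where path is "sparse" or "dense"."""
    if any(len(row) != len(x) for row in matrix):
        raise ValueError("row length must match vector length")
    if all(is_2_4(row) for row in matrix):
        return [sparse_dot(row, x) for row in matrix], "sparse"
    # Verification failed: take the dense path, which is always valid.
    return [dense_dot(row, x) for row in matrix], "dense"


result_sparse = matvec([[1, 0, 2, 0]], [1, 2, 3, 4])
result_dense = matvec([[1, 2, 3, 0]], [1, 2, 3, 4])
```

Returning the chosen path alongside the result keeps the decision visible to the caller, in the spirit of the auditability goal stated here.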

This approach prioritizes correctness and transparency, making sparsity behavior predictable, auditable, and easier to reason about across different targets.

Technical Scope

🔧
Current Capabilities

The current technical scope of SparseFlow includes:

MLIR-based compiler infrastructure for sparsity propagation and verification
Initial support for structured N:M sparsity patterns (starting with 2:4)
CPU and GPU validation paths, with reproducible correctness testing
Deterministic test cases and baseline comparisons
A modular design intended to support future GPU and accelerator backends
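As background on why the 2:4 pattern is a natural starting point, the sketch below shows the general idea behind compressed 2:4 storage: each group of four values is stored as two kept values plus their positions within the group, halving the value storage. This is a generic illustration with hypothetical function names; it does not reproduce SparseFlow's storage format or any hardware metadata layout.

```python
def compress_2_4(row):
    """Split a 2:4-sparse row into kept values and their in-group positions."""
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    values, positions = [], []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        keep = [i for i, v in enumerate(group) if v != 0]
        if len(keep) > 2:
            raise ValueError("group violates the 2:4 pattern")
        # Pad with unused positions so every group stores exactly two slots.
        for i in range(4):
            if len(keep) == 2:
                break
            if i not in keep:
                keep.append(i)
        keep.sort()
        values.extend(group[i] for i in keep)
        positions.extend(keep)
    return values, positions


def decompress_2_4(values, positions):
    """Rebuild the full row from kept values and in-group positions."""
    row = []
    for g in range(0, len(values), 2):
        group = [0, 0, 0, 0]
        for v, p in zip(values[g:g + 2], positions[g:g + 2]):
            group[p] = v
        row.extend(group)
    return row


row = [0, 5, 0, 7, 3, 0, 0, 0]
values, positions = compress_2_4(row)
restored = decompress_2_4(values, positions)
```

The round trip through `decompress_2_4` is a convenient deterministic test case, since it must reproduce the original row exactly.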

SparseFlow is not a hardware-specific solution. The goal is to provide a portable, compiler-level foundation that can integrate with multiple runtime and backend environments.

📋
Current Limits

SparseFlow is an active systems effort. At this stage, the following areas are under development:

Full production deployment at scale
Performance guarantees across all workloads
Complete framework integration
Full GPU backend coverage across all operators

These areas are being approached incrementally, following correctness validation and reproducible benchmarking.

Next Steps

🗺️
Development Roadmap

The next phase of SparseFlow development focuses on validation and expansion:

Expanding correctness testing across a broader set of models and shapes
Establishing reproducible benchmarking against dense baselines
Validating sparse lowering behavior on GPU hardware using controlled test environments
Extending support to additional N:M sparsity configurations
Improving documentation and developer-facing visibility into compiler decisions
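A reproducible comparison against a dense baseline usually needs warm-up runs and a robust statistic such as the median. The sketch below is a minimal host-side timing harness under those assumptions; the function names and default counts are hypothetical, and it is not the project's benchmark command. When timing GPU work, the callable being measured would also need to wait for the device to finish, because kernel launches are asynchronous.

```python
import statistics
import time


def median_seconds(fn, warmup=3, repeats=10):
    """Median wall-clock time of fn() after a few untimed warm-up calls."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup(dense_seconds, sparse_seconds):
    """Ratio above 1.0 means the sparse path was faster than the dense baseline."""
    if sparse_seconds <= 0:
        raise ValueError("sparse_seconds must be positive")
    return dense_seconds / sparse_seconds


# Placeholder workload standing in for a dense baseline kernel.
baseline = median_seconds(lambda: sum(range(1000)))
```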

These steps are aimed at building confidence in the architecture before pursuing broader adoption claims.

About Maple Silicon Inc.

Maple Silicon Inc. is a Canadian technology company focused on compiler infrastructure and systems-level optimization for machine learning and high-performance computing workloads.

SparseFlow™ is the company's initial platform, built as part of its ongoing engineering work on structured sparsity and efficient execution.

💬
Contact

Maple Silicon Inc. is open to collaboration and to discussions related to Canadian innovation and funding programs, including NRC IRAP.

We are currently accepting 1–2 pilot evaluations with teams exploring structured sparsity or assessing compiler and runtime options.

Email: info@maplesilicon.co