Debrief #2: Cactus Compute

Henry Ndubuaku had an offer from Nvidia. He turned it down to build Cactus: an engine that runs AI on smartphones the way a calculator runs math, without sending anything to a server. Apple's Neural Engine does something similar for Apple's own apps. Ndubuaku built it for everyone else. A $200 Pixel 6a runs Cactus at 80 tokens per second while draining 10 percent of its battery per hour. For perspective, a recent paper (Zhang & Huang, 2025) ran a 1-billion-parameter language model on an iPhone 15 Pro (much better hardware than the Pixel 6a) using one of the most widely used frameworks and reached only 17 tokens per second. The 10 percent battery drain is equally impressive; without Cactus's optimizations, the battery would drain completely in a few hours.
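To make those figures concrete, here is a minimal back-of-the-envelope sketch in Python. It uses only the numbers quoted above; the 200-token reply length is a hypothetical value chosen for illustration. The two throughput figures come from different devices and setups, so the ratio is a rough comparison, not a controlled benchmark.

```python
# Figures quoted in the article.
cactus_tokens_per_sec = 80      # Pixel 6a running Cactus
baseline_tokens_per_sec = 17    # iPhone 15 Pro result from Zhang & Huang (2025)
drain_percent_per_hour = 10     # battery drain while running Cactus

# Hypothetical reply length, assumed only for illustration.
reply_tokens = 200

speedup = cactus_tokens_per_sec / baseline_tokens_per_sec
hours_to_empty = 100 / drain_percent_per_hour
cactus_reply_seconds = reply_tokens / cactus_tokens_per_sec
baseline_reply_seconds = reply_tokens / baseline_tokens_per_sec

print(f"Throughput ratio: {speedup:.1f}x")
print(f"Hours of continuous use on a full battery: {hours_to_empty:.0f}")
print(f"{reply_tokens}-token reply: {cactus_reply_seconds:.1f}s vs {baseline_reply_seconds:.1f}s")
```

At the stated drain rate, a full battery would last roughly ten hours of continuous generation.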

The Timing No One Saw

Existing frameworks for running machine learning models on mobile devices, like TensorFlow Lite, PyTorch Mobile, and Core ML, all target the same static vision tasks: recognizing an image or detecting an object. Language models behave differently. They have to remember the conversation while generating words one at a time, and they have to store past exchanges so the phone doesn’t start over with every reply. No framework was designed for this, so Cactus wrote its kernels, the lowest-level code telling a processor how to handle a task, from scratch. Those mobile frameworks were all created before the language model era; Cactus is being built specifically for it.
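The difference between restarting with every reply and storing past exchanges can be sketched with a toy example. This is not Cactus's code or any real framework's API; the `encode` function and both session classes are hypothetical stand-ins. The sketch only counts how much per-token work each approach does as a conversation grows.

```python
def encode(token):
    """Hypothetical stand-in for the expensive per-token computation."""
    return sum(ord(ch) for ch in token)


class NoCacheSession:
    """Re-processes the whole history on every step."""

    def __init__(self):
        self.work = 0  # number of encode() calls made so far

    def step(self, history):
        states = []
        for token in history:
            states.append(encode(token))
            self.work += 1
        return states


class CachedSession:
    """Keeps per-token state and processes only tokens it has not seen."""

    def __init__(self):
        self.cache = []
        self.work = 0

    def step(self, history):
        for token in history[len(self.cache):]:
            self.cache.append(encode(token))
            self.work += 1
        return list(self.cache)


def run(session, tokens):
    """Feed tokens one at a time, as a model does while generating."""
    history, states = [], []
    for token in tokens:
        history.append(token)
        states = session.step(history)
    return states


tokens = ["how", "do", "cacti", "store", "water"]
no_cache, cached = NoCacheSession(), CachedSession()
states_a = run(no_cache, tokens)
states_b = run(cached, tokens)

print(no_cache.work, cached.work)
```

Both sessions end with the same state, but the uncached one does work that grows with the square of the conversation length, while the cached one does one unit of work per new token. Real language model runtimes cache far richer state than this, but the trade-off has the same shape.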

The advantage over the alternative is threefold: cost, time, and data. Mobile apps that route AI through servers must pay per query, wait up to two seconds per round trip, and risk exposing user data to an external server. Cactus solves all three. Once a developer downloads a model to the device, running it costs nothing.

Why They Got Funded

Ndubuaku and co-founder Roman Shemet met through YC's founder matching program in London. Before Cactus, Ndubuaku was an AI research engineer building hardware-aware models for real-time video compression. That work shaped how he thinks about performance on constrained hardware and solidified his resolve to make the most out of little. He ultimately turned down Nvidia because the more urgent problem was not selling chips but making AI run on the devices that everyone already owns.

YC's Summer 2025 batch funded the round, joined by Oxford's Seed Fund and FCVC, the firm behind Slack, Coinbase, and GitLab. What’s more, 62 tech CTOs, VPs, and Directors invested collectively under shared terms. These investors are the same people who authorize tech infrastructure purchases at their respective companies; the Cactus fundraise is brilliant in that its investors are also its buyers.

What's Disruptive. What Isn't.

Cactus sells to developers, not to consumers. Engineers embed it into their apps so users get AI features without cloud latency, API costs, or privacy exposure.

The real edge is the kernel architecture. Cactus's custom-written code runs on both CPUs and NPUs (Neural Processing Units, the dedicated AI chips inside modern smartphones, built for fast and battery-efficient calculations). The result is a Pixel 6a at 80 tokens per second and complete AI responses in under one second on outdated mobile devices, all from one toolkit that can be integrated into iOS, Android, macOS, and wearables.

However, over one-fourth of the world uses Apple, and modern iPhones have a Neural Engine, a chip dedicated to doing AI calculations efficiently. Third-party developers like Cactus can’t access the Neural Engine directly; they must go through Apple’s machine learning framework. TL;DR: Cactus can’t exceed what Apple’s framework allows, which gives Apple’s own integrations, which can access the Neural Engine directly, a leg up.

The Go-To-Market

Ndubuaku and Shemet released Cactus as open source, and the GitHub repo attracted co-maintainers from UCLA, Yale, UPenn, and Imperial College. The 62 tech executives who invested soon followed.

The lesson here, I believe, is to release the tool first. Those who need it will find you.

The Blind Spot

Cactus has to constantly adapt to the operating systems that different phone makers ship for their ecosystems. Additionally, Cactus's core framework is free and open source. Its public GitHub code contains an access key for a Pro enterprise tier, but pricing and contract terms aren’t publicly available. From the outside, it’s difficult to say whether the revenue matches the value Cactus provides to developers.

The Question

Henry Ndubuaku, at what point does cross-platform become an unassailable moat, and what does Cactus build before Apple closes the iOS performance gap?

The Takeaway

Doing things that don't scale is certainly powerful, but what about B2B companies? You don't pitch to 62 tech executives (it'd be difficult to find enough willing to give you the time of day). You release a free tool, let their team find it, and ask them to invest. Any founder building developer tools needs to read that sequence again.
