This article was originally published on Medium here.

“Startups” in the semiconductor chip design space had been a rarity since the dot-com crash of the early 2000s.

Chip design carries massive development costs: design cycles run for multiple years and depend on (1) expensive EDA (Electronic Design Automation) tools for design and (2) foundries for manufacturing. Both are highly advanced technologies with very few players in the world.

The long design cycle from the conception of an architecture specification to its tapeout (the point at which a chip design is frozen and sent to a semiconductor foundry for manufacturing), plus the time it takes to develop a SW stack to program a new architecture, further delays the point of revenue generation for such companies.

High initial investment costs, combined with delayed revenue and delayed improvement in gross margin, caused major market consolidation after the 2000 dot-com crash and made semiconductor chip startups less attractive for venture capital funding.

However, the rise of AI over the last ~8 years, with its unique computational requirements, has opened new opportunities for domain-specific ASICs and made them, once again, a high-risk, high-gain proposition for venture funding.

The introduction of the Tensor Processing Unit (TPU), a chip designed specifically for Deep Learning (DL constitutes most of AI these days), by Google in 2017 demonstrated that a new player (new in terms of building ASICs) could build a domain-specific chip solution, and it validated the presence of a lucrative market for investors.

As seen in the chart below, the 190% growth in funding for AI chip startups in 2017 and the steady increase since then (the 2020 column only represents funding in the first quarter of a pandemic year) reflect renewed VC confidence in semiconductor chip startups. (Note: only logic processor startups in the US are included in the chart below to highlight the point. SW, IP, memory, display drivers, sensors, MEMS, RF, power management, discretes or optoelectronics & processors are excluded.)

Semico Research Corp: Total funding by year in US-based semiconductor chip startups (blue bars) and cumulative funding over the past 5 years (green line)

Interestingly, the majority of the new DL chip startups (interchangeably called AI chip startups) have targeted their chip solutions at Inference instead of Training.

As a brief recap, Deep Learning consists of two phases:

Training: A deep neural network is trained to perform a task by showing it a large number of data examples. For example, a neural network is trained on 1.2 million images of the ImageNet dataset to learn 1000 categories of objects (e.g., peacock, cricket, buckle).

Inference: A trained deep neural network is deployed to perform the learned task on new data. For example, the above network is deployed on Pinterest to tag new images that it has never seen before.

Training is usually at least 3x more computationally intensive than Inference per input, since each training step adds two gradient computations of roughly the same cost as the forward pass:

Training  = forward propagation + data gradient + weight gradient
Inference = forward propagation
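The 3x figure can be sanity-checked with a rough FLOP count. The sketch below is only an illustration under simplifying assumptions: it models a single fully connected layer, counts each multiply-accumulate as 2 FLOPs, and ignores bias, activation and optimizer costs. The function names are hypothetical and are not taken from any library.

```python
def dense_layer_flops(batch, n_in, n_out):
    """Approximate FLOPs for one fully connected layer Y = X @ W.

    X has shape (batch, n_in) and W has shape (n_in, n_out).
    Each multiply-accumulate is counted as 2 FLOPs; bias and
    activation costs are ignored (simplifying assumption).
    """
    matmul = 2 * batch * n_in * n_out
    return {
        "forward": matmul,      # Y  = X @ W
        "data_grad": matmul,    # dX = dY @ W.T
        "weight_grad": matmul,  # dW = X.T @ dY
    }


def training_to_inference_ratio(batch, n_in, n_out):
    """Ratio of per-step training FLOPs to inference FLOPs for the layer."""
    flops = dense_layer_flops(batch, n_in, n_out)
    inference = flops["forward"]
    training = flops["forward"] + flops["data_grad"] + flops["weight_grad"]
    return training / inference


print(training_to_inference_ratio(batch=32, n_in=1024, n_out=1024))  # 3.0
```

In this simplified model the ratio is exactly 3 because all three matrix multiplications have the same size. Real training adds further work on top (weight updates, storing activations for the backward pass, many passes over the dataset), which is why 3x is a lower bound rather than a typical value.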

The Training and Inference phases have significantly different computational and memory requirements, so it is possible to build chip solutions uniquely tailored to each.

Today, Training occurs in the cloud or on giant on-premise supercomputers. Inference, depending on the use case, can occur either in a datacenter or on edge devices closer to the point of data collection, such as near IoT sensors or CCTV cameras for smart analytics, or on autonomous robots.

Inference is the lower-hanging fruit for a startup aspiring to target both inference and training solutions. This is evidenced by the fact that the first-generation AI chips from new players such as Google, Graphcore, Habana (acquired by Intel) and Tenstorrent, among others, were designed to target Inference, while subsequent generations are or will be meant for Training.

However, the bulk of the other AI chip startups are targeting only Edge Inference, for numerous reasons, some of which are explained below.

Volume of deployment

One of the primary reasons is the sheer volume of the Edge Inference market, which can be much larger than that of Training. A trained model is deployed for Inference on numerous endpoints, depending on the use case, whereas Training can be performed repeatedly on the same supercomputer on-premise or in the cloud.

For example, once a model for a self-driving car has been trained on a giant cluster, it will be deployed for inference on the AI chips in millions of cars. As another example, once an automatic speech recognition model has been trained, it will be deployed for low-latency inference across millions of smart devices such as speakers, mobile phones and refrigerators.

Diverse constraints = More Niches to explore

Based on different deployment use cases, inference chips can have different constraints or requirements.

The compute horsepower, acceptable chip area (form factor) and power budget of an Inference chip performing movie recommendations for Netflix in the cloud are quite different from those of an Inference chip in an autonomous bot doing pizza delivery, which in turn are quite different from those of an Inference chip in a smart-assistant-enabled speaker or mobile phone.

In the past, most AI Inference happened in the cloud, but privacy concerns regarding personal data and demand for lower latency (higher responsiveness) have created a push for inference to be performed directly on edge devices such as mobile devices, smart wearables, smart speakers or autonomous cars.

This decentralization of inference computation from the cloud to edge devices has opened up the opportunity for different players to specialize in different deployment tasks. Edge devices can choose to specialize in specific endpoint applications only, unlike cloud devices, which are required to demonstrate highly efficient performance on a wider range of generic workloads running in a datacenter.

Datacenter Training is a harder battle for startups. High compute horsepower and flexible performance across a diverse set of workloads require a well-integrated and mature software stack with very wide user adoption, which is even harder to achieve than designing a new chip architecture.

Because datacenters involve massive infrastructure and maintenance costs, their operators require a very high degree of confidence in a vendor, which usually favors large legacy providers over startups.

However, the proliferation of different niche edge AI use cases has created newer market segments which didn’t exist before and are ripe with opportunities for startups to capture.

The edge-based AI chipset market has been estimated to bring in about 3.5x the revenue of the cloud-based AI chipset market by 2025: $51.6 billion versus $14.6 billion, as seen in the chart below.
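As a quick check, the 3.5x multiple follows directly from the two revenue estimates quoted above. The helper name below is hypothetical and only used for this illustration.

```python
def revenue_ratio(edge_busd, cloud_busd):
    """Return how many times larger the edge estimate is than the cloud estimate."""
    return edge_busd / cloud_busd


# 2025 revenue estimates quoted in the article, in billions of US dollars.
ratio = revenue_ratio(edge_busd=51.6, cloud_busd=14.6)
print(f"{ratio:.2f}x")  # 3.53x
```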

The datacenter chip space is bound to stay heavily guarded by giants such as NVIDIA, AMD and Intel, who already have a decades-long lead in SW expertise and enterprise partnerships.

To gain a slice of this pie, a startup has to demonstrate at least a 10x improvement over existing chip solutions across a diversity of workloads and establish wide SW user adoption before customers will drop existing, reliable partnerships. That is incredibly hard when the monetary runway is short and when existing players with larger resources keep demonstrating performance improvements generation after generation; it is a problem currently being faced by Graphcore, one of the highest-funded AI chip startups.

The edge-based AI chip market, on the other hand, has many new niches to explore and build partnerships in, each of which requires specialization across a narrow set of workloads, a more achievable goal when funding is limited.

The edge-based AI chip market will continue to have high initial development costs, but the promise of high-volume deployment and the opportunity to create and capture new market niches is like baking your own pie, which should be an obvious choice for hardware chip entrepreneurs and VCs.