Artificial intelligence (AI) is built on three pillars: hardware, algorithms, and data. Among these, hardware refers primarily to the chips that run AI algorithms. This article aims to summarize and organize the manufacturers of AI chips.
Currently, AI chips fall into two main categories: server-side (cloud) and mobile-side (terminal). Server-side chips execute AI algorithms in large-scale computing environments. They must support a wide range of network structures to ensure high accuracy and good generalization, and they rely on high-precision floating-point arithmetic at performance levels of at least teraflops (10^12 floating-point operations per second), which typically means high power consumption. To scale performance further, they often support array structures, allowing multiple chips to be combined into a single computing array for faster operation.
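To make the teraflop figure concrete, here is a back-of-the-envelope sketch. The layer dimensions are illustrative assumptions, not from the article; the point is only how quickly the arithmetic adds up.

```python
# Back-of-the-envelope: FLOPs for one dense layer, and how long a chip
# sustaining 1 TFLOPS would take. Dimensions are illustrative assumptions.
batch, in_dim, out_dim = 64, 4096, 4096

# A (batch x in_dim) @ (in_dim x out_dim) matrix multiply performs one
# multiply and one add per term: 2 * batch * in_dim * out_dim operations.
flops_per_pass = 2 * batch * in_dim * out_dim

chip_flops = 1e12  # 1 teraflop = 10**12 floating-point ops per second
seconds = flops_per_pass / chip_flops

print(f"{flops_per_pass:,} FLOPs -> {seconds * 1e3:.2f} ms per pass")
# -> 2,147,483,648 FLOPs -> 2.15 ms per pass
```

Even a single moderately sized layer costs billions of operations per forward pass, which is why teraflop-class throughput is the entry ticket for cloud training and inference.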
Mobile AI chips, on the other hand, follow a different design philosophy. They prioritize low power consumption, which demands high computational efficiency. Some loss of calculation accuracy is therefore acceptable, and fixed-point arithmetic or network compression techniques are often used to speed up operations. These optimizations make mobile chips more energy-efficient, at the cost of some precision compared with their server-side counterparts.
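As a concrete illustration of the fixed-point trade-off, here is a minimal sketch of symmetric int8 quantization. The weight values and the simple max-based scaling scheme are illustrative assumptions; production chips use more sophisticated schemes.

```python
# Minimal sketch of symmetric fixed-point (int8) quantization, the kind
# of precision trade-off mobile AI chips make. Values are illustrative.
weights = [0.82, -0.44, 0.07, -1.13, 0.55]

scale = max(abs(w) for w in weights) / 127  # map largest weight to +/-127

quantized = [round(w / scale) for w in weights]  # int8 codes
dequantized = [q * scale for q in quantized]     # approximate floats back

max_err = max(abs(w - d) for w, d in zip(weights, dequantized))
print(quantized)                     # -> [92, -49, 8, -127, 62]
print(f"max error: {max_err:.4f}")   # small but nonzero: accuracy loss
```

Each weight now fits in one byte instead of four, and integer multiplies are far cheaper in silicon than floating-point ones, which is exactly the efficiency mobile chips are chasing.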
The following sections will introduce both server-side and mobile-side AI chips. Some manufacturers produce both types, so there isn't always a strict distinction between them.


In the cloud server field, Nvidia's GPUs have become an essential part of AI infrastructure, and it's no exaggeration to say they lead the market. According to reports, over 3,000 AI startups worldwide build on Nvidia's hardware platforms. The stock market has responded in kind: Nvidia, once known mainly for gaming chips, saw its share price rise from around $30 to over $120 in little more than a year. In February 2017, Nvidia reported quarterly revenue up 55% year on year, with net profit reaching $655 million, up 216% from a year earlier.


The dominant force of the PC era, Intel missed the mobile internet wave, but in the AI era it has not given up and has actively invested in new technologies. After acquiring Altera, Intel introduced FPGA-based deep learning accelerator cards for the cloud, and its acquisition of Nervana further strengthened its cloud position. On the mobile side, Intel acquired Movidius. Nervana is covered next; Movidius is discussed later with the other mobile chips.


Nervana was founded in 2014 in San Francisco and raised $24.4 million from 20 investors, including Draper Fisher Jurvetson (DFJ). The Nervana Engine, slated for 2017, is an ASIC optimized for deep learning. It uses High Bandwidth Memory, offering both high capacity and high speed, with 32 GB of in-package storage and 8 TB/s of memory access bandwidth. Nervana also provides cloud-based AI services, claiming to be the fastest in the industry, used by financial institutions, healthcare providers, and government agencies. The new chips aim to keep the Nervana cloud platform at the forefront for years to come.


IBM has long been associated with Watson and has invested heavily in real-world AI applications. It also developed TrueNorth, a brain-inspired chip designed to mimic the human brain. TrueNorth grew out of IBM's DARPA-funded SyNAPSE project, which aims to break with the traditional von Neumann architecture: instead of separating memory and processing, TrueNorth integrates them, enabling local information processing and efficient communication between neurons. A 2011 prototype had 256 neurons and could play Pong. By 2014, IBM had scaled this to 4,096 cores with 1 million neurons, consuming less than 70 mW. The chip can recognize objects in video at 30 fps with 80% accuracy, far outperforming a conventional laptop in both speed and energy efficiency.
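To illustrate the spiking, event-driven style of computation TrueNorth embodies, here is a toy leaky integrate-and-fire neuron. IBM's actual neuron model is considerably more elaborate; the threshold and leak values here are illustrative assumptions.

```python
# Toy leaky integrate-and-fire neuron: the membrane potential leaks each
# time step, integrates incoming current, and emits a spike when it
# crosses the threshold (then resets). Parameters are illustrative.
def run_neuron(inputs, threshold=1.0, leak=0.9):
    potential = 0.0
    spikes = []
    for t, current in enumerate(inputs):
        potential = potential * leak + current  # leaky integration
        if potential >= threshold:              # fire...
            spikes.append(t)
            potential = 0.0                     # ...and reset
    return spikes

# Sustained input pushes the neuron over threshold; quiet steps decay.
print(run_neuron([0.4, 0.4, 0.4, 0.0, 0.0, 0.9, 0.6]))  # -> [2, 6]
```

Because neurons only communicate when they spike, most of the chip sits idle most of the time, which is the root of TrueNorth's milliwatt-class power draw.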


In 2016, Google announced the TPU (Tensor Processing Unit), a custom chip designed specifically for machine learning. By reducing computational precision, the TPU needs fewer transistors per operation and can therefore perform more operations per second, making machine learning models run faster. Google installs TPUs in its servers via the PCIe interface. While GPUs are versatile, the TPU is an ASIC designed for specific tasks: it offers higher performance and efficiency on those tasks, at the price of flexibility and high up-front design cost.
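The reduced-precision arithmetic a TPU-style ASIC exploits can be sketched as cheap 8-bit integer multiplies accumulated into a wider integer, with a single rescale at the end. The codes and scales below are illustrative assumptions, not Google's actual pipeline.

```python
# Reduced-precision dot product in the TPU spirit: int8 multiplies,
# wide-integer accumulation, one rescale at the end. Codes and scales
# here are illustrative assumptions, not Google's actual pipeline.
def int8_dot(a_codes, b_codes, a_scale, b_scale):
    acc = 0                              # wide (e.g. 32-bit) accumulator
    for a, b in zip(a_codes, b_codes):
        acc += a * b                     # cheap int8 x int8 multiply
    return acc * a_scale * b_scale       # rescale to a real-valued result

activations = [120, -30, 64]   # int8 codes for some activations
weights = [-5, 90, 17]         # int8 codes for some weights
print(int8_dot(activations, weights, a_scale=0.01, b_scale=0.02))
```

Keeping the inner loop purely in small integers is what lets an ASIC pack many more multiply-accumulate units into the same transistor budget than a full floating-point datapath would allow.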


China's Cambrian Technology, backed by the Chinese Academy of Sciences, develops specialized chips for deep learning. Its DianNaoYu instruction set improves performance by two orders of magnitude over x86 CPUs. The Cambrian series includes three processors: DianNao for neural networks, DaDianNao for large-scale networks, and PuDianNao for diverse algorithms. Cambricon-1A, launched in 2016, is the first commercial AI chip for smartphones, drones, and smart cars.


ARM recently introduced the DynamIQ architecture, promising a 50x improvement in AI performance over the next few years. This architecture allows up to eight cores to be configured for AI-specific tasks, and ARM will provide software libraries to optimize AI workloads. Based on the big.LITTLE architecture, DynamIQ reduces power consumption while improving performance, and future Cortex-A chips will adopt this technology.


In September 2016, Intel acquired Movidius, a company specializing in vision processing chips. Its Myriad2 processor is used in devices like Google Tango, DJI drones, and FLIR cameras. Movidius's chips are energy-efficient and suitable for embedded systems, enabling advanced visual computing in various applications.


CEVA, a provider of DSP IP, offers the CEVA-XM4 and XM6 for image and computer vision. XM6 supports deep learning with improved performance and lower power consumption. Features like scatter-gather and sliding-window processing help accelerate image and machine learning algorithms, targeting smartphones, autonomous vehicles, and security systems.
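To illustrate the sliding-window access pattern such vision DSPs accelerate, here is a minimal 1-D convolution: each output window reuses most of the previous window's pixels, which is exactly the locality the hardware exploits. The kernel and signal values are illustrative.

```python
# Sketch of the sliding-window access pattern vision DSPs accelerate:
# a 1-D "valid" convolution where consecutive windows overlap, so most
# pixels are reused from one step to the next. Values are illustrative.
def conv1d_valid(signal, kernel):
    k = len(kernel)
    out = []
    for i in range(len(signal) - k + 1):
        window = signal[i:i + k]                      # slide by one pixel
        out.append(sum(w * c for w, c in zip(window, kernel)))
    return out

edge_kernel = [-1, 0, 1]          # simple horizontal-gradient filter
row = [10, 10, 10, 40, 40, 40]    # a step edge in one image row
print(conv1d_valid(row, edge_kernel))  # -> [0, 30, 30, 0]
```

Hardware features like scatter-gather loads let the DSP fetch these overlapping windows without re-reading the same pixels from memory on every step.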


Eyeriss is not a company but an MIT research project that has nonetheless received significant media attention. The chip is designed to be 10 times more efficient than a typical mobile GPU, using a memory architecture that reduces data transfer between the processing cores and main memory. Eyeriss targets face and speech recognition in IoT devices such as smartphones and robots.


Zhongxing Micro launched China's first embedded neural network processor (NPU) in 2016, called "Starlight Intelligence No.1." It achieves 98% accuracy in face recognition and consumes only 400mW. This chip is used in security cameras and is expected to expand to in-car cameras, drones, and industrial applications.


Horizon Robotics, founded by Yu Kai, focuses on building AI platforms based on deep neural networks. It develops both software and chips, aiming to create low-power solutions for environmental awareness, human-computer interaction, and decision-making. Its NPU chip is designed to boost performance by 2–3 orders of magnitude, supporting its OS and enabling advanced AI applications.


Shenjian Technology (DeePhi Tech), founded by Tsinghua University researchers, produces the Deep Processing Unit (DPU). It aims to match GPU performance while consuming far less power. Its products are built on FPGA platforms and are used in drones, security monitoring, and robotics. The embedded version outperforms Nvidia's Tegra K1 (TK1), while the server-side version matches the Tesla K40 at a fraction of the cost and power consumption.
The AI era has arrived. As companies race to innovate, who will stand out and lead the next generation? Only time will tell.