How Chinese AI Startup DeepSeek Made a Model that Rivals OpenAI

January 25, 2025

On January 20, DeepSeek, a relatively unknown AI research lab from China, released an open source model that’s quickly become the talk of the town in Silicon Valley. According to a paper authored by the company, DeepSeek-R1 beats the industry’s leading models like OpenAI o1 on several math and reasoning benchmarks. In fact, on many metrics that matter—capability, cost, openness—DeepSeek is giving Western AI giants a run for their money.

DeepSeek’s success points to an unintended outcome of the tech cold war between the US and China. US export controls have severely curtailed the ability of Chinese tech firms to compete on AI in the Western way—that is, scaling up endlessly by buying more chips and training models for longer. As a result, most Chinese companies have focused on downstream applications rather than building their own models. But with its latest release, DeepSeek proves that there’s another way to win: by revamping the foundational structure of AI models and using limited resources more efficiently.

“Unlike many Chinese AI firms that rely heavily on access to advanced hardware, DeepSeek has focused on maximizing software-driven resource optimization,” explains Marina Zhang, an associate professor at the University of Technology Sydney, who studies Chinese innovations. “DeepSeek has embraced open source methods, pooling collective expertise and fostering collaborative innovation. This approach not only mitigates resource constraints but also accelerates the development of cutting-edge technologies, setting DeepSeek apart from more insular competitors.”

So who is behind the AI startup? And why are they suddenly releasing an industry-leading model and giving it away for free? WIRED talked to experts on China’s AI industry and read detailed interviews with DeepSeek founder Liang Wenfeng to piece together the story behind the firm’s meteoric rise. DeepSeek did not respond to several inquiries sent by WIRED.

A Star Hedge Fund in China

Even within the Chinese AI industry, DeepSeek is an unconventional player. It started as Fire-Flyer, a deep-learning research branch of High-Flyer, one of China’s best-performing quantitative hedge funds. Founded in 2015, the hedge fund quickly rose to prominence in China, becoming the first quant hedge fund to raise over 100 billion RMB (around $15 billion). (Since 2021, the number has dipped to around $8 billion, though High-Flyer remains one of the most important quant hedge funds in the country.)

For years, High-Flyer had been stockpiling GPUs and building Fire-Flyer supercomputers to analyze financial data. Then, in 2023, Liang, who has a master’s degree in computer science, decided to pour the fund’s resources into a new company called DeepSeek that would build its own cutting-edge models—and hopefully develop artificial general intelligence. It was as if Jane Street had decided to become an AI startup and burn its cash on scientific research.

Bold vision. But somehow, it worked. “DeepSeek represents a new generation of Chinese tech companies that prioritize long-term technological advancement over quick commercialization,” says Zhang.

Liang told the Chinese tech publication 36Kr that the decision was driven by scientific curiosity rather than a desire to turn a profit. “I wouldn’t be able to find a commercial reason [for founding DeepSeek] even if you ask me to,” he explained. “Because it’s not worth it commercially. Basic science research has a very low return-on-investment ratio. When OpenAI’s early investors gave it money, they sure weren’t thinking about how much return they would get. Rather, it was that they really wanted to do this thing.”

Today, DeepSeek is one of the few leading AI firms in China that doesn’t rely on funding from tech giants like Baidu, Alibaba, or ByteDance.

A Young Group of Geniuses Eager to Prove Themselves

According to Liang, when he put together DeepSeek’s research team, he was not looking for experienced engineers to build a consumer-facing product. Instead, he focused on PhD students from China’s top universities, including Peking University and Tsinghua University, who were eager to prove themselves. Many had been published in top journals and won awards at international academic conferences, but lacked industry experience, according to the Chinese tech publication QBitAI.

“Our core technical positions are mostly filled by people who graduated this year or in the past one or two years,” Liang told 36Kr in 2023. The hiring strategy helped create a collaborative company culture where people were free to use ample computing resources to pursue unorthodox research projects. It’s a starkly different way of operating from established internet companies in China, where teams are often competing for resources. (A recent example: ByteDance accused a former intern—a prestigious academic award winner, no less—of sabotaging his colleagues’ work in order to hoard more computing resources for his team.)

Liang said that students can be a better fit for high-investment, low-profit research. “Most people, when they are young, can devote themselves completely to a mission without utilitarian considerations,” he explained. His pitch to prospective hires is that DeepSeek was created to “solve the hardest questions in the world.”

The fact that these young researchers are almost entirely educated in China adds to their drive, experts say. “This younger generation also embodies a sense of patriotism, particularly as they navigate US restrictions and choke points in critical hardware and software technologies,” explains Zhang. “Their determination to overcome these barriers reflects not only personal ambition but also a broader commitment to advancing China’s position as a global innovation leader.”

Innovation Born out of a Crisis

In October 2022, the US government started putting together export controls that severely restricted Chinese AI companies from accessing cutting-edge chips like Nvidia’s H100. The move presented a problem for DeepSeek. The firm had started out with a stockpile of 10,000 H100s, but it needed more to compete with firms like OpenAI and Meta. “The problem we are facing has never been funding, but the export control on advanced chips,” Liang told 36Kr in a second interview in 2024.

DeepSeek had to come up with more efficient methods to train its models. “They optimized their model architecture using a battery of engineering tricks—custom communication schemes between chips, reducing the size of fields to save memory, and innovative use of the mix-of-models approach,” says Wendy Chang, a software engineer turned policy analyst at the Mercator Institute for China Studies. “Many of these approaches aren’t new ideas, but combining them successfully to produce a cutting-edge model is a remarkable feat.”
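To give a rough sense of what “reducing the size of fields” can mean in practice, here is a minimal, illustrative sketch (not DeepSeek’s actual code): it simply compares the memory footprint of the same weight matrix stored at progressively lower numeric precision. The matrix size and the crude 8-bit conversion below are hypothetical choices made for demonstration only.

```python
# Illustrative only -- not DeepSeek's training code. Shows how storing the same
# weights in smaller numeric fields shrinks memory use.
import numpy as np

weights_fp32 = np.random.randn(4096, 4096).astype(np.float32)          # 32-bit floats
weights_fp16 = weights_fp32.astype(np.float16)                          # 16-bit floats
weights_int8 = np.clip(weights_fp32 * 127, -127, 127).astype(np.int8)   # crude 8-bit quantization

for name, w in [("fp32", weights_fp32), ("fp16", weights_fp16), ("int8", weights_int8)]:
    print(f"{name}: {w.nbytes / 1e6:.1f} MB")  # roughly 67 MB, 34 MB, and 17 MB
```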

DeepSeek has also made significant progress on Multi-head Latent Attention (MLA) and Mixture-of-Experts, two technical designs that make DeepSeek models more cost-effective by requiring fewer computing resources to train. In fact, DeepSeek’s latest model is so efficient that it required one-tenth the computing power of Meta’s comparable Llama 3.1 model to train, according to the research institution Epoch AI.
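For readers curious about what a mixture-of-experts design looks like at its simplest, the sketch below shows the core idea in Python: a router scores a set of “expert” networks for each token, and only the top few actually run, so most of the model’s parameters sit idle on any given token. The layer sizes, expert count, and routing details here are hypothetical and chosen for readability; they are not taken from DeepSeek’s models.

```python
# A toy Mixture-of-Experts layer -- illustrative only, not DeepSeek's architecture.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 8, 2          # hypothetical sizes for demonstration

# Each "expert" is a plain linear map here; in real models they are small feed-forward networks.
experts = [rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_forward(x):
    """x: (n_tokens, d_model). Each token is routed to its top_k experts only."""
    logits = x @ router                                   # router scores, (n_tokens, n_experts)
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)            # softmax over experts
    chosen = np.argsort(logits, axis=-1)[:, -top_k:]      # indices of the top_k experts per token
    out = np.zeros_like(x)
    for t, expert_ids in enumerate(chosen):
        for e in expert_ids:                              # only top_k of n_experts run per token
            out[t] += probs[t, e] * (x[t] @ experts[e])
    return out

tokens = rng.standard_normal((4, d_model))
print(moe_forward(tokens).shape)                          # (4, 64)
```

The efficiency argument, at toy scale, is that each token touches only two of the eight expert weight matrices; scaled up to billions of parameters, that same routing principle is what keeps training and inference costs down.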

DeepSeek’s willingness to share these innovations with the public has earned it considerable goodwill within the global AI research community. For many Chinese AI companies, developing open source models is the only way to play catch-up with their Western counterparts, because it attracts more users and contributors, which in turn help the models grow. “They’ve now demonstrated that cutting-edge models can be built using less, though still a lot of, money and that the current norms of model-building leave plenty of room for optimization,” Chang says. “We are sure to see a lot more attempts in this direction going forward.”

The news could spell trouble for the current US export controls that focus on creating computing resource bottlenecks. “Existing estimates of how much AI computing power China has, and what they can achieve with it, could be upended,” Chang says.