How a Chinese start-up is changing how AI models are trained and outperforming OpenAI, Meta
January 2, 2025
Chinese start-up DeepSeek is making waves among AI developers worldwide with the release of its latest large language model (LLM), DeepSeek V3. Launched in December 2024, the model has been hailed as a game-changer for its remarkable development efficiency and cost-effectiveness. The Hangzhou-based company has quickly become a standout player in the global AI community, showcasing innovative strategies to overcome resource constraints and geopolitical challenges.
DeepSeek’s model boasts an impressive 671 billion parameters, placing it on par with some of the most advanced models globally. Yet it was developed at a fraction of the cost incurred by giants like Meta and OpenAI, requiring only $5.58 million and 2.78 million GPU hours. These figures stand in stark contrast to Meta’s Llama 3.1, which needed 30.8 million GPU hours and more advanced hardware to train. DeepSeek’s success highlights the rapid advancement of Chinese AI firms, even under US semiconductor sanctions.
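For context, a quick back-of-the-envelope calculation shows what those headline numbers imply, assuming the reported figures are accurate and the two GPU-hour counts are measured comparably:

```python
# Back-of-the-envelope comparison of the training budgets cited above.
# All figures are the ones reported in this article.

deepseek_cost_usd = 5.58e6    # reported DeepSeek V3 training cost
deepseek_gpu_hours = 2.78e6   # reported DeepSeek V3 GPU hours (H800)
llama31_gpu_hours = 30.8e6    # reported Llama 3.1 GPU hours

# Implied cost per GPU hour for the DeepSeek run
rate = deepseek_cost_usd / deepseek_gpu_hours
print(f"Implied rate: ${rate:.2f} per GPU hour")      # ~$2.01

# How many times more GPU hours Llama 3.1 reportedly consumed
ratio = llama31_gpu_hours / deepseek_gpu_hours
print(f"Llama 3.1 used ~{ratio:.1f}x the GPU hours")  # ~11.1x
```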
Revolutionary approach to LLM training
DeepSeek attributes its efficiency to a novel architecture designed for cost-effective training. By leveraging NVIDIA’s H800 GPUs, which are customised for the Chinese market, the company optimised its resources to achieve results rivalling those of much larger players. This pragmatic approach underscores how resource constraints can drive innovation, as noted by industry figures such as NVIDIA’s Jim Fan and former OpenAI researcher Andrej Karpathy.
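Under the hood, DeepSeek V3’s technical report describes a sparse mixture-of-experts (MoE) design: of the 671 billion total parameters, only a small fraction is activated for any given token, which keeps per-token compute, and hence training cost, well below what the headline parameter count suggests. The snippet below is a minimal, generic sketch of top-k expert routing to illustrate the idea; it is not DeepSeek’s implementation, and all names and sizes are toy placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, n_experts, top_k = 16, 8, 2  # toy sizes, placeholders

# Each "expert" here is a single weight matrix; real experts are
# full feed-forward sub-networks, and real models use far larger ones.
experts = [rng.standard_normal((d_model, d_model)) * 0.02
           for _ in range(n_experts)]
router_w = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_layer(x):
    """Route one token vector x through its top_k experts only."""
    logits = x @ router_w                # score every expert
    top = np.argsort(logits)[-top_k:]    # keep the k best-scoring experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                 # softmax over the selected experts
    # Only top_k of n_experts matrices are multiplied: this sparsity is
    # what keeps per-token compute far below the total parameter count.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

token = rng.standard_normal(d_model)
print(moe_layer(token).shape)            # (16,)
```

Because only the selected experts run, compute per token scales with the number of active experts rather than the total parameter count.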
Fan commended DeepSeek for demonstrating how limited resources can lead to groundbreaking achievements in AI. Similarly, Jia Yangqing, founder of Lepton AI, praised the start-up’s ability to produce world-class outcomes through intelligent research and strategic investments. DeepSeek’s early acquisition of over 10,000 GPUs, prior to US export restrictions, laid the groundwork for its success.
DeepSeek and controversies
DeepSeek has embraced open-source principles, making its models accessible to the global community. Its V1 model remains the most popular on Hugging Face, a leading platform for machine learning and open-source AI tools. This openness has put pressure on commercial AI developers to accelerate their own innovations.
However, DeepSeek V3 has faced criticism for occasional identity confusion, mistakenly identifying itself as OpenAI’s ChatGPT during certain queries. Experts attribute this issue to “GPT contamination” in training data, a common problem across many AI models. While such errors are not unique to DeepSeek, they have sparked discussions about the challenges of ensuring model accuracy and identity integrity.
A new era for AI development
DeepSeek’s rise signals a shift in the AI landscape, demonstrating that innovative approaches can rival the dominance of tech giants. Despite geopolitical hurdles, the start-up’s achievements underscore the potential for Chinese AI firms to lead in the global market. With strong backing from hedge fund High-Flyer Quant and a team of young, capable developers, DeepSeek is poised to continue disrupting the field.
As the AI community watches closely, DeepSeek’s journey serves as a testament to the power of ingenuity and adaptability in shaping the future of artificial intelligence.