This article, originally published in Chinese, was excerpted from the notes of a discussion that took place in Beijing on Sunday, 26 January 2025. The discussion was organised by Guangmi Li, CEO of Shixiang Cap, an investment services company focused on deep research.

The offline discussion drew more than 10 people on a Sunday afternoon. The viewpoints recorded in these notes are those of the participants.

 

Section 1: The Mysterious DeepSeek

“The most important thing for DeepSeek is to push intelligence forward.”

1. The Founder and CEO:
The founder and CEO, Liang Wenfeng, is the core figure of DeepSeek. Unlike Sam Altman, Liang is highly knowledgeable in technology.

2. Reputation and Challenges:
DeepSeek has a good reputation because it was the first to release its own implementations of MoE, o1-style reasoning, and other techniques. It gained an advantage by moving early, but whether it can achieve the best results remains to be seen. The upcoming challenge lies in resource limitations: the team must focus its limited resources on the most promising areas. Its research capabilities and culture are strong; with an additional 100,000 to 200,000 GPUs, it could achieve even more.

3. Long Context Improvement:
Between the preview and official release phases, DeepSeek made rapid improvements in long-context capabilities. Its 10K long context can be achieved using very conventional methods.

4. Computing Resources:
The CEO of Scale AI claimed that DeepSeek has 50,000 GPUs, but in reality it has far fewer. Based on public information, DeepSeek owns around 10,000 older A100 GPUs and potentially 3,000 H800 GPUs acquired before the export restrictions. DeepSeek places great emphasis on compliance and has not purchased any non-compliant GPUs, so its resources are relatively limited. In contrast, large U.S. tech companies tend to use GPUs far more extravagantly.

5. Focus Strategy:
DeepSeek has concentrated all its efforts on a very narrow focus, abandoning many subsequent areas such as multimodality. The goal is not merely to serve people but to develop intelligence itself, which may also be the key to their success.

6. DeepSeek’s Business Model:
In a way, quant can be considered DeepSeek’s business model. Its parent company, quant fund High-Flyer, was a product of the previous machine learning wave. The most important thing for DeepSeek is pushing the boundaries of intelligence; financial gain and commercialization are not top priorities. China needs leading AI labs to explore solutions that can surpass OpenAI. The journey towards intelligence is long, and as differentiation starts again this year, new breakthroughs are bound to emerge.

7. Talent Development:
From a purely technical perspective, DeepSeek functions like a “Whampoa Military Academy” for AI talent, significantly contributing to the talent ecosystem.

8. AI Lab Business Models:
The business models for AI labs in the U.S. are not great either. Currently, AI lacks a solid commercial model, which may need to be addressed in the future. Liang Wenfeng is ambitious—DeepSeek is not fixated on specific forms but is committed to advancing AGI.

9. Technical Efficiency:
A key takeaway from DeepSeek’s research papers is that many techniques focus on reducing hardware costs. In several major scaling directions, DeepSeek’s methods can effectively lower expenses.

10. Computing Power Demand:
In the long term, compute power will not be a limiting factor, but in the short term, the focus is on improving AI efficiency. Demand remains strong, and all major players are facing shortages in computing power.

11. DeepSeek’s Organizational Approach:

  1. In investing, one usually selects top-tier talent. DeepSeek’s model, however, built from talented young graduates of domestic universities, shows that with good collaboration, abilities can improve step by step. Whether losing key personnel would disrupt this advantage is uncertain, but so far it does not seem to have had a significant impact.

  2. There is plenty of money in the market, but DeepSeek’s core strength lies in its organizational culture. DeepSeek’s research culture is similar to ByteDance’s—fundamental and deep-rooted. The quality of a culture depends on having sufficient funding and a long-term vision, and a solid business model is crucial for sustaining cultural values; both DeepSeek and ByteDance have strong business models.

12. Why is DeepSeek Catching Up So Quickly?

  1. Reasoning models demand higher-quality data and training. Catching up with a closed-source model on long-text and multimodal capabilities from scratch is harder; reasoning models, by contrast, do not require major architectural changes, making reasoning a more feasible target for catching up.
  2. The reason DeepSeek’s R1 was able to catch up quickly may be that the task itself was not overly difficult. Reinforcement learning mainly helps the model make more accurate choices. R1 did not surpass the efficiency of consensus@32 (majority voting over 32 parallel samples); it spent roughly 32 times the compute, effectively turning parallel exploration into a serial process. This did not push the boundaries of intelligence, but it made the process more accessible.
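For concreteness, the consensus@32 baseline mentioned above can be sketched in a few lines of Python: draw k parallel samples and keep the majority answer. This is a minimal illustration only; the toy sampler below is a stand-in for real model calls, and an o1/R1-style model instead spends a comparable budget on one long, serial chain of thought.

    import random
    from collections import Counter
    from typing import Callable

    def consensus_at_k(sample: Callable[[str], str], prompt: str, k: int = 32) -> str:
        """Parallel exploration: draw k independent samples and return the majority answer."""
        answers = [sample(prompt) for _ in range(k)]
        answer, _count = Counter(answers).most_common(1)[0]
        return answer

    # Toy usage with an illustrative sampler (a real system would call the model k times).
    toy_sampler = lambda _prompt: random.choice(["42", "42", "41"])  # noisy but mostly right
    print(consensus_at_k(toy_sampler, "What is 6 * 7?"))             # very likely "42"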

Section 2: Explorers vs. Followers

“AI is like a step function; followers require 10 times less compute.”

13. AI as a Step Function:
AI development resembles a step function, where the compute requirements for followers have decreased tenfold. The compute costs for followers have never been too high, but explorers still need to train numerous models. The exploration of new algorithms and architectures will never stop. Behind every step function, there are significant investments by many people, meaning compute resources will continue to advance, with considerable investments also going into product development. Beyond reasoning, there are many other areas that are equally compute-intensive. The compute resources consumed by explorers might not always be visible, but without such investments, the next major leap might not occur. Many are also dissatisfied with current architectures and RL methods and will continue to push forward.

14. Threshold of Compute Needs:
When exploring new directions, using 10,000 GPUs may not always be significantly better than using 1,000 GPUs, but there is likely a threshold—if only 100 GPUs are available, achieving results would be highly unlikely due to the extended time required for each iteration.

15. Driving Progress in Physics vs. AI:
In physics, progress is driven by both academic researchers and industry labs. Academic researchers explore multiple directions without immediate return expectations, whereas industry labs focus more on efficiency improvement.

16. Efficiency Considerations:
From the perspective of explorers and followers, smaller companies with fewer GPUs must prioritize efficiency, while larger companies focus on how to obtain models faster. Many efficiency-improving methods suitable for 2,000-GPU clusters may not work well on 10,000-GPU setups, where stability becomes a higher priority.

17. CUDA Ecosystem Advantage:
The strength of the CUDA ecosystem lies in its extensive and comprehensive operator support. In contrast, companies like Huawei have achieved breakthroughs by focusing on commonly used operators, leveraging their late-mover advantage. Given access to 100,000 GPUs, the cost of leading in AI becomes significantly high, while being a follower is more efficient. The key question is how to decide between the two. The next major direction for followers in China could be multimodal AI, especially considering that GPT-5 has yet to be released internationally.

 

Section 3: Technical Detail 1: SFT (Supervised Fine-Tuning)

“SFT is no longer needed for reasoning.”

18. The Biggest Shock from DeepSeek:
The most surprising aspect of DeepSeek is not its open-source nature or low cost, but the fact that SFT (Supervised Fine-Tuning) is no longer necessary to elicit reasoning. Outside of reasoning tasks, however, SFT may still be required. This raises important questions: has DeepSeek introduced a new paradigm or architecture that makes model training more data-efficient? Or does it simply enable faster iteration on model performance?

19. SFT’s Role in DeepSeek-R1:
DeepSeek-R1 demonstrates that SFT-based distillation has significant benefits. While DeepSeek-R1 does not entirely eliminate SFT, it only applies it during the third phase, followed by RLHF (Reinforcement Learning with Human Feedback) for alignment.

20. R1’s Training Approach:
R1 is essentially trained through SFT, with the unique aspect being that its data is generated by an RLHF-trained model. This shows that complex methods may not be necessary; with a sufficiently good methodology, SFT distillation alone can be effective.
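As a rough illustration of what “SFT distillation” means here, the sketch below has a stronger teacher (in R1’s case, an RL-trained model) generate responses, which are then used as ordinary supervised fine-tuning data for a student. All names (teacher_generate, the placeholder model identifiers) are assumptions, not DeepSeek’s actual pipeline; in practice the prompt tokens are usually masked out of the loss.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    def build_distillation_corpus(prompts, teacher_generate):
        """Step 1: the teacher answers each prompt; the pairs become plain SFT data."""
        return [(p, teacher_generate(p)) for p in prompts]

    def sft_step(student, tokenizer, optimizer, prompt, response, device="cpu"):
        """Step 2: one supervised step on a teacher-generated example (next-token cross-entropy)."""
        batch = tokenizer(prompt + response + tokenizer.eos_token, return_tensors="pt").to(device)
        out = student(**batch, labels=batch["input_ids"])  # standard causal-LM loss
        out.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        return out.loss.item()

    # Placeholder setup (any small causal LM would do as the student):
    # student = AutoModelForCausalLM.from_pretrained("some-small-base-model")
    # tokenizer = AutoTokenizer.from_pretrained("some-small-base-model")
    # optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)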

21. GRPO’s Core Idea:
The key to GRPO lies in having an intelligent base model. For a single prompt, on the order of 16 generations may need to be sampled before a correct answer appears in the group. The combination of a high-quality base model and verification is the essence of DeepSeek R1’s approach. Math and coding tasks are well suited because they are easy to verify, but in principle the process can be applied to other domains, ultimately yielding a generalized RL model.
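The “group” part of GRPO can be sketched concretely: for each prompt a group of samples is drawn, each is scored (for example, 1 if a verifier accepts the final answer, else 0), and each sample’s advantage is its reward normalized against the group’s mean and standard deviation, removing the need for a separate critic model. The 16 samples and binary rewards below are illustrative, echoing the figure mentioned above.

    import numpy as np

    def group_relative_advantages(rewards):
        """GRPO-style advantages: normalize each reward against its own group."""
        r = np.asarray(rewards, dtype=float)
        return (r - r.mean()) / (r.std() + 1e-8)

    # 16 sampled answers to one prompt; only three were verified as correct.
    rewards = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0]
    print(group_relative_advantages(rewards))  # correct samples get positive advantage, the rest negative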

22. Emergent CoT Without SFT in R1-Zero:
In R1-Zero, Chain of Thought (CoT) reasoning emerged without the use of SFT. Over time, the CoT process lengthens, making it an intriguing emergent property. SFT acts more as an assistive technique—it is possible to generate results without it, but its inclusion accelerates the process.

23. Implications for Smaller AI Players:
This indicates that many smaller AI vendors can leverage SFT to distill large models effectively, achieving great results. However, SFT has not been completely abandoned in the development of R1.

24. LLMs and the Turing Machine Perspective:
Theoretically, an LLM with an infinitely long CoT can be considered a Turing machine, capable of solving highly complex computational problems. However, CoT is essentially an intermediate search result, continuously sampling potential outputs to converge on the correct solution. To achieve accurate results, the model must perform computations, with CoT serving as an essential intermediate step. Whether it is seen as emergence or fundamental computation, CoT plays a critical role.

25. Long Context in DeepSeek Papers:
Although long-context capabilities were not explicitly mentioned in DeepSeek’s papers, user experience suggests significant improvements in context length between R1-preview and R1. It is suspected that improvements were made using Long2Short CoT techniques. During the third phase, CoT was likely used in SFT training, but removed in the final generation stage. The released version may have used a cleaner CoT dataset for SFT.

26. Types of SFT Data:
SFT data can be categorized into two types:

    • Cold-start data, which gives the model a good initial policy and starting point, improving its exploration capability. In RL, the optimization objective often stays close to this initial policy.
    • Post-RL generated data, which is combined with additional data and used for further SFT on the base model. Each domain has its own data processing pipeline, and the model’s capabilities originate from the base model. Distillation in this context is lossless, meaning combining multiple domains could lead to generalization.

27. Data Efficiency in R1:
It is unclear how efficient the R1 data pipeline is. It is speculated that OpenAI has implemented similar optimizations for data efficiency, such as fine-tuning. In R1’s third phase, the model was not trained directly using RL-generated data; instead, the data was used for SFT, resulting in R1. The dataset includes 600K reasoning data and 200K non-reasoning data. The second-phase model likely needed to demonstrate reasoning capabilities in domains outside of its examples, leading to the collection of reasoning data. The non-reasoning data, part of V3 SFT data, enabled the model to infer a CoT. With only 800K data samples, the process appears highly efficient.

 

 

Section 4: Technical Detail 2: Data

“DeepSeek places great importance on data annotation.”

28. The Future of Scale.AI:
Scale AI may not necessarily fail, as reinforcement learning is needed across many domains, and math and coding remain common areas where expert annotation is still required. Data annotation is becoming increasingly complex, but market demand for it will persist.

29. Challenges in Multimodal Data:
In training, the effectiveness of multimodal data is almost imperceptible, or the cost is too high. Currently, there is no solid evidence proving its utility, although future opportunities may arise.

30. DeepSeek’s Focus on Data Annotation:
DeepSeek highly values data annotation, with reports suggesting that Liang Wenfeng himself participates in the labeling process. In AI, besides algorithms and techniques, data accuracy is crucial. Tesla’s annotation costs are almost 20 times higher than those of Chinese autonomous driving companies. China’s autonomous driving data went from extensive to highly refined before teams realized they needed drivers with extensive experience and skill—something Tesla focused on from the beginning. For its humanoid robots, Tesla hired people with exceptionally good motor coordination to do the annotation, so the resulting actions are smooth, whereas the annotators selected in China lacked this finesse. DeepSeek’s investment in data annotation is one of the key factors behind its model efficiency.

 

Section 5: Technical Detail 3: Distillation

“The downside of distillation is reduced model diversity.”

31. Potential Pitfalls of Distillation:
If the biggest technical challenges in model training are ignored in favor of distillation, it could lead to significant setbacks when the next generation of technology emerges.

32. Mismatch Between Large and Small Models:
Large models and small models have inherent capability mismatches. Distilling from a large to a small model, as in a teacher-student paradigm, is a true form of distillation. However, if a model that lacks Chinese proficiency is distilled with Chinese data, performance may degrade. Nevertheless, distilling small models has shown significant improvements—after distillation, further reinforcement learning can result in substantial growth, even when using mismatched data.

33. Reduced Diversity:
The major downside of distillation is the loss of model diversity, which can limit the upper performance bound and prevent surpassing the strongest models. However, in the short term, distillation remains a viable path.

34. Distillation Hacks and RL Challenges:
Certain distillation techniques involve hacks. In earlier stages, models fine-tuned via instruction are often subjected to RL, where models first generate irrelevant ideas before arriving at the correct answer. This is due to subtle RL hacks—models might memorize questions during pretraining and appear to be reasoning when they are merely recalling answers. This is one of the hidden risks of distillation. If models are distilled without proper annotation, it could lead to overly simplistic solutions when applying Reinforcement Learning with Verifiable Rewards (RLVR), rather than fostering genuine problem-solving capabilities—something even OpenAI hasn’t fully resolved.

35. Long-Term Risks of Shortcut Approaches:
Taking shortcuts instead of working towards long-term technological vision could lead to unforeseen pitfalls. For instance, without a qualitative leap in long-context capabilities in this generation of technology, the ceiling for problem-solving may remain limited. R1-Zero might represent a better direction—starting from scratch without using o1-type data could be more effective. Simply following others’ technical approaches may not be the best strategy; more exploration is needed.

36. Future Ecosystem and Model Roles:
Other models have achieved good results with distillation, potentially leading to an ecosystem where models take on the roles of teachers and students. Excelling as a good “student” model could become a viable business model.

37. Impact of R1 on Business vs. Technology:
While the technological impact of R1 may not be as groundbreaking as AlphaGo, its commercial potential and market appeal surpass AlphaGo by a significant margin.

38. Challenges of Over-Reliance on Distillation:
Distillation occurs in two phases—if only distilling o1 or R1 models without building an independent system and verifiable rewards, over-reliance on distillation may develop. In the general AI field, distillation alone isn’t viable due to the lack of explicit rewards and challenges in obtaining special CoT (Chain of Thought) patterns. Furthermore, the initial distillation process often leaves traces—models distilled from OpenAI may carry residual annealing artifacts. The reason why zero models can achieve such capabilities purely through RL is directly related to the base model’s ability to reflect after annealing.

39. Quality of Internet Data:
It is difficult to believe that models purely trained on unprocessed internet data can achieve high performance without annealing, as high-quality data is scarce online.

40. Exploration by Top Labs:
Currently, only a handful of top AI labs are investigating the optimal number of annealing stages and data ratios required for efficient training. Whether distillation is used or not, it remains a form of reinforcement learning, and SFT (Supervised Fine-Tuning) is essentially behavioral imitation—a form of infinite reinforcement learning. However, solely relying on SFT has a low performance ceiling and can harm model diversity.

41. Impact on AI Startups:
AI startups in the primary market are highly excited about DeepSeek. If DeepSeek continues iterating, it could offer significant flexibility for companies that are not publicly traded. Additionally, DeepSeek has distilled several smaller versions that can run on mobile devices. If this approach proves successful, it could raise the ceiling for many AI applications.

42. Setting Clear Goals for Distillation:
Defining clear objectives for distillation is crucial. OpenAI does not rely on data distillation, and surpassing OpenAI requires moving beyond distillation.

43. Future Model Capabilities:
In the future, models may need to learn to “skip steps” in their reasoning, improving their ability to maximize performance within a fixed context length.

 

Section 6: Technical Detail 4: Process Reward

“The upper limit of process supervision is humans; the upper limit of result supervision is the model itself.”

44. Challenges of Process Reward:
Process reward may not necessarily be ineffective, but it is prone to reward hacking—where the model doesn’t actually learn anything but still achieves high reward scores. For example, if a model generates 1,000 solutions to a math problem and none are close to the correct answer, using RLVR (Reinforcement Learning with Verifiable Rewards) may not help in training the model effectively. However, if a moderately effective process reward is available, it might guide the model in the right direction. The key factors include the difficulty of the problem and the reliability of the process reward.
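A minimal sketch of the kind of verifiable, outcome-only reward that RLVR relies on is shown below; the “####” final-answer delimiter is an illustrative convention, not any particular system’s format. It also makes the failure mode above concrete: if none of the sampled solutions is correct, every reward is zero and this signal alone gives the model nothing to learn from, whereas a reliable process reward could still rank partially correct attempts.

    def verifiable_reward(model_output: str, reference_answer: str) -> float:
        """Outcome-only reward: extract the final answer and compare it with the reference."""
        extracted = model_output.split("####")[-1].strip()
        return 1.0 if extracted == reference_answer.strip() else 0.0

    # Example: a solution ending in "... #### 42" scores 1.0 against reference "42",
    # and 0.0 otherwise; there is no partial credit for the reasoning process itself.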

45. Estimating Process Scores:
In PRM (Process Reward Models), if process scores deviate from reality, they become highly susceptible to exploitation. In theory, process supervision is feasible, but the challenge lies in determining the strength of the process evaluation and how to assign rewards accordingly. Currently, even in result supervision, companies rely on extracted answers for evaluation, but no fully mature solution exists to prevent models from gaming the scoring system. Self-iteration by models is the easiest to exploit. Process supervision isn’t difficult to implement since it can be enumerated, and it may represent a promising direction that hasn’t been fully explored yet.

46. Process vs. Result Supervision:
The upper limit of process supervision is defined by humans—there are many things humans simply cannot foresee. In contrast, the upper limit of result supervision is the model itself.

47. AlphaZero’s Advantage:
AlphaZero’s effectiveness lies in the clear win-loss outcomes at the end of a game, allowing reward calculation based on probabilities. However, in the case of LLMs, it is not always clear whether continuous text generation will lead to the correct answer. This situation is somewhat similar to genetic algorithms, where the potential upper limit is high, but they are also susceptible to hacking.

48. Mathematics and Coding as First Steps:
One advantage AlphaGo and AlphaZero had was that the rules of Go are fixed. Similarly, models today are starting with math and coding because these are easy to validate. However, the quality of reinforcement learning outcomes depends heavily on how robust the validation is: if the rules are not strict enough, the model may exploit them—technically satisfying the rules while producing undesired outputs.
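For coding tasks, validation usually means executing the candidate against test cases, as in the illustrative sketch below (the solve entry-point name and the tests are assumptions, and real systems sandbox the execution). It also shows why weak rules get exploited: a verifier that only checked whether the code runs without error, or that used too few tests, would reward trivial programs.

    def code_reward(candidate_source: str, test_cases) -> float:
        """Illustrative coding reward: fraction of hidden test cases the candidate passes."""
        namespace = {}
        try:
            exec(candidate_source, namespace)   # caution: real systems sandbox untrusted code
            solve = namespace["solve"]          # assumed entry-point name
            return sum(solve(x) == y for x, y in test_cases) / len(test_cases)
        except Exception:
            return 0.0

    tests = [(2, 4), (3, 9), (-1, 1)]                                # "return x squared"
    print(code_reward("def solve(x):\n    return x * x", tests))     # 1.0
    print(code_reward("def solve(x):\n    return 4", tests))         # ~0.33: a constant answer slips past one test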

Section 7: Why Haven’t Other Companies Used DeepSeek’s Approach?

“Big tech companies need to keep their models low-profile.”

49. Company Focus:
OpenAI and Anthropic have not followed DeepSeek’s approach mainly due to their strategic focus. They may believe their available compute resources are better allocated to other areas.

50. DeepSeek’s Narrow Focus Advantage:
Unlike the major companies, DeepSeek has concentrated on language models rather than multimodal AI, allowing it to achieve more focused results. While big tech companies have strong model capabilities, they must remain low-profile and cannot publish too many details. For now, multimodality is not a top priority: intelligence primarily stems from language, and multimodality does not directly contribute to improving intelligence.


Section 8: The Divergence and Bets on AI Technology in 2025

“Beyond Transformers: Can we find new architectures?”

51. Technology Diversification in 2025:
AI models are expected to diverge in 2025. The most compelling vision is to continuously push the boundaries of intelligence, with multiple breakthrough paths, potentially involving synthetic data and alternative architectures.

52. Exploring New Architectures:
The first priority for 2025 is to look beyond Transformers and explore other architectures. Some initial research has already begun, which could reduce costs while simultaneously expanding the boundaries of intelligence. Additionally, RL has yet to fully realize its potential. On the product side, there is considerable interest in AI agents, though large-scale applications are still lacking.

53. Multimodal Challenges to ChatGPT:
By 2025, multimodal AI may produce products capable of challenging the current ChatGPT model.

54. DeepSeek’s Cost-Effective Approach:
The low-cost, high-performance outcomes of R1 and V3 demonstrate that this is a viable direction. However, this approach does not conflict with the alternative strategy of scaling up hardware and model parameters. Due to resource constraints in China, the former approach is the only viable option.

55. Scaling Laws and DeepSeek’s Growth:

    • First, there is the question of whether DeepSeek evolved from its base model by adhering to scaling laws.
    • Second, from a distillation perspective, DeepSeek follows the typical strategy of starting with a large model and scaling down, which benefits closed-source models that are growing larger.
    • Third, in the broader technological landscape, no reverse scaling indicators have appeared yet. If such indicators do emerge, they could challenge the scaling law paradigm. Furthermore, everything available in open-source models can be replicated within closed-source models while lowering costs, benefiting closed-source AI development.

56. Meta’s Replication Efforts:
Reports indicate that Meta is currently working on replicating DeepSeek’s results, but so far, it has not significantly impacted their infrastructure or long-term roadmap. In the long run, AI development must consider not only pushing boundaries but also cost efficiency, as lower costs enable more opportunities for innovation and broader adoption.


Section 9: Are Developers Migrating from Closed-Source Models to DeepSeek?

“Not yet.”

57. Current Developer Migration Trends:
Developers have not yet started migrating en masse from closed-source models to DeepSeek. This is largely because leading models currently have a strong advantage in coding instruction adherence. However, it remains uncertain whether this advantage will persist in the future.

58. Tool Use and DeepSeek’s Potential:
From a developer’s perspective, models like Claude-3.5-Sonnet have been specifically trained for tool use, making them highly suitable for building AI agents. DeepSeek does not currently offer similar capabilities, but it presents significant potential for expansion.

59. DeepSeek’s Current Value for Users:
For large-scale model users, DeepSeek V2 has already met most of their needs. While R1 has improved processing speed, it has not provided substantial additional value. In some cases, when deep reasoning is required, tasks that were previously answered correctly are now being answered incorrectly.

60. Engineering Approaches in Model Selection:
When choosing AI models, users typically simplify their problems using engineering methods. The year 2025 may be a pivotal year for AI applications, with various industries leveraging existing capabilities. However, the industry may soon hit a bottleneck, as most everyday tasks do not require highly intelligent models.

61. The Current Role of RL:
Presently, reinforcement learning (RL) effectively handles tasks with standard answers, but it has not achieved breakthroughs beyond what AlphaZero accomplished—in fact, it might be simpler. Distillation helps address the challenge of standard answer availability. When standard answers exist, RL methods can be highly effective, which explains why distillation and RL have seen rapid progress.

62. Underestimated Demand for Intelligence:
Human demand for intelligence is vastly underestimated. Critical challenges, such as cancer treatment and SpaceX’s heat shield materials, remain unsolved. While current AI tasks are primarily automation problems, there is significant potential for future breakthroughs, and the pursuit of intelligence must continue without pause.

 

Section 10: The OpenAI Stargate US$500B Narrative and Compute Demands

63. Skepticism Around OpenAI’s 500B Narrative:
The emergence of DeepSeek has led to doubts about NVIDIA and OpenAI’s ambitious US$500 billion (500B) compute plan. There is no clear assessment yet on whether such a scale is truly necessary, and OpenAI’s 500B initiative appears to be a strategic move to secure its future.

64. Financial Concerns Over 500B Investment:
There are concerns about OpenAI’s massive infrastructure investments, as it remains a commercial entity. If it resorts to significant borrowing, it could pose financial risks.

65. Feasibility of the 500B Plan:
A 500B investment is an astonishing figure and may take 4 to 5 years to fully realize. The key players—SoftBank and OpenAI—play distinct roles, with SoftBank providing the capital and OpenAI contributing the technology. However, SoftBank’s current cash reserves are insufficient to support such an investment, and they may need to leverage existing assets as collateral. OpenAI itself is not financially abundant, with most contributions coming from technology partners rather than financial backers. This makes the full execution of the 500B plan highly challenging.

66. Rationale Behind 500B Compute Demand:
Despite skepticism, the need for large-scale compute in AI exploration is logical. The cost of trial and error, including human and investment costs, is extremely high. The roadmap from o1 to R1 was not easy, but at least the expected results were somewhat predictable. Observing intermediate features offers direction, making it easier to replicate others’ final outputs. However, those working on the frontlines of next-generation AI research face the highest resource demands, whereas followers can progress without bearing exploration costs. If companies like Google or Anthropic succeed in their research, they will likely take the lead in AI.

67. Future Hardware Directions:
Anthropic may eventually replace all inference operations with TPUs or AWS custom chips.

68. Compute Efficiency and Chinese Companies:
Previously, Chinese companies were constrained by compute resources, but recent advancements demonstrate significant potential for more efficient AI models. These models may not require massive GPUs but instead rely on more customized chips, such as those from AMD or ASICs. From an investment perspective, while NVIDIA’s moat remains strong, ASICs present a growing opportunity.

69. DeepSeek’s Impact on the Compute Landscape:
DeepSeek’s achievements are not directly related to compute power but rather to its efficiency, making the U.S. acknowledge China’s AI capabilities. However, NVIDIA’s vulnerability is not due to DeepSeek alone—AI’s continued growth ensures NVIDIA’s relevance. NVIDIA’s strength lies in its ecosystem, which has been built over time. In rapidly evolving technological landscapes, ecosystems play a critical role. The real challenge will arise when AI technology matures, much like electricity, turning into a standardized commodity. At that point, specialized ASIC chips tailored for specific applications could emerge, posing potential competition to NVIDIA.

Section 11: Impact on the Public Market

“Short-term pressure, long-term narrative continues.”

70. Short-Term Market Impact:
DeepSeek has caused a significant short-term shock in the U.S. AI industry, impacting stock prices. The slowing demand for pretraining, coupled with the insufficient scaling of post-training and inference, has created a narrative gap for related companies, affecting short-term trading sentiment.

71. DeepSeek’s Compute Efficiency:
DeepSeek primarily focuses on FP8, whereas the U.S. predominantly uses FP16. DeepSeek’s standout feature is its ability to maximize efficiency with limited compute resources. Last Friday, DeepSeek’s news gained significant traction in North America. While Mark Zuckerberg raised capital expenditure expectations for Meta, NVIDIA and TSMC stocks declined, whereas Broadcom saw an increase.
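As a back-of-the-envelope illustration of why the precision choice matters, storing weights in FP8 takes one byte per parameter versus two for FP16/BF16, roughly halving memory and bandwidth for the weights; the parameter count below is assumed purely for illustration.

    params = 671e9                                        # assumed total parameter count, for illustration only
    print(f"FP16 weights: {params * 2 / 1e12:.2f} TB")    # ~1.34 TB
    print(f"FP8 weights:  {params * 1 / 1e12:.2f} TB")    # ~0.67 TB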

72. Market Sentiment Pressure:
In the short term, DeepSeek’s impact on stock prices and valuations will create pressure for compute-related companies and even energy companies in the secondary market. However, the long-term AI growth narrative remains intact.

73. Concerns Among Public Market Players:
Industry participants worry that NVIDIA’s transition from H-series to B-series chips could create an “air pocket” in demand. Coupled with DeepSeek’s rising influence, short-term stock pressure is inevitable. However, in the long run, this might present a better investment opportunity.

74. AI Market Potential:
The short-term market impact reflects sentiment around DeepSeek’s low-cost training, directly affecting stocks like NVIDIA. However, AI remains an incremental market with vast potential. In the long run, AI is just beginning, and as long as CUDA remains the preferred choice, hardware growth prospects remain strong.

 

Section 12: Open-Source vs. Closed-Source

“If performance is similar, open-source poses a challenge to closed-source.”

75. Debate Around Open-Source vs. Closed-Source:
DeepSeek’s prominence highlights the ongoing competition between open-source and closed-source AI development approaches.

76. Impact on Proprietary Models:
OpenAI and other leading players might begin hiding their best models. However, with DeepSeek’s release, other AI companies might find it difficult to keep their top models undisclosed.

77. Coexistence of Open and Closed Models:
DeepSeek has achieved significant cost optimizations, but companies like Amazon have yet to adjust their strategies and continue with their existing plans. Currently, open-source and closed-source models coexist without conflict. Universities and small labs are likely to prioritize DeepSeek due to its cost-effectiveness, but it does not pose direct competition to cloud providers, who support both models. Cloud ecosystems remain unchanged. However, DeepSeek still lags behind Anthropic in areas such as tool use and AI safety, which are critical for gaining long-term acceptance in Western markets.

78. Pressure on Market Margins:
Open-source AI can control market margins. If open-source models achieve 95% of closed-source capabilities, companies may opt for them over costly closed-source options. If performance becomes nearly identical, closed-source models will face a significant challenge.

 

Section 13: DeepSeek’s Broader Impact

“Vision is more important than technology.”

79. China’s AI Perception Shift:
DeepSeek’s rise has made the world realize China’s AI capabilities. Previously, it was believed that China lagged behind the U.S. by two years. However, DeepSeek has shown that the gap is now only 3 to 9 months, with China even surpassing in some areas.

80. Proven Resilience Against Tech Restrictions:
Historically, China has overcome U.S. technological restrictions through intense competition, and AI is proving to be no exception. DeepSeek’s success demonstrates this resilience.

81. Gradual Success, Not Sudden:
DeepSeek’s success wasn’t an overnight breakthrough. The R1 results were impressive and resonated with key figures across various levels in the U.S.

82. Challenges of Pioneering AI Research:
While DeepSeek has built upon existing advancements, pushing the frontier of AI still demands significant time and human resources. The success of R1 does not guarantee future cost reductions in training.

83. China’s Strength in Engineering Efficiency:
As an AI follower, China can leverage its engineering capabilities to achieve results with fewer resources. The ability of Chinese AI teams to produce impactful models with limited compute power provides resilience against resource constraints and could potentially outpace competitors in efficiency.

84. Future of AI Reasoning:
Currently, China’s AI efforts are largely focused on replicating existing technological approaches. OpenAI introduced reasoning with its o1 model, and the next competition among AI labs will be about who can introduce the next major reasoning breakthrough. Infinite-length reasoning could be the next grand vision.

85. The Core Differentiator Among AI Labs:
The fundamental difference between AI labs lies not just in their technical capabilities but in their future vision.

86. Vision Over Technology:
Ultimately, vision is more important than technology itself.