1. Introduction & Context
What is DeepSeek? DeepSeek is a Chinese artificial intelligence model – specifically a cutting-edge large language model (LLM) – that has recently burst onto the global AI scene. It’s essentially a chatbot/AI assistant powered by a foundational language model, much like OpenAI’s ChatGPT or Google’s models, but developed by a relatively small Chinese startup. DeepSeek’s claim to fame is that it offers ChatGPT-level capabilities at a fraction of the cost and compute that such models traditionally require. In other words, it can understand and generate human-like text, answer questions, solve problems, and even write code on par with the best models – all while being far more resource-efficient. This combination of high performance and low cost is what makes DeepSeek stand out.
Origin and Development: DeepSeek is developed by a Hangzhou-based AI startup founded in 2023 by entrepreneur Liang Wenfeng. Liang leads the company (with a parent firm called High-Flyer) and has attracted top AI researchers from leading Chinese universities to work on the project. Unlike China’s tech giants, DeepSeek is a lean operation – more like a research lab than a big corporation – staffed largely by young PhDs and engineers focused on a singular mission. The team rapidly iterated on its models; for example, DeepSeek’s V3 model (an earlier version) was reportedly developed in just two months on a budget under US $6 million. This is minuscule compared to the billions of dollars that companies like OpenAI, Google, or Meta have poured into their flagship models. Such frugal innovation was possible due to new techniques (discussed later) that squeeze more out of limited computing resources.
China’s LLM Trajectory: Until recently, China was not considered a leader in the largest LLMs, primarily because training these models demands massive computing power and cutting-edge semiconductor chips – areas dominated by U.S. firms. American companies enjoyed essentially unlimited access to top-tier NVIDIA GPUs and huge cloud data centers, giving them an edge. U.S. export controls also tried to deny China access to the latest AI chips, hoping to stall China’s progress. As a result, China’s AI efforts were often thought to lag in the ultra-large-scale models. That began to change as Chinese researchers sought clever ways around these limitations. DeepSeek is the clearest example: its January 2025 release of a new model blew apart the notion that only U.S. labs could build the best AI. On January 20, 2025, DeepSeek unveiled its R1 model, a high-performance LLM that achieved results comparable to OpenAI’s best – but built using fewer or less advanced chips. This demonstrated that Beijing’s AI community could innovate around hardware restrictions by improving efficiency. In effect, China found a way to “do more with less,” closing much of the gap in a very short time.
Why the Recent Attention? DeepSeek has grabbed headlines worldwide over the past few months for several reasons:
- Rivaling Top Western Models: Tests show DeepSeek’s latest models performing on par with OpenAI’s ChatGPT (based on GPT-4) on many benchmarks. In fact, DeepSeek-R1 matched OpenAI’s model in certain challenging categories – for example, tying a top OpenAI model for first place in a difficult “style control” task on an international leaderboard. This shocked many experts who assumed U.S. models were at least a generation ahead.
- Open-Source Release: In a surprising move, the startup open-sourced its R1 model under an MIT license, releasing the model’s weights to the public. This means anyone can inspect or use DeepSeek’s technology. Open-sourcing a state-of-the-art model was unexpected (even Western leaders like GPT-4 are closed-source), and it endeared DeepSeek to the global research community while raising eyebrows in rival companies.
- Ultra-Low Cost: DeepSeek’s efficiency has translated into incredibly low usage costs. Notably, an earlier version (DeepSeek-V2) was offered for as little as ¥1 (₹11 or $0.14) per 1 million tokens of processing. For perspective, 1 million tokens is roughly 750,000 words – an enormous amount of text – so charging roughly eleven rupees for all of that is virtually free (a quick back-of-the-envelope calculation follows this list). This unprecedented cheap access triggered what media called an “AI model price war” in China, forcing big players like Alibaba to slash their AI service prices by up to 97% to compete.
- Market Impact: The tech industry took DeepSeek very seriously. When news broke of its capabilities and low-cost approach, it rattled financial markets. Investors began questioning whether companies like OpenAI, Microsoft, Google, etc., who spend tens of billions on AI, could be disrupted. Share prices of major AI chip and cloud companies dropped on fears that DeepSeek’s approach might reduce demand for expensive hardware. For instance, NVIDIA’s stock fell ~15-17% in late January amid concerns that its high-end GPUs might not be as essential if efficient models prevail. In China, DeepSeek’s launch was celebrated almost like a Sputnik moment – signaling that the balance of tech power might be shifting.
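To put that pricing in perspective, here is a tiny back-of-the-envelope calculation using only the figures quoted above ($0.14 per million tokens, roughly 750,000 words per million tokens); the 100,000-word “novel” is an assumed example chosen purely for scale.

```python
# Back-of-the-envelope cost of processing a full-length novel at the cited rate.
# The only inputs are the article's own figures; the novel length is an assumption.
price_per_million_tokens = 0.14          # USD, DeepSeek-V2 era pricing cited above
words_per_million_tokens = 750_000       # rough conversion cited above

novel_words = 100_000                    # assumed length of a typical novel
tokens_needed = novel_words / words_per_million_tokens * 1_000_000
cost = tokens_needed / 1_000_000 * price_per_million_tokens
print(f"~{tokens_needed:,.0f} tokens -> ${cost:.4f}")   # roughly two cents for a whole novel
```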
To summarize, DeepSeek is a Chinese LLM that has rapidly emerged to challenge the dominance of Western AI models. It’s gotten attention for matching the best AI brains at a much lower cost, for being open-source, and for symbolizing China’s newfound ability to innovate under hardware constraints.
2. Model Breakdown & Differentiation
DeepSeek’s Model Lineup: DeepSeek isn’t a single model, but rather a family of models and variants developed by the startup. Key offerings include:
- DeepSeek V3: The flagship base model (third iteration of their main LLM). This is a general-purpose language model used for chat and various tasks. DeepSeek-V3 is the model that powers their AI assistant (chatbot) as of early 2025. It’s comparable to GPT-4 or Google’s PaLM in scope. Notably, V3 uses a Mixture-of-Experts (MoE) architecture with an enormous total parameter count (671 billion parameters, of which ~37B are active per input). (By comparison, OpenAI’s GPT-4 size is not public but estimated at ~1.5T params dense, and Meta’s Llama 2 is 70B dense; DeepSeek’s MoE effectively splits a huge model into “experts” so that only a fraction is used at a time, increasing efficiency).
- DeepSeek R1: A specialized “Reasoning” model released in Jan 2025. R1 is tuned for complex logical and analytical tasks – essentially DeepSeek’s answer to OpenAI’s reasoning-focused “o” series of models (such as o1). It was launched to address a gap: when OpenAI unveiled an advanced reasoning model that slightly outperformed DeepSeek V3 in things like math and problem-solving, DeepSeek quickly responded with R1. R1 impressed experts by matching OpenAI’s model on key reasoning benchmarks, causing many to view it as a turning point. Like V3, R1 was made openly available (in fact open-sourced), which is extraordinary. Think of R1 as a “smart problem-solver” variant of DeepSeek’s tech.
- DeepSeek Coder: A version of the model fine-tuned for computer programming tasks. The company has a Coder model (and a second version, Coder V2) focusing on writing and debugging code. This is analogous to OpenAI’s Codex or Google’s Codey – models that specialize in generating software. Given that DeepSeek’s base models already perform well in coding benchmarks, the Coder variant likely further improves accuracy in languages like Python, Java, etc.
- DeepSeek Math: A specialized model for mathematical problem solving. Math word problems and competitive math exam questions are notoriously challenging for AI. The Math model is presumably tuned on such data to excel at step-by-step reasoning for math (which ties into R1’s purpose as well). DeepSeek’s models have shown outstanding math performance – for instance, on the MATH benchmark and even recent contest problems, they outscored GPT-4 by a wide margin in accuracy.
- DeepSeek VL: The “VL” likely stands for Vision-Language, indicating a multimodal model. This suggests DeepSeek is also working on or has released a model that can handle images or visual inputs along with text (similar to how OpenAI has GPT-4 Vision or how Google’s Gemini is expected to be multimodal). Details are sparse, but having a VL model means DeepSeek is branching into image understanding or generation territory – e.g., describing images, or maybe creating images from text, etc.
- Other Versions: DeepSeek V2 (an earlier 2024 model) and intermediate versions like V2.5 were also released openly. V2 was the model that in mid-2024 triggered a price war due to its openness and low cost. There’s also reference to “DeepSeek LLM” which likely refers to their foundational model repository. In practical terms, DeepSeek’s main public-facing product is a chatbot (DeepSeek Chat) that uses V3/R1 behind the scenes, available via a free app, web interface, or API. The specialized models (Coder, Math, etc.) are more for research and fine-tuning, and they show the modular approach DeepSeek is taking (one core architecture, adapted for different domains).
Architecture & Training: All indications are that DeepSeek’s models are based on the Transformer architecture – the same fundamental neural network design underpinning virtually all modern LLMs (from GPT-3 to BERT to Llama). DeepSeek’s twist is using a Mixture-of-Experts (MoE) Transformer at massive scale. In an MoE model, instead of one monolithic neural network, you have multiple sub-models (“experts”) and a gating mechanism that routes each query to the most relevant experts. This way, you can have an extremely large total parameter count, but any given query only activates a subset of those parameters, saving computation. For example, DeepSeek V3 has 671 billion total parameters across many experts, but only ~37 billion are “active” for a given input. By comparison, a dense model like GPT-4 (if it were ~1T params) would have to churn through all trillion for every input. This architecture likely contributes to DeepSeek’s breakthrough in speed: the company claims V3 achieved a “significant breakthrough in inference speed” over previous models. In practice, users have found DeepSeek’s responses to come quite fast, which is a big plus for an AI assistant.
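Since the MoE routing idea carries most of the efficiency argument here, the following is a deliberately simplified PyTorch sketch of top-k expert routing. All sizes are invented and the routing loop is naive; it illustrates the principle that each token only activates a couple of experts, not DeepSeek’s actual implementation.

```python
# Toy Mixture-of-Experts (MoE) feed-forward layer with top-k routing.
# Each token is scored by a gate, sent through only its top-k experts,
# and the outputs are combined with the (renormalized) gate weights.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Each "expert" is an ordinary feed-forward block.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])
        # The gate scores every expert for every token.
        self.gate = nn.Linear(d_model, n_experts)

    def forward(self, x):                       # x: (batch, seq, d_model)
        scores = self.gate(x)                   # (batch, seq, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)    # normalize over the chosen experts only
        out = torch.zeros_like(x)
        # Route each token through its top-k experts; the other experts do no work.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., slot] == e      # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return out

layer = ToyMoELayer()
y = layer(torch.randn(2, 16, 512))   # (batch=2, seq=16, d_model=512) -> same shape out
# With 8 experts and top_k=2, only ~1/4 of the expert parameters touch any given token,
# the same principle that lets DeepSeek keep ~37B of 671B parameters active per input.
```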
DeepSeek’s training approach also involved some clever efficiency hacks. According to reports, NVIDIA (the GPU maker) praised DeepSeek’s use of “test-time scaling” – a method to boost performance during deployment without retraining. Test-time scaling generally means spending extra compute while the model is answering – for example, generating a longer chain of reasoning, or sampling several candidate answers and keeping the best or most common one – to squeeze more accuracy out of the same trained model. Additionally, DeepSeek likely made use of extensive fine-tuning on both English and Chinese data, as well as specialized data for coding and math. This targeted training allowed it to excel in areas where many models struggle (we see this in its benchmark scores, e.g. high marks on coding challenges and math competitions). In summary, DeepSeek’s uniqueness lies not in reinventing how neural networks work, but in pushing known ideas (like MoE Transformers, scaling laws, fine-tuning) to new extremes to maximize output for minimal input.
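The exact technique behind DeepSeek’s “test-time scaling” has not been detailed here, but one common, simple form of it is self-consistency sampling: ask the model several times and keep the majority answer. The sketch below uses a toy stand-in for the model purely to show the mechanics.

```python
# Self-consistency as a minimal example of test-time scaling: more samples at
# inference time, no retraining. toy_solver() is a stand-in for an LLM call.
import random
from collections import Counter

def toy_solver(question: str) -> str:
    """Stand-in for a stochastic model: right 70% of the time, wrong otherwise."""
    return "42" if random.random() < 0.7 else random.choice(["41", "43", "44"])

def self_consistent_answer(question: str, n_samples: int) -> str:
    # More samples means more inference compute and usually a more reliable answer.
    votes = Counter(toy_solver(question) for _ in range(n_samples))
    return votes.most_common(1)[0][0]

random.seed(0)
print(self_consistent_answer("What is 6 x 7?", n_samples=1))    # single sample: may be wrong
print(self_consistent_answer("What is 6 x 7?", n_samples=25))   # majority vote: almost always "42"
```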
Comparison with ChatGPT (OpenAI), Claude (Anthropic), Gemini (Google), Mistral etc.: DeepSeek inevitably invites comparison with both the established AI chatbots (ChatGPT, Claude 2) and newer entrants (Google’s Gemini, the open-source Mistral model). Let’s break this comparison down by key qualities:
- Speed: DeepSeek is widely regarded as faster in inference than most big models. The developers explicitly highlight its speed advantage – V3 tops the inference speed leaderboard among open models. Thanks to the MoE design and efficient coding, DeepSeek can generate answers with fewer delays. In practical terms, users noticed that DeepSeek’s responses pop out quickly, whereas something like GPT-4 (ChatGPT’s highest setting) can be a bit slower due to its size. This speed edge makes DeepSeek attractive for real-time applications. Mistral (an open-model maker from Europe) also optimizes for efficiency, but its public models are much smaller, so they are fast yet far less capable than DeepSeek. Compared to Claude, which is optimized for longer context rather than speed, DeepSeek likely feels more snappy. Google’s Gemini is still an evolving project (as of early 2025) but is expected to be powerful; however, if Gemini is dense and large, DeepSeek might still win on speed per query. In essence, DeepSeek delivers high-end performance without the usual lag, which is a significant differentiator.
- Intelligence & Reasoning: In terms of raw problem-solving intelligence, DeepSeek has proven itself roughly on par with the best – and in some niches, even superior. Benchmark evaluations show DeepSeek-V3 scoring in the same league as GPT-4 on knowledge and reasoning tests. For instance, on a standard academic exam benchmark (MMLU), DeepSeek V3’s accuracy (around 88-89% on the English test) is comparable to GPT-4’s. When it comes to complex reasoning puzzles or math problems, DeepSeek’s specialized R1 model really shines. DeepSeek-R1 was able to tie OpenAI’s leading model for first place in a reasoning-heavy category (controlling style in responses) on the Arena leaderboard. It ranked third overall across all categories, just slightly behind the top two models (one of which is likely GPT-4 or its variant). This means DeepSeek can handle multi-step logical tasks, comprehension, and difficult questions nearly as well as ChatGPT’s latest versions. Compared to Anthropic’s Claude, which is quite strong in reasoning and very good at lengthy coherent answers, DeepSeek appears to be at least as good if not better on many benchmarks. Claude 2 excels at things like summarization and has a huge context window, but it hasn’t been shown to surpass GPT-4 in raw problem-solving; DeepSeek, however, has matched GPT-4 level in many respects. Google’s Gemini is a bit of a wildcard – it’s Google’s next-gen model expected to be top-tier, but until it’s fully seen, we only know that DeepSeek has forced Google to hurry (Google reportedly redeployed resources to accelerate Gemini after seeing competitors like OpenAI and now DeepSeek). Bottom line: DeepSeek is considered as “smart” as ChatGPT/GPT-4 on most tasks, which is a huge accomplishment, and its reasoning-specialized version R1 closes the gap in areas it initially trailed.
- Factual Accuracy: On factual knowledge and truthfulness, DeepSeek again is comparable to other big models – which is to say, it’s very knowledgeable but not perfect. It was trained on vast datasets likely including Wikipedia, books, Chinese and English web content, etc., so it has a broad base of world knowledge. For example, on trivia and exam question benchmarks (like MMLU mentioned above), it scores in the high-80s percent range, similar to GPT-4, indicating strong factual recall. However, like its peers, DeepSeek can still hallucinate incorrect information or make errors when asked about things outside its training data or when asked in tricky ways. One tech commentator noted that DeepSeek is “as flawed” as other leading models in this regard – meaning it doesn’t miraculously solve the hallucination problem or always know when it’s uncertain. It has removed some mystique around AI by showing that even a cheaply-built model can be as capable and as prone to mistakes as the famous models. In practice, users testing DeepSeek have found it will sometimes refuse to answer (more on that in biases) or give an incorrect answer just as ChatGPT might. So, its factual accuracy is very high relative to most systems, but roughly on the same level as GPT-4/Claude (all of which still require user caution for critical facts).
- Coding Ability: DeepSeek performs impressively in coding tasks. Early benchmarks showed DeepSeek-V3 could write and debug code nearly on par with OpenAI’s Codex/GPT. In standard programming tests like HumanEval (which measures if it can write correct solutions to coding problems), DeepSeek-V3 achieved about 82.6% pass@1, which is actually slightly higher than GPT-4’s 80.5% on the same test (a short sketch of how pass@k scores are estimated appears after this list). This is remarkable – it suggests DeepSeek might even have a slight edge in certain coding benchmarks. Additionally, on more challenging coding competitions (like Codeforces problems), DeepSeek dramatically outperformed GPT-4 (placing in roughly the 52nd percentile vs GPT-4’s ~24th), indicating it can handle competitive programming logic well. The existence of a DeepSeek Coder model further implies dedicated training on large code repositories, which would only improve its coding reliability. So, compared to ChatGPT (which can code decently but does make mistakes) and Claude (also decent but generally not better than GPT-4 in coding), DeepSeek may actually be one of the best AI coding assistants currently available. Google’s Gemini is expected to have strong coding skills too (since Google has its own AlphaCode research), but again DeepSeek has set a high bar for an open model. The strength in coding is one of DeepSeek’s big selling points for real-world use, especially since it’s open-source – developers can potentially self-host it to help with programming tasks without sending code to a third-party cloud.
- Strengths (Special Qualities): DeepSeek’s major strengths can be summed up as efficiency, openness, and bilingual excellence. It is special for achieving near state-of-the-art performance efficiently – using fewer resources and older hardware. Reports say the model was trained on a mix of available GPUs, not the absolute newest ones, yet it rivaled models that used far more compute. This efficiency implies others (smaller labs, startups) could replicate or build on DeepSeek’s methods, potentially democratizing AI development. Second, DeepSeek being open-source is a huge distinction. Its weights are available, meaning researchers around the world have audited its code and confirmed the company’s claims (something we can’t do with closed models like GPT-4). This fosters trust and a community of developers excitedly porting DeepSeek onto different platforms and improving it. The tech community is excited because an open model that is as good as ChatGPT breaks the monopoly of a few big players – it’s analogous to when open-source Linux matched the capabilities of proprietary UNIX, changing the industry dynamics. Third, DeepSeek is designed with both Chinese and English in mind. Unlike many Western models that are English-first, DeepSeek was trained heavily on Chinese content too, making it very fluent in Chinese and knowledgeable about Chinese facts/culture. Benchmarks in Chinese (like the C-Eval exam) show DeepSeek outperforming GPT-4 by a significant margin. So for applications in Chinese language or bilingual use, it has an edge. Additionally, DeepSeek’s architecture (MoE) means it could be more adaptable – experts can be added or adjusted for various domains (e.g., one could add more “experts” for medical knowledge, etc.). All these factors have the tech community excited that DeepSeek represents a new wave of high-performance open models that anyone can build upon, potentially accelerating innovation globally.
- Weaknesses and Skepticism: Despite the hype, there are reasons the community remains somewhat cautious or skeptical about DeepSeek. One concern is content bias and censorship. Being developed in China, DeepSeek must comply with Chinese regulations when deployed there – and indeed users have observed the public DeepSeek chatbot censors certain topics. For example, when asked about sensitive issues like the Tiananmen Square events or Taiwan independence, DeepSeek will refuse to answer or even appear to start answering then delete its response in real-time. Its non-response rate to politically sensitive questions is notably high, presumably due to built-in filters aligned with Chinese law. This raises concerns for international users about biases – will the model avoid certain factual topics or skew answers to align with state narratives? Some experts warn that using a Chinese-origin AI could inadvertently spread censored or biased information if not carefully managed. Another practical weakness is safety and trust: because the model is open and can be self-hosted, there’s no central moderation like OpenAI has. This means if someone uses the base model without adjustments, it might generate harmful content or misinformation if prompted, just as any LLM might. The open-source community can try to add guardrails, but it’s an ongoing challenge. Also, since DeepSeek is relatively new, it hasn’t been battle-tested by years of diverse user interactions like ChatGPT has. There may be unknown failure modes or robustness issues that haven’t surfaced yet. On the technical side, while MoE is efficient, it can sometimes be harder to train or fine-tune compared to a dense model – some skeptics wonder if DeepSeek’s performance can keep improving or if it will hit scaling difficulties. Additionally, reliance on older hardware could become a bottleneck if they try to push much further; currently they still depend on Nvidia GPUs (some U.S.-origin tech) which might get harder to obtain if export controls tighten. Finally, voices in the industry caution that DeepSeek’s rise might be temporary if U.S. companies respond by doubling down – in other words, OpenAI, Google, etc., with their vast resources, may soon leapfrog DeepSeek again, especially since they’ll now be motivated to be more efficient too. DeepSeek has essentially proven a point, but the race continues. It’s telling that observers stop short of calling this achievement a true “Sputnik moment” because it’s not a one-time win – AI advancement is iterative, and DeepSeek still in part builds on U.S. tech and research (for instance, it wouldn’t exist without the transformer innovations from Google, or the hardware from Nvidia). So while it’s a game-changer, it’s not “game over” for the existing leaders.
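For readers unfamiliar with the pass@1 and pass@k numbers quoted in the coding comparison above, the snippet below shows the standard unbiased estimator used for HumanEval-style benchmarks. It explains the metric only; it is not DeepSeek’s or OpenAI’s evaluation harness.

```python
# pass@k estimation: for each problem, generate n candidate solutions, count the c
# that pass the unit tests, and compute 1 - C(n-c, k) / C(n, k). Averaging this over
# all problems gives the benchmark score; pass@1 with n samples reduces to c/n.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """n = samples generated, c = samples that passed the tests, k = budget."""
    if n - c < k:
        return 1.0                        # any k-sample draw must include a passing one
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 200 samples for one problem, 160 pass -> pass@1 is simply 160/200 = 0.80.
print(pass_at_k(200, 160, 1))   # 0.8
```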
Open-Source Status: Yes – DeepSeek is open-source, at least the core models like R1 (and seemingly V2/V3 as well). The company released DeepSeek-R1 under the MIT open license, making the model weights publicly available for download. This means developers worldwide can use DeepSeek for free, customize it, or even commercialize their own applications on top of it (with minimal restrictions). The open release was confirmed by multiple sources and is a deliberate strategy by DeepSeek to foster an ecosystem. It mirrors what Meta did with LLaMA (though LLaMA had a more restrictive license initially) and what smaller firms like Mistral AI are doing. DeepSeek’s open models join the likes of LLaMA 2, Qwen (Alibaba’s model which also was open-sourced), and Mistral in the movement towards open AI. For users, this openness is a huge plus: it ensures transparency (one can audit for any backdoors or biases in the weights), and it allows use of the model without sending data to a third-party service. However, note that DeepSeek the service (app/API) is also offered by the company – using that is not open-source (that’s just a hosted version of the model). But one could, for example, take the R1 model checkpoints and run them on their own server or cloud if they have sufficient GPU power. This open availability is partly why the tech community is so excited – it’s not just a breakthrough trapped behind corporate walls; it’s knowledge shared with the world, which could spur many derivative projects.
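As a concrete illustration of “running the checkpoints on your own server,” here is a minimal sketch using the Hugging Face transformers library. The model ID is a placeholder to fill in with whatever checkpoint you actually download, and the full-size V3/R1 models need multiple GPUs; this exact flow is realistic mainly for smaller or distilled variants.

```python
# Minimal local inference sketch with Hugging Face Transformers.
# Substitute a real checkpoint you have downloaded; the ID below is a placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/your-downloaded-deepseek-checkpoint"   # placeholder, not a real ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")  # needs `accelerate`

inputs = tokenizer("Explain mixture-of-experts in one sentence.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```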
Unique Training Approaches: We touched on some, but to summarize the special sauce: DeepSeek’s team prioritized resource efficiency at every step. They utilized Mixture-of-Experts architecture, which is relatively rare at this scale (previously only Google had done something similar with Switch Transformers, and those weren’t fully productized). They likely used intense model compression and optimization techniques – possibly things like low-precision computation (e.g. bfloat16 or INT8 quantization) to speed up training and inference, and maybe gradient-checkpointing to fit in memory. The mention of test-time scaling indicates they might scale up the model’s active components only during inference to boost accuracy, which is a clever trick to avoid full cost during training. Another aspect is training data curation: to perform well on benchmarks and coding, DeepSeek must have assembled high-quality datasets (including multilingual data and coding repositories). Doing this on a limited budget suggests they leveraged a lot of open data and possibly synthetic data generation (using smaller models to generate additional training data, a known technique to amplify data). It’s also quite possible they benefited from the open research of others – for instance, they could incorporate ideas from Meta’s LLaMA or Google’s research publications to guide their training, essentially standing on the shoulders of giants for free. In short, the training was “budget-conscious” but very targeted to hit key performance metrics. This is a shift from the brute-force approach of “train a giant model on everything and hope it’s good at everything” – instead, DeepSeek’s approach was more surgical (focus on being excellent at the most important benchmarks and use cases). This strategy paid off dramatically, showing that smart engineering can beat sheer spending.
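To make the low-precision idea above concrete, here is a toy example of symmetric INT8 weight quantization. It only illustrates why 8-bit storage cuts memory roughly four-fold versus FP32; DeepSeek’s actual training precision is speculated about in the paragraph above, not demonstrated here.

```python
# Symmetric per-tensor INT8 quantization: store weights as int8 plus one scale,
# reconstruct approximately on the fly. Real low-precision training (bf16/FP8) is
# far more involved; this is only the storage/accuracy trade-off in miniature.
import numpy as np

def quantize_int8(w: np.ndarray):
    scale = np.abs(w).max() / 127.0                       # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(w)
print("max abs error:", np.abs(w - dequantize(q, s)).max())   # small, but not zero
```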
Compute Efficiency: By all accounts, DeepSeek is significantly more compute-efficient than models like GPT-4. It achieved similar performance using what appears to be an order of magnitude less compute power. One report noted that DeepSeek-V3 was built for under $6M, whereas OpenAI’s GPT-4 is rumored to have cost tens of millions (if not more) in compute. Additionally, DeepSeek can run on less “sophisticated chips” – implying you don’t need the absolute latest GPUs to use it. In fact, the fuss in the market was partly because DeepSeek seemed to require fewer or lower-end GPUs to achieve its results, challenging the assumption that only huge GPU clusters can make top models. DeepSeek found new efficiencies to get more out of the hardware. One can think of it this way: If GPT-4 is a gas-guzzling sports car that goes 200 km/h, DeepSeek found a way to go 200 km/h with a fuel-efficient engine – maybe a hybrid that uses 1/5th the fuel. This is a big deal for compute scalability. It means more actors (universities, startups, even individuals with some resources) could potentially train or fine-tune their own versions without needing an entire data center. That said, running something like DeepSeek-R1 still isn’t trivial – the model is large, and to host it with low latency you’d still want multiple GPUs. But relative to what was thought necessary, it’s much more accessible. The compute efficiency is perhaps DeepSeek’s most important contribution to AI tech: by showing it’s possible, it will push others to develop similarly efficient training algorithms and architectures going forward.
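As a sanity check on the scale of the “under $6M” figure, the arithmetic below uses purely assumed inputs (GPU count and hourly price) chosen to be consistent with the article’s two-month timeline. None of these numbers are reported DeepSeek values; the point is only that a few thousand GPUs for two months lands in the single-digit millions of dollars.

```python
# Illustrative training-cost arithmetic: every input is an assumption for scale.
gpus            = 2_000        # assumed cluster size
days            = 60           # ~ "two months" of training, per the article
cost_per_gpu_hr = 2.00         # assumed USD per GPU-hour (cloud/amortized hardware)

gpu_hours = gpus * days * 24
total_cost = gpu_hours * cost_per_gpu_hr
print(f"{gpu_hours:,} GPU-hours -> ${total_cost:,.0f}")   # 2,880,000 GPU-hours -> $5,760,000
```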
To sum up, DeepSeek offers a range of LLMs (general, reasoning, coding, etc.) that are transformer-based, with a focus on a mixture-of-experts design. It stands shoulder-to-shoulder with models like ChatGPT/GPT-4 in most capabilities – from answering questions to writing code – but distinguishes itself via speed and cost-efficiency. Its openness and Chinese-language proficiency add to its unique profile. While it shares the same fundamental tech base as other LLMs, DeepSeek’s clever training strategy and architecture tweaks make it leaner and thus potentially more disruptive.
3. Impact & Adoption
Who is using DeepSeek? In the few weeks since its release, DeepSeek has attracted a lot of interest from various quarters:
- Individual Users: Millions of curious users worldwide have tried DeepSeek through its free app or web chat. Within China, it garnered a big user base quickly, as people flocked to see how it compares to ChatGPT (which is restricted in China). Internationally, tech enthusiasts and researchers downloaded the open-source model to experiment. The app even climbed app store charts in some regions. In essence, it’s become one of the first Chinese tech products to gain significant global usage purely on merit (as opposed to, say, TikTok which was entertainment-focused). User feedback often notes that for general purposes, DeepSeek’s responses are on par with ChatGPT – which is quite encouraging for adoption.
- Researchers & Developers: The open-source nature means researchers worldwide have started using DeepSeek as a base for their own projects. We see activity on GitHub and AI forums where developers integrate DeepSeek into tools, or fine-tune it on niche data (because having the weights allows fine-tuning for, say, the biomedical domain or other languages). It’s likely being adopted as a research benchmark as well – similar to how Meta’s LLaMA was used in hundreds of projects. The AI community’s excitement suggests many are swapping out their default models for DeepSeek in prototyping new applications, especially if they need something they can run locally or customize.
- Companies and Platforms: In China, major tech companies quickly took note of DeepSeek’s capabilities. Instead of using only their in-house models, some may consider integrating or at least testing DeepSeek for certain services. There’s a report that ByteDance (owner of TikTok) updated its flagship AI model just two days after DeepSeek-R1’s release, claiming it beat OpenAI’s model on a key benchmark – a move seen as a response to DeepSeek’s challenge. Alibaba, another giant, rushed out an improved version of its Qwen model (Qwen 2.5-Max) during Lunar New Year, explicitly claiming it “surpassed DeepSeek-V3”. This competitive scramble implies that DeepSeek set a new bar that all the domestic players are now trying to meet or exceed. While these companies might not adopt DeepSeek directly (since they want their own IP), they are certainly influenced by it – e.g., adopting similar cost structures or open-sourcing strategies. Internationally, no big enterprise except Perplexity has formally adopted DeepSeek in products yet (it’s very new), and some are cautious (for reasons discussed below). However, smaller AI startups or non-profits outside China might well use DeepSeek as their model of choice due to its permissive license and strong performance. For instance, an educational platform could use DeepSeek to power a tutoring chatbot, avoiding OpenAI’s API fees. We also see some data science platforms adding DeepSeek to their roster of available models given its popularity.
Traction Outside China: DeepSeek is indeed gaining traction beyond China, but with a mix of enthusiasm and wariness. On one hand, engineers around the world have welcomed an open competitor to Western models. Downloads of the model and usage of the API from overseas have been high – in fact, reportedly a significant portion of DeepSeek app users were from outside China early on. Its ability to converse in English at a high level made it attractive globally. On the other hand, governments and security experts outside China are casting a cautious eye. Several countries have already moved to limit DeepSeek’s usage in sensitive contexts. Australia, Italy, and Taiwan have banned DeepSeek on government devices or networks due to national security concerns. Australia’s directive, for example, cited an “unacceptable risk” in that DeepSeek (being a Chinese platform) could expose government data or be influenced by a foreign state. Similarly, some U.S. federal agencies reportedly advised against using DeepSeek on work systems, and South Korea has taken steps to block it on official devices as well. These actions mirror the bans on apps like TikTok – reflecting geopolitical trust issues rather than a problem with the AI’s function itself.
For the general public and businesses, the practical concerns influencing adoption are:
- Data Privacy & Security: Since DeepSeek is a Chinese-origin AI, some worry that using its services might expose data to Chinese servers or oversight. The Sunday Guardian (an Indian publication) raised concerns that data collected by DeepSeek could potentially be accessed by Chinese authorities under China’s laws. Although DeepSeek open-sourced the model (meaning you can run it yourself without sending data to China), many non-technical users will interact via the official app or API, which likely operates on servers in China or under the company’s control. This has made companies handling sensitive data (healthcare, legal, government) hesitate to use DeepSeek’s API, similar to how they’re cautious about any cloud service from a foreign jurisdiction. In essence, trust is a barrier – not trust in the model’s skill, but trust in the ecosystem around it.
- Bias and Censorship: As mentioned, DeepSeek’s public version filters certain content. Outside China, this is seen as a form of bias – the model might refuse perfectly legitimate queries (e.g., historical or political questions) due to its built-in censorship. This can frustrate users who expect an uncensored AI. Moreover, there’s the opposite risk: if someone removes those filters (which one could do with the open model), the AI might produce content against Chinese law or general ethical norms, which could be problematic in China or if the model is later updated. So there’s a bit of cultural and regulatory mismatch to navigate. Businesses might fear that the model’s responses could be subtly biased towards Chinese perspectives on contentious issues, which could pose PR or compliance issues if those outputs go public.
- Regulatory Compliance: DeepSeek as a service must follow Chinese AI regulations (we’ll detail those soon). This means it has certain safety features and content rules hard-coded. But those rules might conflict with, say, European free expression standards or requirements to explain decisions. Also, Europe’s AI Act might require transparency that a Chinese service may not provide. So companies in regulated industries (finance, healthcare) have to consider compliance – for instance, can they explain and document why the model gave a certain answer if audited? With OpenAI or Anthropic, these companies are in active dialogue with Western regulators; with a Chinese model, there’s less clarity. Nevertheless, tech-savvy adopters can sidestep some issues by using the open model internally – that resolves data privacy (data stays in-house) and allows them to apply their own moderation policies.
English vs Chinese Content Handling: DeepSeek is quite bilingual. It was trained on both English and Chinese corpora, and as a result it handles both languages proficiently. In English, as noted, it scores nearly as well as top English-trained models on benchmarks. In Chinese, it actually outperforms many Western models (since those often underperform on non-English tasks). For example, on the C-Eval Chinese academic exam benchmark, DeepSeek scored ~86.5%, beating GPT-4 which scored ~76%. This indicates DeepSeek has deep knowledge of Chinese history and literature, and can understand Chinese queries with nuance. The flip side is that its behavior might diverge based on language: in Chinese, it will be very strictly aligned with domestic content regulations (it likely has been explicitly tuned to avoid any politically sensitive outputs in Chinese). In English, it might be a bit more permissive on general topics but could still balk at things that are red flags universally (e.g., extremism, violence) or Chinese-sensitive topics if it recognizes them.
One interesting observation from users: When asked sensitive questions in English, DeepSeek sometimes answers a bit and then stops or gives a vague response – which suggests its content filter isn’t purely keyword-based but concept-based (and might trigger regardless of language). For instance, a user asking in English about “Tiananmen Square 1989” reported that the bot generated a few sentences and then the text disappeared, implying it self-censored once it recognized the topic. So the model’s dual training means it carries over certain restrictions even in English conversation. However, for typical use (non-political, everyday questions, coding, etc.), it’s equally fluent and effective in both languages. This dual strength actually makes it quite valuable: a multinational company could use one model to support both English and Chinese customers, which is harder to do with, say, GPT-4 (which can do Chinese but not as expertly, and with no guarantee of complying with Chinese norms).
Biases and Safety: No AI model is free from bias, and DeepSeek is no exception. Given its training data, it likely inherits biases present in both Western and Chinese internet text. On social issues or factual narratives, it might skew depending on the predominant source. One concern raised by experts is that because it’s open-source, bad actors could fine-tune DeepSeek to generate disinformation or hateful content more easily than with closed models (which have some guardrails) – this is a risk with any open model (it was said of Meta’s LLaMA as well). There’s also the matter of Chinese government influence – while there’s no evidence of a backdoor, the perception of risk is there. Some U.S. commentators have half-jokingly asked: “If I use DeepSeek, am I chatting with an AI agent that might secretly be loyal to the CCP’s principles?” – Most likely not in any direct sense, but the concern underscores the trust gap.
Effect on the AI Race: DeepSeek’s emergence is a significant moment in the AI race. It signals that China is rapidly closing the gap with the U.S. in top-tier AI capabilities. Only a year ago, the consensus was that the most advanced LLMs (GPT-4, etc.) were a generation ahead of Chinese models like Baidu’s ERNIE or Alibaba’s Tongyi. Now, DeepSeek has leapfrogged into a leading position, at least temporarily. This accelerates the bipolarity in AI – no longer is cutting-edge AI monopolized by Silicon Valley; Beijing (and by extension the open-source community) has a foothold at the summit. The coordinated release of several Chinese models around the same time as DeepSeek R1 (apparently encouraged by the Chinese government to showcase strength) further cements the idea that China is mounting a concerted challenge. If U.S. policymakers thought export controls on chips would maintain a 2-3 year technology lead, DeepSeek showed that strategy is not foolproof.
From an industry perspective, DeepSeek has introduced price competition and open-source pressure. OpenAI and others may now need to consider lowering their pricing or offering more transparency due to this new competitor. In China, we already saw companies racing to cut costs and open-source parts of their models after DeepSeek appeared (Alibaba open-sourced parts of Qwen, Baidu announced it would open-source a version of ERNIE by mid-2025). In the West, OpenAI might not open-source GPT-4, but they might release more details or expedite GPT-5’s development to keep the performance crown. Similarly, Anthropic (Claude’s creator) will want to show that their safety-focused model can also keep up in performance and not lose relevance. Google’s Gemini launch will be watched closely to see if it clearly outperforms DeepSeek; if not, that’s a huge PR win for the Chinese startup.
Competitive Challenge for Western Firms: Yes, DeepSeek poses a direct competitive challenge. For OpenAI, it undercuts their main advantage (having the most capable model). If a free or cheap open model is as good as ChatGPT, why would developers pay OpenAI API fees or why would consumers stick to ChatGPT Plus? This could chip away at OpenAI’s market share, especially for cost-sensitive users or those wanting more control over the model. It also challenges OpenAI on the narrative: so far, OpenAI has been seen as leading the AI frontier, but now a tiny startup achieved similar results – that might force OpenAI to justify its massive expenditures and perhaps refine its approach (maybe by incorporating some of DeepSeek’s efficiency tricks). Anthropic’s Claude, which competes on being a safer, more transparent model, now faces an open model that anyone can inspect – arguably even more transparent. Claude is also closed-source and not cheap for API use; some users may opt for DeepSeek if safety is not their top concern. Anthropic might respond by emphasizing Claude’s guardrails (DeepSeek could be portrayed as “uncertified” and risky for enterprises), or by accelerating their own model improvements. Google with Gemini (and its existing PaLM 2 model) is in a slightly different position because Google has huge compute and lots of proprietary data (like search data). Google might still outdo DeepSeek in some areas, but the gap is certainly smaller. Also, Google has been trying to position itself as the leader in AI research (with DeepMind, etc.) – now they have a new competitor to watch. We might see Google adopting more open science approaches (it already publishes much of its foundational research openly; perhaps it will engage more in open collaborations so that independent open models don’t steal the show). Mistral AI (a French startup) and others in the open-source community actually might benefit from DeepSeek’s rise: it validates their ethos that small teams can compete with big labs. Mistral’s first model (7B) was impressive at that scale; they plan larger models in 2024/2025. They could possibly incorporate techniques from DeepSeek (if published) to enhance their own. However, one could also say DeepSeek “stole some thunder” from Western open models by jumping ahead so dramatically. Ultimately, all these companies now have to consider DeepSeek as part of the competitive landscape, much like they consider each other. It’s no longer unthinkable that the best AI model could come from a Chinese startup.
One important impact is on the economics of AI: DeepSeek showed that a lean team could achieve what only giant teams did before, which might lead to a deflation in the “AI arms race” spending. If efficiency can be replicated, we may not see $4 billion training runs as the only way – more teams might try smaller budget, smart training runs. This could lower barriers to entry globally. But it could also mean big firms pivot to focusing on data advantages or specialized hardware to differentiate, since pure model performance might commoditize. The AI race is thus shifting from one of pure capability to one of accessibility and trust. China’s entry via DeepSeek ups the ante for the U.S. to not just have the best model, but to ensure it remains widely used and trusted.
In short, DeepSeek’s arrival has internationalized the AI competition at the highest level. For users and companies, it presents a new attractive option (especially due to cost and customizability), but also raises questions about security and bias that each adopter must weigh. It’s a bold demonstration that the U.S. lead in AI can be challenged much sooner than many expected, and it likely marks the beginning of a more multipolar AI world.
4. Future Outlook & Opinion
What’s Next for DeepSeek? Given its meteoric rise, all eyes are on what DeepSeek will do next. The company and its founder Liang Wenfeng have stated that their ultimate goal is to achieve AGI (Artificial General Intelligence) – in other words, they’re not resting on just matching GPT-4; they want to push towards an AI that can perform any intellectual task a human can. In the near future, we can expect:
- Continual Model Improvements: DeepSeek will likely iterate quickly on its models (perhaps a V4 or R2 model in the coming months). These could bring moderate boosts in accuracy or new capabilities. For instance, we might see an extended context version (to allow longer inputs/outputs, something Claude and GPT-4 have done). Or an improved multimodal model DeepSeek-VL that can handle images more adeptly, given the importance of vision (think of OpenAI’s inclusion of vision in GPT-4 or Google’s focus on Gemini being multimodal). Since they already have a VL research model, a public release or demo of image understanding could be on the roadmap.
- Optimization and Democratization: DeepSeek might work on reducing the resource requirements further so that smaller deployments are possible. Already efficient, they might target getting the model running on a single high-end GPU (which would be huge for spreading adoption). If they manage to compress or distill their big model into a smaller one (like a 13B or 7B parameter model with decent performance), it could be a game-changer for wide accessibility. They’ve open-sourced, so the community might do this as well (for example, techniques like LoRA fine-tuning or quantization could yield 4-bit or 8-bit versions of DeepSeek that hobbyists can run; a brief LoRA fine-tuning sketch follows this list).
- Applications & Ecosystem: DeepSeek the company will likely build out its ecosystem – better developer tools, a more robust API platform, perhaps targeted solutions for enterprise. They have an API now, and as interest grows, they might provide fine-tuning services or domain-specific models (like a model for medicine, law, etc.). Essentially, to sustain itself financially, DeepSeek may adopt a model similar to OpenAI’s: keep core research open-ish but sell value-added services. Given the open-source release, they might also position themselves as an AI service provider in China, capturing clients who might otherwise use Baidu or Alibaba’s models.
- Global Collaboration or Isolation? A big question: will DeepSeek collaborate internationally or become more siloed? On one hand, their open approach suggests a willingness to collaborate with the global AI community. They could partner with Western labs or join initiatives (for example, one can imagine them participating in an open benchmark contest or contributing to open datasets). On the other hand, rising geopolitical tension might make open collaboration tricky. The Chinese government is clearly proud of DeepSeek – reports suggest the simultaneous launch of multiple models was orchestrated by Beijing to showcase strength. DeepSeek might receive more state support (funding, computing resources) going forward. If that comes with strings attached, the company might prioritize national goals (e.g., focus on Chinese language, government projects) over international cooperation. However, since their approach thus far has been open, I lean toward them continuing to engage the open-source community, as that is also a savvy way to benefit from global talent.
- Commercial Scaling vs. Research Focus: The startup will also face a choice of whether to scale up commercially (more users, more revenue) or remain research-heavy. Liang Wenfeng has somewhat downplayed commercial priorities like pricing wars, emphasizing the AGI mission. If they stick to that ethos, they might continue releasing advanced models freely to maintain the lead and gain influence, rather than monetizing aggressively. This strategy could be supported by government grants or strategic investors in China who see value in the “national AI champion” approach. It’s a bit reminiscent of how some national projects are run: focus on tech achievement, worry about profit later. On the flip side, too much focus on research without a business model can be risky for a startup’s survival – but given the fanfare and likely support around DeepSeek, they probably have no shortage of funding now.
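Tying back to the “Optimization and Democratization” point above, here is a hedged sketch of LoRA fine-tuning with the PEFT library, the kind of lightweight adaptation the community is likely to apply to open checkpoints. The model ID, target module names, and hyperparameters are illustrative placeholders that would need to match whichever checkpoint is actually used.

```python
# LoRA fine-tuning sketch: adapt a causal LM by training small low-rank adapter
# matrices instead of the full weights. Placeholders throughout; not DeepSeek-specific.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("your-org/some-open-llm")   # placeholder ID

lora_cfg = LoraConfig(
    r=16,                                   # rank of the low-rank update matrices
    lora_alpha=32,                          # scaling factor for the updates
    target_modules=["q_proj", "v_proj"],    # which attention projections to adapt (model-dependent)
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()          # typically well under 1% of the base model's weights
# ...train with your usual Trainer / training loop on domain-specific data...
```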
Can it challenge OpenAI’s dominance long-term? Potentially, yes. In the near term, DeepSeek has already challenged OpenAI by matching their flagship model’s performance. To truly challenge OpenAI’s dominance, DeepSeek would need to maintain a rapid pace of innovation and expand its adoption. OpenAI has a head start in deployments (ChatGPT integrated into a lot of products, big customer base via Azure, etc.), so catching up there is non-trivial. But consider that technology leadership can shift quickly – if OpenAI’s next model (GPT-5 or “o2” or whatever naming) is only marginally better and DeepSeek or another Chinese model matches it again at lower cost, OpenAI’s mystique would be further eroded.
One factor is the open vs closed dynamic: OpenAI’s models are closed, which has worked so far because they were clearly ahead. If open models like DeepSeek reach parity, many developers will prefer the open ones for flexibility and cost. This could form a kind of swarm competition against OpenAI – many open-source contributors collectively improving non-OpenAI models. OpenAI might then find itself competing not just with one company, but with an entire open ecosystem (which DeepSeek is now a part of). This is similar to how Linux and Android (open ecosystems) eventually dominated certain tech domains over proprietary incumbents. It’s too early to say if AI will follow that pattern, but the signs are there.
On the other hand, OpenAI still has advantages: huge proprietary datasets, extremely deep research expertise, and integration with Microsoft’s products (like Office, Windows, etc.) that guarantee a user base. DeepSeek is not at that scale of influence yet. Also, OpenAI might respond by upping the ante – perhaps introducing more multimodal capabilities, better factual accuracy, or novel features (like memory, personalization) that DeepSeek doesn’t yet have. OpenAI could also leverage the trust factor: Western businesses/governments might simply refuse to use a Chinese model for sensitive work, sticking with OpenAI for that reason alone. So OpenAI’s dominance, especially in English and enterprise domain, won’t disappear overnight.
In summary, DeepSeek could become a formidable long-term competitor to OpenAI if it continues to innovate and if the open model paradigm gains ground. It’s not guaranteed – but at the very least, it ensures OpenAI won’t go unchallenged. We may end up in a scenario where for general AI tasks, people have two top choices: one from the US (OpenAI/Anthropic/Google) and one from China (DeepSeek or its successors), analogous to how in telecom you had Western vs Chinese suppliers. That itself is a shift from a U.S.-only field. If DeepSeek achieves some breakthrough (say, something approaching AGI capabilities) before OpenAI does, then yes, it could leapfrog into the leader position. Right now, they are neck-and-neck in tech, but behind in deployment.
Chinese Laws & Regulations around LLMs: China has instituted specific regulations governing generative AI and LLMs. In mid-2023, the Interim Measures for Generative AI Services were introduced (effective August 15, 2023). These rules require any AI service that’s available to the public in China to adhere to certain guidelines:
- Content Censorship: The AI must not produce content that harms national security, promotes terrorism, incites secession, or violates the constitution, etc. Essentially, it must align with Chinese censorship rules (“core socialist values” is a term often mentioned in these regulations). That’s why DeepSeek censors topics like Tiananmen – it’s following the law. Services have to filter training data and outputs for prohibited content.
- Data and Privacy: There are provisions about training data needing to be from legitimate sources and respecting intellectual property. Also, user data and input confidentiality must be protected. For instance, if a user inputs personal info, the AI provider must not misuse it.
- Registration and Licensing: AI providers in China must register their algorithms with government authorities. They might have to pass security assessments, especially if their model could influence public opinion. In fact, some high-profile models got government approval before launch. DeepSeek likely had to ensure compliance to get clearance for wide release.
- Transparency: The regulations encourage labeling AI-generated content as such (to avoid confusion with human content). Also, the algorithms may need to be somewhat transparent to regulators (if not to the public). In practice, companies might need to disclose technical details to certain agencies if asked.
- Liability: The provider is responsible for any misuse or harmful output. So companies must have mechanisms to prevent and address issues, or they can be held accountable. This strong liability probably motivates the heavy-handed filtering we see – they’d rather the model refuse many queries than accidentally allow a politically sensitive one.
Comparatively:
- EU: The European Union has adopted its AI Act, a comprehensive regulation whose obligations phase in over 2025/2026. The AI Act classifies AI uses by risk. A general-purpose AI like an LLM isn’t outright banned, but if it’s used in a high-risk setting (e.g., law enforcement), certain obligations apply. The Act will require transparency (e.g., disclose AI-generated content), some level of explanation of how the model works, and mitigating biases. It doesn’t impose ideological content rules like China’s does, but it might require that models are evaluated for things like disinformation or hate speech risks. Also, privacy (GDPR) intersects – training data that includes personal data is a concern. In short, the EU’s approach is more about safety, fairness, and accountability broadly, rather than censoring specific topics. If DeepSeek or any AI were to be offered in Europe, it would have to comply with those rules – possibly needing to reveal training data sources or performance metrics on certain benchmarks.
- US: The United States has no blanket law for AI yet. The approach has been more laissez-faire, with an emphasis on innovation. There are guidelines (the White House released a Blueprint for an AI Bill of Rights as a non-binding guide, NIST released an AI Risk Management Framework) and more recently some voluntary commitments extracted from AI companies to address safety. In late 2023, President Biden issued an Executive Order on AI that, among many things, tasks agencies with setting standards for safety testing of AI models and calls for mechanisms to review AI before public release (especially very powerful models). But again, these aren’t law yet – it’s more direction-setting. Also, the US focuses on export controls (as we saw, limiting chip exports to slow adversaries’ AI progress). The US might eventually consider some licensing regime for very large models (some have suggested something akin to FDA approval for AI), but as of early 2025, companies like OpenAI are self-regulating with input from oversight bodies. So, an AI like DeepSeek in the US context is not illegal or anything; users and companies are free to use it. The main US stance is to ensure AI doesn’t violate existing laws (like discrimination laws, etc.) and to encourage responsible AI development with industry cooperation rather than heavy regulation (for now).
- India: India currently does not have dedicated AI laws comparable to China’s or the EU’s proposed ones. The Indian government has taken a relatively hands-off regulatory approach so far, focusing on encouraging AI innovation and adoption. In 2023, the IT Ministry stated it was not looking to regulate AI too strictly yet, given it’s an evolving technology. Instead, they are working on an “India AI” program to foster research and also looking at ethical guidelines. India might leverage existing laws (like the IT Act and data protection laws once in force) to cover misuse of AI (e.g., if AI content causes harm or involves personal data misuse, those laws apply). Additionally, for issues like misinformation, India has general mechanisms (e.g., social media takedowns) that could be extended to AI-generated content. In essence, Indian regulation of AI is currently light – there’s acknowledgement of needing responsible AI but also a desire not to stifle innovation in a field where India wants to catch up. We might see sectoral guidelines (for example, RBI could issue guidelines if banks use AI chatbots, etc., focusing on consumer protection and data security). But nothing as specific as China’s generative AI rules exists in India yet.
So, comparing:
- China’s laws are strict and content-focused, effectively requiring censorship and licensing for LLMs, but also supportive in the sense that the government is actively encouraging AI development under these rules (they see it as strategic).
- The EU’s approach is risk and rights-focused, more process-oriented (document your AI’s risks, ensure oversight, etc.), aiming to make AI trustworthy and fair, possibly at the cost of more bureaucracy for developers.
- The US is innovation-focused with targeted controls, preferring not to broadly regulate model development/deployment yet, aside from specific measures like export bans or promoting voluntary standards.
-
India is application-focused and advisory at this point, with minimal regulation specifically for AI, though that could change as AI becomes more prevalent.
For a Chinese company like DeepSeek, Chinese regulation obviously applies directly (hence the inbuilt censorship). If DeepSeek were to offer services in the EU, it might run into compliance challenges around data transparency (e.g., “what data was it trained on?” – something it hasn’t fully disclosed) and provenance of output. In the US or India, user concerns (privacy, etc.) might be more of an issue than legal barriers.
Another aspect: China’s approach of open-sourcing models like DeepSeek R1 might in part be to sidestep some regulatory burdens – once weights are out globally, the model outside China can be used without the company being liable for those uses. Inside China, the company ensures the service is compliant. Outside, if someone fine-tunes it to remove filters, that’s out of Chinese jurisdiction. This means Chinese regulation, while strict internally, doesn’t stop Chinese labs from contributing to global open AI research – a savvy move that still advances their influence (and maybe even an implicit way to get around export controls by “exporting” a high-tech model openly).
Will DeepSeek continue improving? It’s reasonable to expect yes. The rate at which they went from V2 to V3 to R1 (all within about 8 months) is astonishing. With the momentum and presumably greater resources now, they are likely to continue a rapid improvement cycle. They might run into diminishing returns – going from parity with GPT-4 to surpassing it significantly may be tougher, especially if OpenAI is also moving forward – but at the very least they’ll refine what they have. One thing to watch is safety and alignment: as they improve, will they also invest in making the model safer and more aligned (like reducing hallucinations, adding explainability)? If they want to appeal to enterprise users, they might need to. This could involve more research on instruct-following and guardrails.
In conclusion for this section: DeepSeek’s future looks promising as a cutting-edge AI platform, and it certainly has the potential to remain a serious challenger to the likes of OpenAI. The Chinese regulatory environment will shape its behavior domestically (keeping it on a short leash content-wise), while globally it will have to win trust to be widely adopted. The next year or two will likely determine whether DeepSeek solidifies its place as a top-tier AI provider (possibly coexisting with Western models), or whether it sparks an AI “arms race” where each side tries to leapfrog the other’s last move. Right now, it appears to be ushering in a more competitive and diverse AI ecosystem, which could benefit users worldwide with faster innovation and more choices.
5. Indian Context & Perspective
India’s Focus on AI Applications vs. Foundation Models: India has been very enthusiastic about AI’s potential, but largely in terms of applying AI to solve problems (in agriculture, healthcare, governance, etc.) rather than creating foundational AI models like DeepSeek or GPT-4 from scratch. There are a few reasons for this:
-
Cost and Complexity: Developing giant foundation models is extremely resource-intensive – it needs huge datasets, massive compute power, and deep research expertise. One prevailing opinion has been that India, with limited research funding in AI compared to the US/China, should not sink billions into reinventing what others have already built, but rather leverage those models for localized applications. Essentially, why spend, say, $500 million to create an Indian GPT-4, when we could spend far less to adapt existing open models to Indian languages and needs? India’s tech industry and government historically have been very cost-conscious and ROI-driven, which led to a strategy of focusing on downstream applications where immediate benefits are clear (like using AI for crop yield predictions or language translation services).
-
Talent Distribution: While India has a large pool of software engineers, the concentration of researchers working on core AI model architecture is relatively small. Many top Indian AI researchers end up working abroad or for foreign companies (brain drain to Silicon Valley, etc.). The kind of interdisciplinary AI research culture needed to build new models (with experts in neural networks, distributed computing, data curation, etc.) is still developing in India’s universities and institutes. So the talent has been more focused on implementing AI solutions (like building a chatbot for a bank using existing models) rather than fundamental model innovation. This is slowly changing with initiatives at IITs and the AI research centers being set up, but it’s a factor.
-
Industrial Orientation: India’s tech industry historically grew around IT services – companies like TCS, Infosys, Wipro made their fortunes doing software development and support for clients, not necessarily doing proprietary R&D on new tech. This DNA persists; large Indian tech firms have been implementers rather than inventors in many tech domains. As a result, there’s been a mindset of “let’s use what’s available (open source or licensed models) to build useful products” instead of “let’s create a new AI model from ground up.” Startups in India, too, often aim to apply AI in fintech, e-commerce, etc. – very few attempt building a new model given the capital needed.
-
Government Priorities: The Indian government’s digital initiatives (Digital India, etc.) have been more about deploying technology for citizen services (like using AI in DigiLocker, or in analyzing satellite imagery) rather than fundamental tech R&D. There have been some plans (like a proposal to create an “AIRAWAT” AI compute cloud, and the IndiaAI program to possibly develop a multilingual model), but until recently, these haven’t gotten the kind of funding that, say, ISRO gets for space missions. In contrast, China’s government heavily funded AI research and called for national champions in semiconductors and AI algorithms as strategic goals. India’s approach has been more hands-off, expecting the private sector to lead and the government to play facilitator. Consequently, there wasn’t a concerted push to make a foundational model in India.
Key Challenges Preventing an Indian DeepSeek: If we ask “why hasn’t an IIT or Infosys built something like DeepSeek?,” the challenges include:
-
Infrastructure & Compute: Building an LLM like DeepSeek requires access to supercomputing infrastructure with thousands of GPUs or TPUs running for weeks. India has some supercomputers (like the PARAM series, and a new AI supercomputer called AIRAWAT that’s ranked ~136th globally), but they are not at the scale of what OpenAI or DeepMind use. Until very recently, no Indian institution had a dedicated AI training cluster of the top caliber. The government has now planned an HPC AI infrastructure with 10,000 GPUs as part of the IndiaAI mission, but implementing that will take time. Without this, Indian researchers simply didn’t have the raw compute to attempt such models.
-
Funding & Risk Appetite: Training a cutting-edge model is expensive (tens of millions of dollars). In India’s startup ecosystem, raising that kind of capital for pure research is hard. Investors prefer products with a clear market. Also, the government’s research grants in AI were relatively modest. For comparison, the French government and EU funded the BigScience project (which made the BLOOM model) with significant resources, treating it like a moonshot. India hasn’t yet allocated an equivalent “moonshot” budget for an AI model (though there are voices urging a change). Moreover, DeepSeek shows that even a startup can do it for $6M due to efficiency – but to have the confidence to even try, one needs a certain boldness. DeepSeek’s founder Liang Wenfeng had that vision and possibly some state support. Indian entrepreneurs might have been deterred by the prospect of competing with Google or OpenAI with so little.
-
Data & Language: One might think India has an advantage with data, being a populous country with many languages. But curating a massive high-quality dataset for training a model is non-trivial. Much of the open text data for training LLMs is in English or Chinese. India’s internet content in Hindi or Tamil, etc., is far less, and often bilingual (code-mixed) or not as extensively digitized. So an Indian model would likely train on a lot of English content too (like others). Indian-specific knowledge and vernacular understanding can be achieved by fine-tuning existing models or training medium-sized models specifically for those languages (which some initiatives like AI4Bharat have done, e.g., releasing BERT-like models for Indian languages). The incremental benefit of a wholly new model vs. using global models and fine-tuning might have seemed low for the effort required. So focusing on building applications (like translation systems, voice bots, etc.) with available tech was seen as a more direct way to help Indian users.
-
Market Incentives: The global market for foundational models is somewhat saturated by the big players (who also often open-source versions). An Indian company building one would need to compete globally to monetize it (since Indian companies or government could simply use the readily available ones). Without strong IP protection or some edge, it would be a tough sell. This dynamic may have dissuaded efforts – why build something that will be freely available from Big Tech by the time you finish? DeepSeek defied this by being first in a new efficiency class, but that’s a risky bet that not everyone can pull off.
-
Historical Precedents: Historically, India has occasionally lagged in developing core technologies domestically for similar reasons – for instance, in semiconductors, India never built a robust chip fabrication industry and chose to import or design chips elsewhere, focusing on software instead. In software products too, India was slow to create global products (though that is changing now with SaaS companies). The pattern often comes down to initial conditions and ecosystem – once you’re strong in one part of the value chain (services, applications), you double down on that, and it’s hard to pivot to a different part (like fundamental R&D) without a deliberate push.
Historical Parallels: To understand India’s positioning, a few analogies:
-
The IT Services vs. Product parallel: In the 90s and 2000s, India became an IT powerhouse by providing services (outsourced programming, BPO, etc.), but did not create a Microsoft or an Apple. We leveraged the global products to deliver solutions. Similarly, in AI, India is using global models to deliver solutions (like AI in e-governance, chatbots for banks, etc.) instead of building the core models. It took decades for Indian firms to start focusing on product innovation (now we have some successful product companies, but still fewer at global scale). AI might follow that trajectory unless deliberately accelerated.
-
The Supercomputer story: As a contrast, when faced with a technology denial (the US blocking Cray supercomputers in the 1980s), India decided to develop its own supercomputers. This led to the PARAM series by CDAC in the early 90s. That was a case where strategic need (weather modeling, nuclear simulations) forced India to build capacity. They succeeded commendably given the era’s constraints – Param was among the top supercomputers of its time and proved India’s capability. This shows that when a specific goal is set and backed by government support, India can create advanced tech domestically. However, in AI, such a clear-cut strategic embargo scenario didn’t happen (there was no ban on India using AI models), so the urgency wasn’t felt. Arguably, reliance on foreign AI could be a future strategic risk (data sovereignty, etc.), but it’s more subtle than an outright denial scenario.
-
The Space program: India’s space program (ISRO) is often cited as a successful indigenous effort – doing things in a cost-effective way (Mars mission at a fraction of NASA’s cost, etc.). One could see a parallel if India tried a “frugal AI moonshot”: DeepSeek itself is a frugal innovation example (like ISRO’s low-cost missions). If India’s ethos of jugaad (frugal innovation) were applied to AI with the right talent, perhaps an Indian DeepSeek is possible. But that requires a mission-like approach and a champion to drive it.
-
The Pharmaceuticals parallel: India is known as the “pharmacy of the world” for generic drugs – taking formulas developed elsewhere and manufacturing them cheaply at scale. In AI, India might end up playing a similar role: once open models exist (like generics), Indian companies can fine-tune and deploy them cheaply for various uses. For instance, an Indian company could take DeepSeek’s open model and create an “AI hotline doctor” for rural healthcare, which is a value-add application. This is analogous to making a generic drug formulation widely available. It’s valuable, but it’s not inventing the drug in the first place. The pharma example shows India’s strength in adoption and scale, but also highlights that original drug discovery (like foundational AI model discovery) has been less of India’s focus.
What should India do to engage in foundational AI research? Many experts in India are now indeed calling for a more active role in foundational models – recognizing that being solely a consumer could compromise long-term autonomy and opportunity. Here are steps and recommendations often discussed (some of which align with what policy think-tanks have suggested):
-
Invest in AI R&D and Infrastructure: The government and private sector need to put significant funding into AI research labs, similar to how they invest in space or defense research. This includes building large-scale computing infrastructure. Positive signs: the mention of a high-performance AI infrastructure with 10,000 GPUs under the IndiaAI Mission is encouraging. India should expedite the creation of such “AI factories,” akin to what Japan and others are doing. With compute in hand, researchers won’t be as constrained. Also, funding programs that specifically support long-term research (5-10 year horizon) on AI algorithms and models at academic institutions will grow the knowledge base.
-
Public-Private Collaboration: A likely successful model is one where government, academia, and industry collaborate. For example, the AI4Bharat initiative at IIT Madras, which the NextIAS article cites, is a government-supported academic center that has produced language models for Indian languages (and the startup Sarvam AI has released “Sarvam 1,” a multilingual large language model built with Nvidia’s help). Encouraging more of these collaborations can pool expertise and resources. Indian tech companies can contribute by sharing data or funding, and academia can focus on innovation, with government as an enabler. Essentially, create an ecosystem that mirrors the synergy seen in places like the US (Stanford/MIT working with startups and DARPA, etc.) or China (academia with state funds and tech companies).
-
Define a Strategic Project (AI Mission): Formulate a mission akin to “India’s GPT” or a series of foundational models for specific needs (like an AI for agriculture, AI for defense, etc.). The NextIAS editorial suggests building foundation models for critical areas like national security, healthcare, etc., while using global models for less critical areas. This makes sense – prioritize sovereign models where data sensitivity or strategic autonomy is vital. For example, an AI that the military uses for intelligence analysis should likely be based on a model we fully control, not an API from abroad. So identifying those key domains and funding indigenous models for them is a path forward. Meanwhile, encourage use of open models for general use and contribute improvements back to the community.
-
Leverage Open Models (don’t reinvent unnecessarily): India can take advantage of the fact that many models are open-source. Instead of starting purely from scratch, we can start from something like LLaMA or DeepSeek (since R1 is MIT-licensed, Indian researchers can use it as a starting point). This is akin to how ISRO didn’t literally reinvent all rocketry from zero but learned and then innovated on existing concepts. By building on open models, we save time and focus on localization and improvement. For instance, take an open model and train it further on Indian languages, or infuse it with Indian legal and cultural knowledge – essentially create a foundation model “Made for India” (a minimal fine-tuning sketch follows after this list). Some work has begun: Sarvam 1, billed as India’s first homegrown large multilingual model and built in collaboration with Nvidia, is a stepping stone. Scaling such efforts and ensuring they are truly competitive is key.
-
Encourage Advanced Research & AI Education: The pipeline of talent needs to grow. Encourage more students to pursue AI research (maybe scholarships, challenges, high-profile grand prizes for breakthroughs). Establish “Centers of Excellence” for AI in top institutes (which is being done to an extent) and ensure they have compute resources. Also, cross-disciplinary work (neuroscience, cognitive science, etc. feeding into AI) should be encouraged to potentially find novel approaches beyond transformers.
-
Retain and Attract Talent: Curbing brain drain by creating appealing opportunities in India is crucial. If the government or big industrial groups could create an OpenAI-like entity in India (with sufficient funding and freedom to research), some talent might stay or even return from abroad. We have seen signs of this in other fields when big missions are launched (e.g., some Indian scientists abroad returned to work on Chandrayaan moon mission or other high-impact projects).
-
Global Collaboration: India doesn’t have to do it alone. It can partner with countries who have similar goals. For example, a BRICS AI research collaboration or partnering with open-source initiatives in EU (like BigScience 2.0) could share the burden. This way, India contributes to global efforts and benefits from collective advancement. Such collaborations could also involve sharing diverse data for multilingual models, which would help an Indian model as well.
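To make the “build on open models” point above concrete, here is a minimal sketch of continued training of an open checkpoint on Indian-language text using the Hugging Face stack. The model ID and corpus file are placeholders (not anything named in this article), LoRA is used purely to keep the example small, and a real effort would need multi-GPU infrastructure, tokenizer-coverage checks, and far more (and cleaner) data.

```python
# Sketch: continued training of an open LLM checkpoint on Indian-language text.
# Assumptions (not from the article): the model ID and corpus file are placeholders;
# a real run needs multi-GPU hardware, tokenizer coverage checks, and far more data.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_id = "open-llm/base-7b"  # placeholder for any permissively licensed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # many LLM tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(model_id)

# LoRA adapters keep the trainable footprint small enough for modest hardware;
# target modules vary by architecture, so adjust for the chosen checkpoint.
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM",
                                         target_modules=["q_proj", "v_proj"]))

# Hypothetical corpus of Hindi/Tamil/Marathi text, one document per line.
raw = load_dataset("text", data_files={"train": "indic_corpus.txt"})
tokenized = raw["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=1024),
    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="indic-adapter", per_device_train_batch_size=2,
                           gradient_accumulation_steps=16, num_train_epochs=1,
                           learning_rate=2e-4, logging_steps=50),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.save_model("indic-adapter")  # only the small adapter weights are written out
```

The appeal of the adapter approach is that the base model stays untouched: one open checkpoint can carry separate, cheaply trained adapters for different Indian languages or domains.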
Does DeepSeek’s rise create opportunities or challenges for India?
-
Opportunities: DeepSeek’s open model is itself an opportunity for India. Indian researchers and companies can use DeepSeek today without needing to build a GPT-4 equivalent from scratch. This can accelerate AI adoption in India – for example, startups can incorporate DeepSeek into their products to provide high-end AI features at low cost (since it’s free/open, or cheap to run). It lowers the barrier for Indian innovators who might have been deterred by the cost of using GPT-4 (which has token charges and restrictions). Also, the techniques DeepSeek used (efficiency, MoE, etc.) provide a learning opportunity – Indian scientists can study how they achieved it and replicate or build upon those methods. It basically enriches the pool of knowledge from which Indian AI initiatives can draw. Moreover, the fact that DeepSeek showed how to succeed with limited resources might inspire Indian stakeholders – it aligns well with India’s knack for frugal engineering. This could galvanize support for trying something similar in India, now that we have a proof of concept that it’s possible outside Big Tech.
Also, as global AI becomes more multi-polar, Indian entities might find more room to collaborate or position themselves as neutral players. For instance, if Western companies are wary of using a Chinese model due to security concerns, Indian firms could step in to fine-tune and host an open model in a trusted environment, acting as service providers. If India can establish a reputation for AI expertise in customization and safe deployment, it can be an AI service hub globally (similar to its role in IT services, but up the value chain).
Another opportunity: addressing India-specific needs. DeepSeek or GPT-4, as global models, might not know local languages deeply (especially smaller languages) or local context. India can use them as bases to create AI that better handles local languages/dialects, benefitting its own population. With open models available, doing this is much easier than if one only had closed APIs. For example, an Indian team could take DeepSeek and train it further on a large Hindi corpus to create a top-tier Hindi chatbot, something not currently available at GPT-4 quality.
-
Challenges: DeepSeek’s ascent also poses some challenges for India. Firstly, it intensifies the competitive pressure to catch up. If China surges ahead in AI, it could widen tech gaps in other areas, influencing economic and strategic power. India might feel a greater urgency (and perhaps pressure from allies like the U.S.) to not let China solely dominate AI in Asia. There’s a risk of dependence: if Indian companies just use DeepSeek or Chinese models widely, down the line that could be as problematic as dependency on any single foreign tech. We might replace a U.S. tech dependency with a Chinese one, which has different geopolitical ramifications. For example, if Indian startups build products on DeepSeek and then someday the open model stops getting updated (or a non-open superior Chinese model emerges that they can’t use), they could be at a disadvantage. It’s similar to hardware: you wouldn’t want to rely only on one country’s supply for critical tech.
Security and privacy concerns also apply to India. If Indian users start using the DeepSeek app in large numbers (drawn by its capability and free nature), there could be risks of data being collected offshore. This is why some governments banned it on official devices. India would need to consider guidelines for its own government and sensitive sectors regarding such foreign AI apps. Already India banned TikTok and many Chinese apps in 2020 citing national security (data security) concerns. If DeepSeek becomes very popular, Indian authorities might face a dilemma: it’s a useful tool for people, but do we worry about where queries are going? The mitigating factor is that DeepSeek is open-source – Indian companies can host their own version (so perhaps encourage that, so data stays in country).
Another challenge is that DeepSeek raises the bar of what a successful AI model looks like. If India embarks on building its own model, it now has to meet or exceed DeepSeek’s efficiency or else it might seem pointless. That means Indian efforts must incorporate these latest techniques, which can be challenging if there’s a knowledge lag. Essentially, the competitive benchmark just got tougher – but that can be a healthy challenge if taken positively.
Lastly, there’s the issue of alignment with Indian values. Just as Western observers worry about Chinese bias in DeepSeek, Indian context has its own needs. A model created elsewhere won’t automatically respect all cultural sensitivities or ethical considerations unique to India. For instance, handling of religious content in a diverse country like India can be delicate. An imported model might not be tuned to that, potentially causing social flare-ups if it says something offensive inadvertently. So relying purely on foreign foundation models might force Indian developers to put extra work in aligning them to local values and legal requirements (e.g., hate speech laws, etc.). Developing some homegrown models could bake those considerations in from the ground up.
India’s Past Tech Experiences Informing Present AI Approach: Indeed, looking at past experiences:
-
In nuclear and space tech, India took the route of self-reliance due to strategic compulsions (and succeeded over time).
-
In telecom and computing, India mostly adopted and localized foreign tech (like mobile networks from Ericsson/Huawei, or using Windows/Android OS) rather than developing indigenous versions (though we have some domestic electronics, it’s not predominant).
-
The outcomes vary: self-reliance gave strategic autonomy but took long (ISRO’s success is decades in making), while adoption gave quick consumer benefits but created dependencies. In AI, India has to choose a balance. The consensus emerging is to not be left entirely behind in core AI tech, even if it means playing a collaborative role rather than solo.
In sum, DeepSeek’s rise is a wake-up call and an inspiration for India. It shows that with focused effort, a new player can crack into the top tier of AI – something that could motivate Indian stakeholders to act. It also provides a tool that India can use immediately to advance its own AI deployment. India now has an opportunity to leapfrog by leveraging open breakthroughs like DeepSeek and combining them with its own strengths (like its vast IT workforce and unique data) to both apply AI widely and begin developing some foundational capabilities of its own, particularly for Indian languages and needs. The challenge will be mobilizing the will and resources to do so in a timely manner, lest we remain primarily consumers in the AI revolution.
6. More Retail’s Perspective: Impact of DeepSeek’s Lower API Costs
The retail industry, including players like us, operates in a domain that isn’t highly regulated for AI use, which means adopting a new AI service is technically straightforward. Unlike sectors such as healthcare or finance, there are fewer compliance hurdles or ethical landmines in using AI for tasks like customer service, demand forecasting, or inventory management. This relative freedom puts the focus squarely on practicality and cost. DeepSeek’s drastic API cost reductions have effectively removed price as a barrier for experimentation. What was once a cutting-edge (and expensive) technology is now financially accessible, allowing retail tech teams to pilot advanced AI features without seeking huge budgets.
In theory, More Retail could plug DeepSeek’s language or vision models into its systems (for example, to power an internal chatbot or generate product descriptions) with minimal integration effort, since the technology is delivered via API and open-source frameworks. The low costs and open availability also mitigate vendor lock-in concerns – if it doesn’t work out, we can switch or self-host without significant sunk costs. In short, from an IT readiness standpoint, there is little holding us back: the technical implementation is straightforward and the economics now make sense.
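As an illustration of how light that integration can be, here is a minimal sketch assuming an OpenAI-compatible chat-completions endpoint (DeepSeek’s hosted API follows this format); the model name, prompt, and product data below are purely illustrative, not a production recipe.

```python
# Sketch: generating retail product copy through an OpenAI-compatible chat endpoint.
# The base_url and model name assume DeepSeek's hosted API; a self-hosted deployment
# would simply point base_url at an internal server instead.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],  # key for whichever provider is used
    base_url="https://api.deepseek.com",     # or an in-house endpoint
)

def product_description(name: str, attributes: dict) -> str:
    facts = ", ".join(f"{k}: {v}" for k, v in attributes.items())
    response = client.chat.completions.create(
        model="deepseek-chat",
        messages=[
            {"role": "system", "content": "You write concise, factual retail product copy."},
            {"role": "user", "content": f"Write a 40-word description for {name}. Facts: {facts}."},
        ],
        temperature=0.4,
    )
    return response.choices[0].message.content

print(product_description("Basmati Rice 5kg", {"origin": "Punjab", "grain": "extra long"}))
```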
That said, we are adopting a wait-and-watch approach despite the favorable technical and cost advantages. The key reason is our strong existing partnership with ChatGPT, which has proven extremely effective for our AI-based quality scans and personalization-at-scale needs. While DeepSeek's technical capabilities are impressive, ChatGPT brings the crucial advantage of being a battle-tested solution that continuously evolves with market needs. It has demonstrated not just technical excellence, but also the commercial resilience and stability that enterprises like ours depend on.
We will continue monitoring DeepSeek's progress and conducting focused pilots, particularly in areas where it might complement our existing AI infrastructure. However, the recent data leak incidents surrounding DeepSeek underscore why technical prowess alone isn't sufficient – we need partners with proven track records in enterprise-grade security, reliability, and support. OpenAI has invested years in building this trust and infrastructure, making it a significantly more dependable choice for critical retail operations.
This isn't just about cost savings – it's about selecting partners who have demonstrated they can scale and evolve with enterprise needs while maintaining the highest standards of security and reliability. While DeepSeek's lower costs are appealing, we'll evaluate its adoption based on evidence of sustained commercial success and enterprise-grade capabilities beyond pure technical performance.
Retailers have seen hype come and go before – whether it was basic chatbots, AR/VR stores, or other trends – and the lesson is to avoid jumping in unless the new tech clearly improves KPIs or customer experience. DeepSeek’s cheaper API makes trying the technology easy, but More Retail will adopt it at full scale only after it proves its ability to keep evolving and to provide a secure, reliable infrastructure to build on, beyond the initial wow factor.
In Conclusion: My Perspective
Should businesses pay attention to DeepSeek? Absolutely. For any business involved in AI – whether building AI-powered products or using AI for analytics/customer service – DeepSeek is worth a close look. It represents a new competitive option in the landscape of AI models. If you’re a CTO or product manager, the arrival of DeepSeek means you are no longer limited to the offerings of a few Western companies; you have a potentially equally powerful model that is open-source and cost-effective. This can translate to significant savings and flexibility. For example, a company that spends large sums on API calls to GPT-4 might evaluate whether DeepSeek (self-hosted or via a cheaper provider) can deliver similar results for a fraction of the cost. In some coding tasks or domain-specific tasks, DeepSeek might even outperform the usual models (as seen in some code benchmarks), which could improve productivity for software firms.
However, businesses should also weigh the considerations: using DeepSeek’s open version requires technical ability to deploy and maintain a large model, and using its hosted service involves sending data to a Chinese entity – which for some might be a non-starter due to compliance (especially in sectors like finance, healthcare, or government). So, the level of attention may vary: tech-savvy organizations and startups might jump on it quickly to experiment (because they can handle the self-hosting or are less worried about data sensitivity), whereas more conservative enterprises will study it, maybe run pilot tests in non-critical workflows, while keeping an eye on how it matures and what security assurances can be made.
Importantly, even if a business doesn’t plan to adopt DeepSeek immediately, they should pay attention because it will likely influence the market prices and features of other AI services. As noted, Alibaba had to cut prices because DeepSeek was so cheap. We might see OpenAI adjusting pricing or offering new tiers in response to open competitors. Anthropic might emphasize Claude’s longer context or different safety angle to differentiate from a raw powerhouse like DeepSeek. Google might incorporate some of DeepSeek’s efficiency ideas into its next releases to stay competitive. All this means more choice and better cost-benefit for businesses. So, from a strategic standpoint, DeepSeek’s emergence gives enterprises more bargaining power and alternative sourcing for AI solutions.
Is it worth using over existing models? If performance parity holds, yes – especially if cost or customizability are priorities. For instance:
-
If you need an AI model that you can deeply customize (fine-tune on your proprietary data, or integrate into your on-premises system for data privacy), DeepSeek as an open model is extremely attractive compared to a closed API like OpenAI’s. You can build your own ChatGPT on top of it, tailored to your domain, without your data ever leaving your control (a minimal self-hosting sketch follows after this list).
-
If cost is a major concern – say you’re a startup running on thin margins or a big company doing millions of AI queries a day – switching to an open model like DeepSeek could cut your costs dramatically. Even accounting for the overhead of running servers yourself or through a cloud, the token cost difference (fractions of a cent vs. multiple cents per thousand tokens) is huge.
-
If your use case involves Chinese language or market, DeepSeek might actually be a better choice because it’s built with Chinese context in mind and might handle local content more adeptly (and possibly with mandated filters which you anyway would need in China). A global company operating in China might need to use a model like DeepSeek or Baidu’s to comply with regulations there.
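For the on-premises option mentioned in the first point above, the sketch below shows the basic self-hosting pattern: load an open-weights checkpoint and answer prompts locally, so data never leaves the company network. The model ID is a placeholder and assumes a checkpoint small enough for a single GPU box; a full-size model would normally sit behind a dedicated inference server (vLLM, TGI, or similar) rather than raw transformers code.

```python
# Sketch: answering prompts from an open-weights model hosted on-premises,
# so queries never leave the company network. The model ID is a placeholder;
# full-size models are usually served via a dedicated inference engine instead.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "open-llm/chat-7b"  # placeholder for any permissively licensed chat checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto")

def answer(question: str, max_new_tokens: int = 256) -> str:
    # Most chat checkpoints ship a chat template the tokenizer can apply directly.
    prompt = tokenizer.apply_chat_template(
        [{"role": "user", "content": question}],
        tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    return tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

print(answer("Summarise our returns policy for store staff in three bullet points."))
```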
That said, if you highly value reliability, support, and safety, you might still lean towards established providers for now. OpenAI and others have dedicated support and more extensive content moderation systems – with DeepSeek, you’re a bit on your own (if self-hosting) or relying on a startup that doesn’t yet have a long service track record. Also, consider the evaluation aspect: businesses should run their own tests comparing outputs from DeepSeek vs. GPT-4 vs. Claude on their specific tasks. In many general tasks DeepSeek is equal, but maybe in some niche (e.g., a complex medical question) GPT-4’s additional training data gives it an edge in correctness. Initial independent evaluations have found DeepSeek “as capable, and as flawed, as other current leading models” – which implies there’s no clear winner in quality, so other factors (like cost, openness) might drive the decision.
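One lightweight way to run such a comparison is to send the same task-specific prompts to two OpenAI-compatible endpoints and collect the answers for blind human review. A rough sketch follows; the endpoints, model names, and prompts are illustrative, not a recommended benchmark suite.

```python
# Sketch: a tiny A/B harness that sends the same business prompts to two providers
# and stores the answers for blind human review. Endpoints and model names are illustrative.
import csv
import os
from openai import OpenAI

providers = {
    "gpt-4o": OpenAI(api_key=os.environ["OPENAI_API_KEY"]),
    "deepseek-chat": OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"],
                            base_url="https://api.deepseek.com"),
}

prompts = [
    "Draft a polite reply to a customer whose grocery delivery arrived two hours late.",
    "Explain how to reconcile a stock count mismatch between the POS system and the warehouse.",
]

with open("model_comparison.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["prompt", "model", "answer"])
    for prompt in prompts:
        for model_name, client in providers.items():
            reply = client.chat.completions.create(
                model=model_name,
                messages=[{"role": "user", "content": prompt}],
                temperature=0,
            ).choices[0].message.content
            writer.writerow([prompt, model_name, reply])
# Reviewers score the CSV rows without knowing which model produced which answer,
# which surfaces task-specific gaps that public benchmarks miss.
```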
DeepSeek’s Significance in a Global AI Landscape: The rise of DeepSeek signifies that the global AI landscape is no longer unipolar. For a while, it seemed like nearly all top-tier AI innovations were coming from the U.S. (and a bit from the UK with DeepMind). Now, we have a scenario reminiscent of the space race or supercomputer race: multiple nations at the frontier. This is healthy in many ways – competition spurs innovation, prevents any single entity from having too much control, and encourages diversity in approaches (open vs closed, different architectures, etc.). For the world, it means faster progress and more redundancy (if one model has a flaw or policy that limits it, another can fill the gap).
From an economic perspective, it could lead to lower costs and more standardization. When open models get good, AI capabilities become more of a commodity that many can access, rather than a premium product only a few sell. We might see AI increasingly embedded everywhere as it becomes cheaper. It also puts pressure on AI monopolies: for example, OpenAI had a de-facto monopoly on the most advanced chatbot for much of 2023; in 2024-25, that monopoly is effectively broken by open alternatives and now DeepSeek. A more level playing field can encourage more startups to build on AI (since they don’t fear being locked out by API costs or terms).
However, geopolitically, it introduces new risks and dynamics. The U.S. can no longer assume a big lead; it will likely respond with policies to boost its own AI (the $500B “Stargate” venture mentioned in the Futuriom piece suggests large U.S. investments are being conceived). We might see an AI tech Cold War of sorts, hopefully mainly economic/technological and not military. Countries around the world may have to pick ecosystems (just as some chose U.S. or Chinese vendors for 5G networks). But if models are open, maybe that polarization can be avoided – companies can take the best from anywhere and run it themselves.
For India specifically, as asked, what does it mean and how should we react:
-
Significance for India: DeepSeek’s success underlines that lack of top-notch hardware is not an insurmountable barrier, and that an open collaborative approach can yield results. It also foreshadows that if India doesn’t step up efforts, it could be largely dependent on foreign AI (whether American or Chinese) for advanced capabilities. As highlighted by Indian analysts, relying entirely on foreign models has risks for data sovereignty and strategic autonomy. The government and businesses must recognize that foundational AI capability is becoming as important as, say, having indigenous satellite or missile tech – it’s a layer of technology that others will build on, and not having control over it could be a disadvantage in the long run.
-
Expectations from Policymakers: DeepSeek’s emergence can act as a spur to accelerate the IndiaAI mission. This could mean increasing funding for AI research, creating incentives for private companies to invest in large-scale AI (maybe through public-private partnerships or matching grants), and possibly setting specific goals (like “by 2025, have an Indian open-source LLM that is among the top 5 in the world”). They should also update policies on data sharing – building AI needs data, so India could consider initiatives to compile large open datasets in Indian languages or domains for researchers. Additionally, policymakers need to craft a balanced regulatory stance: encouraging open innovation while protecting against misuse. Since China’s example shows government coordination can propel AI, India might consider a more hands-on approach in convening talent and resources to tackle this.
-
Indian Startups and Enterprises: They should seize the opportunity that models like DeepSeek offer. Startups can incorporate these models to build world-class products faster (for example, a startup working on legal AI can fine-tune DeepSeek on Indian case law and have a very capable legal advisor AI without training from scratch). Enterprises can experiment with self-hosting such models for internal use to reduce ongoing costs. At the same time, they should stay mindful of the risks (security, bias) and possibly advocate collectively for more local AI development. We might see Indian IT service giants start offering solutions based on open models – e.g., Infosys or TCS might package an enterprise AI assistant that is powered by DeepSeek under the hood but hosted in India with added security. That could be an offering to clients who don’t want to send data to OpenAI or abroad.
-
Skilling and Talent: The rise of open models means Indian developers can get hands-on experience with cutting-edge AI without needing to join a Google or OpenAI. This is a great time for our tech workforce to upskill in LLM fine-tuning, prompt engineering, and building on open AI frameworks. The more our talent engages, the more innovations can spring from here. It might be worth the government or NASSCOM (industry body) launching programs to train engineers in working with models like DeepSeek, LLaMA, etc., thereby creating a large pool of AI practitioners in the country.
DeepSeek is a pivotal development that breaks the U.S. near-monopoly on top AI models. It shows the power of open-source and efficient engineering in AI. For India, it provides both a tool and a template – an advanced AI we can use, and an example to emulate in pushing our own AI ambitions. Businesses should absolutely track and test DeepSeek as part of their AI strategy, as it may offer new advantages. And at the strategic level, India should recognize that the AI playing field is evolving fast: it’s time to actively participate in shaping it, not just as consumers but as contributors. That means doubling down on foundational AI research, fostering collaborations, and ensuring that the next breakthrough AI model might just as well come from a lab in Bengaluru or Mumbai as from Silicon Valley or Hangzhou. By embracing these changes proactively, India can turn DeepSeek’s rise from a potential challenge into a catalyst for its own AI journey – ensuring we don’t miss the bus in this critical technology epoch, but help steer it.
Updates (2025-02-16):
Between the time I wrote this and the time I published it (work got in the way), there have already been several updates... I pity those who talk about AI :)
Please see some key ones below:
Adoption
-
Adoption, Partnerships & Market Position: In the auto sector, multiple manufacturers have inked partnerships to embed DeepSeek’s models in next-gen vehicles. General Motors’ China JV (SAIC-GM) announced deep integration of DeepSeek AI into Cadillac and Buick infotainment systems for more human-like voice assistance. Mercedes-Benz’s Smart division (via its JV with Geely) will likewise deliver over-the-air updates enabling DeepSeek’s AI in upcoming electric models. Even Nissan’s Chinese joint venture is launching the N7 EV sedan with DeepSeek-R1 as the brains of its voice assistant – the first non-Chinese OEM to do so. Beyond automotive, DeepSeek’s own AI assistant app briefly became the top-rated free app on the US App Store, and enterprise pilots are underway in sectors like finance and software development (indicative of growing global interest). Industry leaders have taken note: OpenAI’s CEO Sam Altman lauded DeepSeek’s model as “impressive,” welcoming the competition, while others highlight the geopolitical significance of a Chinese open-source model reaching parity with Western counterparts.
-
Open-Source Ecosystem & API Updates: Because DeepSeek makes its model weights openly available, developers worldwide have built thousands of variant models based on its architecture within weeks of release. This has led to a wave of innovation (and some concerns about quality control), essentially “open-sourcing” cutting-edge AI in a way that Western firms have been reluctant to do. The company’s API has also evolved through community feedback: context caching was introduced to lower costs for repeated prompts, and the pricing model differentiates cache hits vs misses to reward frequent reuse. After an initial free trial period and steep promotional discounts (which ended on Feb 8, 2025), DeepSeek’s API pricing stabilized at still-low rates. Notably, some third parties have taken to self-hosting DeepSeek – for instance, India’s Krutrim AI now locally hosts DeepSeek-R1 on domestic servers to serve developers at ₹1 per million tokens, highlighting the global enthusiasm and community-driven adoption of DeepSeek’s models.
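Prefix-style context caching of this kind generally rewards keeping the long, unchanging part of the prompt byte-for-byte identical across calls, so repeated input tokens are billed at the cheaper cache-hit rate. The sketch below illustrates that pattern, assuming an OpenAI-compatible endpoint and prefix-based caching as DeepSeek’s pricing notes describe; the catalogue file and questions are hypothetical.

```python
# Sketch: keeping the long static context identical across calls so prefix-based
# context caching can reuse it (cache-hit input tokens are billed at a lower rate).
# The endpoint, model name, and catalogue file are illustrative.
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"], base_url="https://api.deepseek.com")

# The expensive, unchanging material goes first and is byte-for-byte identical each time.
STATIC_CONTEXT = (
    "You answer questions about the attached product catalogue.\n"
    + open("catalogue_excerpt.txt").read()  # hypothetical reference document
)

def ask(question: str) -> str:
    response = client.chat.completions.create(
        model="deepseek-chat",
        messages=[
            {"role": "system", "content": STATIC_CONTEXT},  # cacheable prefix
            {"role": "user", "content": question},          # only this part varies
        ],
    )
    return response.choices[0].message.content

# Repeated calls share the same prefix, so most of their input tokens should hit the cache.
for q in ["Which items are gluten-free?", "What is the MRP of item 104?"]:
    print(ask(q))
```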
DeepSeek-VL: Multimodal Vision-Language Model
Official Capabilities: DeepSeek-VL is DeepSeek’s open-source vision-language model designed for rich multimodal understanding. It can comprehend and generate responses from complex visual inputs combined with text, functioning as a multimodal AI assistant. Specifically, DeepSeek-VL can process a wide range of visual information – from logical diagrams and webpages to scientific charts, formulas, and natural images. It integrates an advanced vision encoder with the language model, enabling it to analyze high-resolution images (up to 1024×1024 pixels) while maintaining efficient token usage. The model demonstrates capabilities in visual question answering, optical character recognition (OCR), document and chart understanding, and visual grounding (identifying objects or regions described in text). For example, it can interpret a technical diagram or a photographed table and answer questions about them, describe the content of an image in detail, and follow instructions that reference both text and visual elements. DeepSeek-VL is available in at least two sizes (around 1.3B and 7B parameters), with the larger version offering more accurate and context-aware understanding. Both versions have been open-sourced to encourage broad adoption and further innovation on top of the model.
Janus-Pro: Advanced Image Generation Model
Capabilities & Performance: Janus-Pro is DeepSeek’s latest image generation model, representing a significant leap in multimodal AI by combining visual creativity with language understanding. It’s essentially DeepSeek’s answer to models like OpenAI’s DALL-E 3 and Stability AI’s Stable Diffusion – and by DeepSeek’s account, Janus-Pro outperforms both on key image-generation benchmarks. The model (Janus-Pro-7B) was unveiled in late January 2025 as an upgrade over the earlier Janus model. Despite a relatively modest parameter count (~7 billion), Janus-Pro produces highly detailed and visually appealing images from text prompts, surpassing the quality of many existing open-source generators. In a technical report, DeepSeek noted that Janus-Pro achieved the top ranking on an industry leaderboard for text-to-image generation, thanks to improvements in image stability and fidelity. This was accomplished by augmenting the training dataset with 72 million high-quality synthetic images and carefully balancing them with real-world photos. The result is that Janus-Pro can generate images that are not only sharp and richly detailed, but also more consistent (fewer glitches or bizarre artifacts) even for complex or lengthy prompts.
India’s AI Sector: Key Updates
-
Government Initiatives & Funding: In mid-February 2025, the Ministry of Electronics and IT (MeitY) launched a call for proposals to build indigenous foundation AI models as part of the national IndiaAI Mission. This program invites startups, research groups, and companies to develop large AI models (including language, vision, and multilingual models) trained on Indian data, with an ambitious 9–12 month timeline for completion once projects are approved. To back such efforts, substantial funding and infrastructure are being lined up. The Union Budget 2025–26, announced in early February, earmarked a ₹20,000 crore (≈$2.3 billion) “Deep Tech Fund” to spur next-generation startups and research, with AI development prominently in scope. This is in addition to specific allocations like ₹500 crore for a new AI Centre of Excellence in education and the ongoing $1.2+ billion IndiaAI program dedicated to AI R&D and compute infrastructure. Notably, the IndiaAI Mission itself is backed by ₹10,372 crore (≈$1.25 billion) and aims to deploy 10,000 high-end GPUs across India in the next two years, establishing a distributed national AI compute grid. By June 2025, the government expects to have cloud-accessible AI compute available domestically, following the empanelment of Indian data center and cloud providers to offer AI hardware on demand. This public-private cloud infrastructure is intended to make AI computing a “digital public resource” – accessible to startups, academia, and government projects at subsidized rates. Collectively, these initiatives signal a strategic push: India is funding home-grown AI capabilities (both models and machines) to reduce reliance on foreign AI and to leverage its vast pool of talent and data.
-
We’re also seeing international cooperation: On Feb 11, 2025, India co-chaired the AI Action Summit with France, underlining a joint commitment to AI governance and innovation (India will host the next summit). The India-France AI Declaration (Feb 12) outlined plans for collaborative research on AI’s societal impact and stressed building inclusive AI frameworks that benefit the Global South. Within India, major tech companies like Reliance Jio are partnering with global firms (e.g. Nvidia) to build large AI data centers and develop India-specific foundation models trained in local languages. This is complemented by policy discussions about creating a supportive ecosystem for AI development – from easier data access to possible regulatory sandboxes for AI testing. There’s also a narrative of AI as the next DPI (Digital Public Infrastructure): just as India did with UPI for payments, it is exploring open foundational AI services that many startups can build on. In terms of India’s role in global AI, there’s a clear shift: from primarily being an AI talent exporter or consumer of foreign models to becoming a co-creator of foundation models and a regional AI compute hub. This period since early February 2025 has thus been marked by optimism and concrete steps – hefty funding, missions, and partnerships – all aimed at propelling India into the upper echelon of AI-producing nations, with an emphasis on self-reliance and ethical, inclusive AI.
Sources:
-
Reuters – “Alibaba releases AI model it says surpasses DeepSeek”
-
CSIS – “DeepSeek’s Latest Breakthrough Is Redefining AI Race”
-
China Briefing – “China’s DeepSeek and its Open-Source AI Models”
-
Futuriom – “Why DeepSeek’s Rise Could Be Temporary”
-
DeepSeek Official Site – Model Benchmark Table
-
The Guardian (commentary) – via quote
-
NextIAS (Current Affairs) – “Need for India’s Sovereign AI Model”
-
Wikipedia – “Supercomputing in India” (PARAM story)

Ankit Goel
Chief Product And Technology Officer @ More Retail
(This article is AI-assisted. Views are personal)