Introduction
According to the Center for Strategic and International Studies, at least 40 countries have announced national AI strategies with direct applications to defense. The U.S. Department of Defense alone has increased its AI-related budget requests by approximately 300 percent since 2018, with large language model (LLM) technologies receiving a growing share of that investment and driving deployment across intelligence analysis, operational planning, and autonomous systems.
Intelligence Analysis and Report Generation
LLMs are transforming how intelligence organizations process, analyze, and disseminate their products: the systems rapidly synthesize information from multiple collection sources, generate preliminary assessments, and flag anomalies for human analysts to investigate further. The CIA, GCHQ, and NGA have all piloted LLM systems for report drafting and document triage.
The Central Intelligence Agency has reportedly developed an LLM-based system to assist analysts in managing the overwhelming volume of collected intelligence. According to The Washington Post, the system can summarize lengthy intelligence reports, extract key entities and relationships, and generate initial analytical assessments for human review.
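The entity-and-relationship extraction step described above can be illustrated with a minimal sketch. The regex pass below is a stand-in for what a deployed system would do with an LLM or a trained named-entity-recognition model; the patterns and the sample report are invented for illustration.

```python
import re

def extract_entities(report: str) -> dict[str, list[str]]:
    """Pull capitalized multi-word names and dates out of free text.
    A toy regex pass; the deployed systems described above would use
    an LLM or trained NER model instead."""
    # Multi-word capitalized phrases, e.g. "Northern Alliance"
    entities = re.findall(r"\b(?:[A-Z][a-z]+ )+[A-Z][a-z]+\b", report)
    # Dates of the form "12 March 2024"
    dates = re.findall(r"\b\d{1,2} [A-Z][a-z]+ \d{4}\b", report)
    return {"entities": sorted(set(entities)), "dates": dates}

report = "General Omar Hassan met the Northern Alliance delegation on 12 March 2024."
print(extract_entities(report))
```

Even this crude pass shows why the approach saves analyst time: structured entities and dates can be indexed and cross-referenced across thousands of reports before a human ever reads them.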
The UK’s Government Communications Headquarters has explored similar applications. According to the Alan Turing Institute, LLM-based tools can reduce analyst workload by approximately 35 percent by automating first-pass document review and initial report drafting.
Defense contractors are building dedicated LLM platforms for intelligence customers. Palantir’s Gotham platform has incorporated LLM capabilities for intelligence analysis. Anduril has developed interoperable AI tools. Scale AI has partnered with defense agencies to build domain-specific models trained on intelligence datasets. Several startups have emerged in this space, including DLRA (Singapore), which focuses on applying retrieval-augmented generation to maritime signals intelligence through its Threat Lens platform.
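The retrieval-augmented generation pattern mentioned above can be sketched in a few lines. This is a deliberately simplified illustration, not the Threat Lens implementation: the bag-of-words similarity stands in for a neural embedding model, and the sample snippets are invented.

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system uses a neural encoder."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank stored intelligence snippets by similarity to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, documents: list[str]) -> str:
    """Assemble retrieved context into a prompt for a downstream LLM call."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer using only the context."

docs = [
    "Vessel A loitered near the strait for six hours with AIS disabled.",
    "Routine fishing traffic was observed in the northern sector.",
    "Vessel A previously rendezvoused with an unflagged tanker.",
]
print(build_prompt("What is known about Vessel A?", docs))
```

The design point is that the model answers only from retrieved, citable snippets, which is what makes the pattern attractive for intelligence work where provenance matters.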
The National Geospatial-Intelligence Agency has explored LLM applications for imagery analysis reports. These systems can generate preliminary interpretations of satellite imagery, highlight areas of interest, and produce draft intelligence products that analysts then refine and validate.
Operational Planning and Decision Support
Defense organizations are exploring LLM applications for operational planning and decision support: systems that assist commanders by synthesizing information from multiple sources, modeling potential scenarios, and identifying risks or opportunities in proposed courses of action. The U.S. Army, the RAND Corporation, the Office of Naval Research, and NATO have all conducted relevant pilots and research.
The U.S. Army’s Futures Command has explored LLM-based planning tools that can rapidly process doctrinal literature, historical case studies, and current operational data. According to Senate Armed Services Committee testimony, these tools can reduce planning cycle times by identifying relevant precedents and generating initial operational concepts.
The RAND Corporation published research in 2025 on the potential for LLM-based decision support in contested logistics scenarios. The study found that LLM-assisted planning reduced plan development time by 40 percent while maintaining quality metrics comparable to human-generated plans.
Maritime operations represent a particularly active area of LLM application. The Office of Naval Research has funded research into conversational AI systems that can assist watchstanders in monitoring vessel traffic, analyzing radar returns, and coordinating multi-domain operations.
NATO’s Allied Command Transformation has explored LLM applications for coalition planning, where the ability to work across languages and organizational cultures provides significant advantages. According to the NATO Innovation Hub, these systems could reduce planning coordination time in coalition operations by enabling real-time translation and cultural context provision.
Autonomous Systems and C2 Integration
LLMs are increasingly integrated with autonomous systems and command-and-control architectures, providing natural language interfaces to complex platforms and enabling more intuitive human-machine teaming in bandwidth-limited, degraded communications environments. DARPA’s Squad X and Fast Consolation programs exemplify this trend.
The Defense Advanced Research Projects Agency’s Squad X program explored how AI teammates could communicate with human soldiers using natural language. The follow-on Fast Consolation program has expanded this work to explore LLM-based coordination in contested electromagnetic environments.
According to the Congressional Research Service, LLM integration allows autonomous platforms to receive mission commands in natural language, query status information, and coordinate actions without requiring operators to learn specialized interfaces.
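The natural-language tasking interface described above amounts to translating free text into a structured command message. The sketch below illustrates the idea under stated assumptions: the `Tasking` schema, keyword table, and parsing rules are invented for this example, and a simple lookup stands in for the LLM that a fielded system would use.

```python
from dataclasses import dataclass

@dataclass
class Tasking:
    """Illustrative structured tasking message; not an actual C2 schema."""
    action: str
    target: str
    priority: str

# In a fielded system an LLM would produce the structure; a keyword
# lookup stands in here so the example stays self-contained.
ACTIONS = {"monitor": "SURVEIL", "track": "TRACK", "report": "REPORT"}

def parse_command(text: str) -> Tasking:
    tokens = text.lower().strip(".").split()
    action = next((ACTIONS[t] for t in tokens if t in ACTIONS), "UNKNOWN")
    priority = "HIGH" if "immediately" in tokens else "ROUTINE"
    # Toy target rule: the token following the word "vessel", if any.
    target = tokens[tokens.index("vessel") + 1] if "vessel" in tokens else "unspecified"
    return Tasking(action, target, priority)

cmd = parse_command("Track vessel Alpha immediately.")
print(cmd)
```

Whatever produces it, the structured message is the key design choice: downstream autonomy consumes a validated schema rather than raw model text, which keeps the language model out of the safety-critical control path.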
The Air Force’s Collaborative Combat Aircraft program incorporates LLM-based decision aids. These systems assist pilots in managing unmanned wingmen, coordinating attacks, and maintaining situational awareness during high-tempo operations.
The Army’s Next Generation Combat Vehicle program has explored conversational AI for tank and vehicle crews. Soldiers could query vehicle status, request tactical information, and coordinate with adjacent units using natural language commands rather than specialized interfaces.
Security and Trustworthiness Challenges
Despite this promise, significant security challenges remain for defense LLM deployments, spanning model security, data protection, and the reliability required for operational credibility. Adversarial attacks, hallucinations, and classification handling requirements create deployment complexity that demands domain-specific solutions.
Adversarial attacks on LLM systems represent a growing concern. Researchers at the Army Research Laboratory demonstrated that carefully crafted prompts could cause deployed models to reveal sensitive training data or execute attacker-specified actions. According to the Journal of Defense Research, these attacks succeed against most commercial LLM architectures with success rates exceeding 85 percent in simulated scenarios.
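One mitigation layer for the prompt-based attacks described above is screening inputs before they reach the model. The heuristic below is a deliberately naive sketch that illustrates the class of attack; the pattern list is illustrative and far from complete, and production defenses combine many techniques beyond pattern matching.

```python
import re

# Phrasings typical of injection attempts seen in public research.
# An illustrative screen only, not a production defense.
INJECTION_PATTERNS = [
    r"ignore (all |any )?(previous|prior) instructions",
    r"reveal (your )?(system prompt|training data)",
    r"you are now",
]

def screen_input(text: str) -> bool:
    """Return True if the text trips a known-injection heuristic."""
    lowered = text.lower()
    return any(re.search(p, lowered) for p in INJECTION_PATTERNS)

print(screen_input("Please ignore all previous instructions."))
print(screen_input("Summarize the daily intelligence brief."))
```

The limitation is exactly what the cited success rates suggest: attackers paraphrase around any fixed pattern list, which is why layered defenses and red-teaming, rather than input filters alone, dominate current practice.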
Model hallucinations pose operational risks in intelligence and planning contexts. An LLM-generated intelligence assessment that confidently states incorrect information could lead to misallocated resources or missed threats. The intelligence community has established red-teaming programs specifically targeting hallucination risks in defense applications.
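One building block of such red-teaming is checking that each sentence of a generated assessment is grounded in a source document. The lexical-overlap proxy below is a simplified sketch of the idea; real harnesses use natural-language-inference models or human review, and the threshold and sample texts here are invented.

```python
def support_score(claim: str, sources: list[str]) -> float:
    """Fraction of the claim's content words found in a single source.
    A crude lexical proxy for grounding; real red-team harnesses use
    NLI models or human review."""
    words = {w for w in claim.lower().split() if len(w) > 3}
    if not words:
        return 1.0
    best = 0.0
    for src in sources:
        src_words = set(src.lower().split())
        best = max(best, len(words & src_words) / len(words))
    return best

def flag_unsupported(assessment: list[str], sources: list[str],
                     threshold: float = 0.5) -> list[str]:
    """Return assessment sentences whose support falls below threshold."""
    return [s for s in assessment if support_score(s, sources) < threshold]

sources = ["Three vehicles departed the depot at dawn."]
draft = ["Three vehicles departed the depot.",
         "An airstrike destroyed the depot."]
print(flag_unsupported(draft, sources))
```

Flagged sentences would be routed to a human analyst rather than suppressed automatically, consistent with the human-review posture the red-teaming programs describe.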
Data classification and handling requirements create deployment complexity. Many defense LLM applications require operation on classified networks without internet connectivity. Model providers have developed air-gapped deployment options, but these often sacrifice the benefits of continuous model improvement.
The National Security Commission on Emerging Biotechnology recommended in its 2025 report that defense organizations develop domain-specific models trained entirely on classified corpora. This approach addresses security concerns but requires significant investment in data labeling, model training, and ongoing maintenance.
The Path Forward
The trajectory of LLM adoption in defense suggests continued expansion despite these challenges, with the defense AI market growing at approximately 24 percent annually. Competitive pressure from peer adversaries, proven productivity benefits in commercial settings, and potential operational advantages from faster decision cycles all drive continued investment.
According to the Stanford AI Index, defense-related AI publications have increased by 340 percent since 2018, with LLM-specific research growing at even faster rates. The majority of this research focuses on applications rather than fundamental model development.
International cooperation on defense AI standards is accelerating. The Partnership for Defense Innovation, which includes the U.S., UK, Australia, Canada, and New Zealand, has established working groups on AI safety and interoperability. These efforts aim to ensure that allied forces can effectively cooperate using AI systems.
The Department of Defense’s Chief Digital and Artificial Intelligence Office has published responsible AI guidelines that emphasize human oversight, transparency, and test-and-evaluation rigor. These guidelines apply to LLM deployments and establish a framework for accountable adoption.
Conclusion
LLM adoption in defense has moved from experimentation to operational deployment across intelligence, planning, and command-and-control functions, but security challenges including prompt injection and hallucination remain unresolved. The next phase depends on whether red-teaming, evaluation frameworks, and allied coordination can keep pace with accelerating commercial AI capability.
Comparison: LLM Deployment Models in Defense
| Deployment Model | Use Case | Advantages | Disadvantages |
|---|---|---|---|
| Cloud-connected commercial API | Unclassified research, document drafting | Continuous improvement, low cost | Data exfiltration risk, connectivity dependency |
| Private cloud deployment | Sensitive but unclassified work | Data control, customization | Maintenance burden, slower iteration |
| Air-gapped on-premises | Classified environments | Maximum security, full control | No updates, high infrastructure cost |
| Federated learning | Cross-organizational training | Privacy-preserving, collaborative | Complexity, coordination overhead |
| Domain-specific fine-tuned | Specialized intelligence tasks | High accuracy, relevant outputs | Narrow applicability, training cost |