The State of LLM Adoption in Defense

Introduction

Large language models are reshaping defense operations across intelligence analysis, operational planning, and command-and-control. Billions in new spending back the shift, programs have moved from pilot to production in under three years, and the DoD’s AI budget request more than doubled to $1.8 billion by fiscal year 2024.

More than 70 countries and territories have published national AI strategies tracked by the OECD AI Policy Observatory, many with explicit defense applications. The U.S. Department of Defense requested $874 million for AI research, development, test, and evaluation in fiscal year 2022. By fiscal year 2024, that figure had climbed to $1.8 billion, more than doubling in two budget cycles. Large language models now attract a growing share of investment across intelligence analysis, planning, and autonomous systems.
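The "more than doubling" claim can be checked directly from the two requested figures. The following is a minimal Python sketch, assuming the two requests are directly comparable and treating fiscal year 2022 to fiscal year 2024 as two annual steps; the function names are illustrative and not drawn from any budget document.

```python
def growth_multiple(start: float, end: float) -> float:
    """Return how many times larger `end` is than `start`."""
    return end / start


def annualized_growth(start: float, end: float, years: int) -> float:
    """Return the constant yearly growth rate that turns `start` into `end`."""
    return (end / start) ** (1 / years) - 1


# Requested AI RDT&E funding in millions of dollars (figures from the text).
FY2022_REQUEST = 874.0
FY2024_REQUEST = 1800.0

multiple = growth_multiple(FY2022_REQUEST, FY2024_REQUEST)
rate = annualized_growth(FY2022_REQUEST, FY2024_REQUEST, years=2)

print(f"Growth multiple: {multiple:.2f}x")
print(f"Implied annual growth: {rate:.1%}")
```

Under these assumptions the request grew by a factor of roughly 2.06, which corresponds to compound growth of about 43 to 44 percent per budget cycle.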

Intelligence Analysis and Report Generation

U.S. intelligence agencies have deployed LLM-powered tools at scale, with the CIA’s OSIRIS platform now serving thousands of analysts across all 18 intelligence community agencies for open-source intelligence triage and synthesis, while Microsoft built an air-gapped GPT-4 system handling Top Secret data.

The CIA has entered what its officials call a “scale phase” of AI adoption. The agency developed OSIRIS, a generative AI platform that uses large language models to synthesize and present open-source intelligence, providing summaries and enabling analysts to interact with vast data volumes through a chatbot interface. Lakshmi Raman, the CIA’s Director of AI, described how analysts use the technology to “classify and triage open-source events” and support “search and discover and do levels of natural language querying.”

Microsoft deployed an air-gapped GPT-4 model for the intelligence community — the first major LLM separated from the internet — designed to handle Top Secret data for up to 10,000 analysts across the IC, including NSA, FBI, and military-run agencies.

The Centre for Emerging Technology and Security (CETaS) at the UK’s Alan Turing Institute published research on LLMs and intelligence analysis, commissioned jointly by the Joint Intelligence Organisation and GCHQ and based on seven months of primary research across UK assessment bodies. The report found AI capable of identifying patterns, trends, and anomalies at speeds human analysts cannot match, while cautioning that AI introduces new dimensions of uncertainty into intelligence assessments.

Defense contractors built dedicated platforms. Palantir’s Artificial Intelligence Platform (AIP) integrates large language models to allow operators to query real-time classified and unclassified data through natural language, with access controls enforcing classification markings and need-to-know restrictions. Scale AI partnered with agencies on domain-specific models trained on intelligence datasets. Singapore-based DLRA applies retrieval-augmented generation to maritime signals intelligence through its Threat Lens platform.
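The access-control idea described above, where classification markings and need-to-know restrictions gate what a natural language query can reach, can be illustrated with a short Python sketch. This is not Palantir’s implementation or any vendor’s API. The `Level` ladder, the `Document` and `User` types, the compartment names, and the keyword matcher are all hypothetical, chosen only to show filtering before retrieval.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Level(IntEnum):
    """Illustrative classification ladder; higher values are more restricted."""
    UNCLASSIFIED = 0
    CUI = 1
    SECRET = 2
    TOP_SECRET = 3


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    level: Level
    compartments: frozenset = field(default_factory=frozenset)


@dataclass(frozen=True)
class User:
    name: str
    clearance: Level
    need_to_know: frozenset = field(default_factory=frozenset)


def can_read(user: User, doc: Document) -> bool:
    """Allow access only if the clearance covers the marking and the user
    holds every compartment attached to the document."""
    return user.clearance >= doc.level and doc.compartments <= user.need_to_know


def retrieve_for(user: User, query: str, corpus: list[Document]) -> list[Document]:
    """Filter by access first, then apply a naive keyword match.

    A real system would use a proper retriever; the point here is that the
    filter runs before any text could be placed in a model's context.
    """
    terms = query.lower().split()
    readable = [d for d in corpus if can_read(user, d)]
    return [d for d in readable if any(t in d.text.lower() for t in terms)]


# Hypothetical corpus and analyst, for illustration only.
corpus = [
    Document("a1", "Port traffic summary from public shipping data", Level.UNCLASSIFIED),
    Document("b2", "Port surveillance schedule", Level.SECRET, frozenset({"MARITIME"})),
    Document("c3", "Source reporting on port activity", Level.TOP_SECRET, frozenset({"HUMINT"})),
]
analyst = User("analyst", Level.SECRET, frozenset({"MARITIME"}))

visible = retrieve_for(analyst, "port activity", corpus)
print([d.doc_id for d in visible])  # prints ['a1', 'b2']
```

Filtering before retrieval, rather than asking the model to withhold restricted text, is one plausible design choice: content the user cannot read never enters the prompt, so the model cannot leak it in a summary.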

The National Geospatial-Intelligence Agency committed $708 million over seven years to train AI-driven computer vision systems that process satellite imagery and identify targets of interest, awarding the contract to startup Enabled Intelligence in November 2025. NGA also launched the A-GAIM accreditation pilot to evaluate the methodology and robustness of GEOINT AI model development and testing.

Operational Planning and Decision Support

RAND Corporation research found AI capable of supporting mission planning in dynamic threat environments, while NATO integrated generative AI into coalition wargames including Red Dragon and Sentinel Vanguard, and the U.S. Navy invested $448 million in AI and autonomy programs.

RAND’s 2024 study, “Understanding the Limits of Artificial Intelligence for Warfighters: Volume 5, Mission Planning,” examined AI-assisted route planning for air missions penetrating complex air defense environments. The researchers concluded that AI is “capable of helping out in some planning roles” and that applying it “will build capacity, experience, and user trust for future AI use,” though they emphasized that AI needs dedicated investment in simulation infrastructure to deliver results.

The U.S. Army’s Futures Command explored LLM-based planning tools processing doctrinal literature, historical cases, and operational data. The Navy invested $448 million in AI and autonomy programs to support shipbuilding and operational capabilities, while establishing an enlisted Robotics Warfare Specialist rating in 2024 to build a dedicated autonomous systems workforce.

NATO’s Allied Command Transformation integrated generative AI and large language models into wargames including Red Dragon and Sentinel Vanguard exercises to enhance decision-making and scenario realism. ACT also developed Multi-Domain Operations AI in partnership with the NATO Communications and Information Agency — a system that uses AI algorithms to process NATO doctrine and real-time operational data, allowing operators to ask complex questions in natural language and receive validated, actionable answers.

NATO released its revised AI strategy in July 2024, stating: “It is vital for NATO to use these technologies, where applicable, as soon as possible.” The strategy endorses six Principles of Responsible Use — Lawfulness, Responsibility and Accountability, Explainability and Traceability, Reliability, Governability, and Bias Mitigation — and directs member states to integrate AI into allied capabilities through the NATO Defence Planning Process.

Autonomous Systems and C2 Integration

The Air Force plans to spend more than $8.9 billion on Collaborative Combat Aircraft from fiscal years 2025 to 2029, fielding AI-powered autonomous wingmen that operate alongside manned fighters without continuous human supervision, with prototypes from General Atomics and Anduril already completing maiden flights.

DARPA’s Squad X Experimentation program demonstrated how AI teammates could collaborate with infantry squads using autonomous sensing and decision support. Field tests at the Air Ground Combat Center in Twentynine Palms paired U.S. Marines with unmanned air and ground systems, with contractors CACI and Lockheed Martin delivering operational prototypes.

The Air Force’s Collaborative Combat Aircraft (CCA) program is the largest investment in LLM-adjacent autonomous systems covered here. General Atomics’ YFQ-42A completed its maiden flight in August 2025, followed by Anduril’s YFQ-44A in October 2025. Collins Aerospace and Shield AI are integrating a government-owned Autonomy Government Reference Architecture onto both platforms. The Air Force plans to field at least 1,000 CCAs carrying out strike, reconnaissance, electronic warfare, and decoy missions, with operators setting mission objectives rather than manually flying each aircraft.

LLMs can provide natural language interfaces to complex command-and-control systems, supporting more intuitive human-machine teaming in degraded communications environments. In principle, this integration lets autonomous platforms receive mission commands, answer status queries, and coordinate actions without specialized operator training.

Security and Trustworthiness Challenges

Prompt injection remains the number-one LLM security vulnerability according to OWASP, and a joint study by researchers at OpenAI, Anthropic, and Google DeepMind found that adaptive attacks bypassed all 12 tested defenses at rates above 90 percent, while NIST expanded its taxonomy to cover generative AI threats.

In October 2025, researchers from OpenAI, Anthropic, and Google DeepMind published “The Attacker Moves Second,” testing 12 published LLM defenses that originally reported near-zero attack success rates. Using adaptive attack techniques, the researchers achieved bypass rates above 90 percent on all 12 defenses. Against prompt-based defenses, attacks succeeded 95 to 98 percent of the time; training-based defenses that originally reported attack success below 5 percent saw it climb to 96 to 100 percent under adaptive conditions.

“DOD cannot maintain its competitive advantage without transforming itself into an AI-ready and data-centric organization, with RAI [Responsible AI] as a prominent feature.”

— Deputy Secretary of Defense Kathleen Hicks, DoD Responsible AI Strategy and Implementation Pathway

NIST published the updated AI 100-2 E2025 report in March 2025, expanding its adversarial machine learning taxonomy to cover generative AI threats including direct prompt injection, indirect prompt injection, and AI agent vulnerabilities. The report — authored jointly by NIST, Northeastern University, Cisco, the UK AI Security Institute, and the U.S. AI Safety Institute — classifies supply chain attacks, direct prompting, and indirect prompt injection as the three primary attack vectors against generative AI.

Model hallucinations pose distinct operational risks. An LLM-generated intelligence assessment stating incorrect information confidently could lead to misallocated resources or missed threats. The intelligence community established red-teaming programs targeting hallucination risks before operational deployment.

Data classification requirements create deployment complexity. Many defense LLM applications require classified network operation without internet connectivity. Microsoft’s air-gapped deployment for the intelligence community addresses this constraint, though disconnected models sacrifice continuous improvement benefits.

The Path Forward

The global AI-in-military market reached $9.31 billion in 2024 and is projected to hit $19.29 billion by 2030 at a 13 percent CAGR, while the Stanford AI Index shows global AI publications nearly tripled from roughly 102,000 in 2013 to over 242,000 by 2023.

According to Grand View Research, the global artificial intelligence in military market was valued at $9.31 billion in 2024, with North America commanding 32.8 percent market share. The market is projected to reach $19.29 billion by 2030, growing at a CAGR of 13.0 percent.
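The growth rate implied by the two market figures can be recomputed with the standard compound annual growth rate formula, CAGR = (end / start)^(1 / years) - 1. The Python sketch below assumes the 2024 valuation is the base and that growth compounds over the six years to 2030; the source’s own base-year convention may differ, and the function names are illustrative.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values over `years` periods."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value: float, rate: float, years: int) -> float:
    """Value reached after compounding `start_value` at `rate` for `years`."""
    return start_value * (1 + rate) ** years


# Market figures from the text, in billions of dollars.
VALUE_2024 = 9.31
VALUE_2030 = 19.29
STATED_CAGR = 0.13

implied = cagr(VALUE_2024, VALUE_2030, years=2030 - 2024)
projected = project(VALUE_2024, STATED_CAGR, years=2030 - 2024)

print(f"Implied CAGR 2024-2030: {implied:.1%}")
print(f"2030 value at a 13.0% CAGR: ${projected:.2f}B")
```

With a 2024 base, the implied rate comes out at roughly 13 percent, consistent with the stated figure. Any small gap may reflect rounding or a different base year in the original forecast.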

The Stanford HAI AI Index 2025 report documents that between 2013 and 2023, total AI publications across computer science and related disciplines nearly tripled — from approximately 102,000 to over 242,000. Industry produced nearly 90 percent of notable AI models in 2024, up from 60 percent in 2023, while academia remained the leading source of highly cited research.

International cooperation is accelerating through structures like AUKUS Pillar 2, where the United States, United Kingdom, and Australia identified AI as critical to future military capability. In 2023, the UK hosted the first AUKUS AI trial, which connected allied drones in real time to share sensor data and retrain AI models in flight.

The DoD’s Chief Digital and Artificial Intelligence Office published responsible AI guidelines built on six tenets: RAI governance, warfighter trust, AI product and acquisition lifecycle, requirements validation, responsible AI ecosystem, and AI workforce. The 2026 AI Acceleration Strategy directs CDAO to establish a delivery cadence enabling the latest AI models to be deployed within 30 days of public release.

Conclusion

Defense LLM adoption has crossed the threshold from experimentation to operational deployment, with the CIA running generative AI across all 18 intelligence agencies, the Air Force committing nearly $9 billion to autonomous wingmen, and NATO embedding language models into coalition wargames and planning tools.

LLM adoption in defense has moved from experimentation to operational deployment across intelligence, planning, and command-and-control functions. The CIA runs generative AI at scale across all 18 intelligence agencies. The Air Force is spending nearly $9 billion on autonomous wingmen. NATO is embedding LLMs into wargames and coalition planning tools.

Security challenges remain unresolved. Joint research by authors from three leading AI labs showed that adaptive attackers bypassed all 12 published defenses tested, each at rates above 90 percent. The next phase depends on whether red-teaming, evaluation frameworks, and allied coordination keep pace with accelerating commercial AI capability.

Comparison: LLM Deployment Models in Defense

Defense organizations deploy LLMs across five distinct models: cloud-connected commercial APIs, private government clouds, air-gapped on-premise installations, federated allied networks, and domain-specific fine-tuned systems. Each balances classification level, update frequency, and latency against operational needs and the security constraints of handling sensitive military data.

Deployment Model	Description	Classification Level	Latency	Update Cycle	Example
Cloud-connected	Commercial LLM via secure cloud API	Unclassified / CUI	Low	Continuous	AWS GovCloud deployments
Private cloud	Dedicated instance in government cloud	Secret	Low-Medium	Periodic	Microsoft air-gapped GPT-4 for IC
Air-gapped on-premise	Model hosted on isolated classified network	Top Secret / SCI	Medium	Manual update	CIA OSIRIS platform
Federated	Distributed training across allied networks	Coalition shared	Variable	Coordinated	AUKUS AI trial sensor fusion
Domain-specific fine-tuned	Base model fine-tuned on defense corpora	Varies	Low	Retraining cycle	NGA Sequoia imagery models

FAQ

1. What are the primary use cases for LLMs in defense organizations?

Intelligence analysis and report generation, operational planning support, command-and-control integration with autonomous systems, open-source intelligence triage, and natural language interfaces for complex military systems. The CIA uses LLMs for analyst workflow acceleration, while NATO integrates them into coalition wargames and multi-domain operations planning.

2. Which defense organizations have publicly disclosed LLM programs?

The CIA disclosed its OSIRIS generative AI platform and AI scale phase. Microsoft confirmed an air-gapped GPT-4 deployment for the intelligence community. NGA awarded a $708 million AI training contract. NATO’s Allied Command Transformation integrated LLMs into wargaming. The Air Force’s CCA program incorporates AI autonomy packages built by Collins Aerospace and Shield AI.

3. What are the main security concerns with deploying LLMs in defense environments?

Prompt injection attacks, model hallucinations in high-stakes assessments, data classification handling, adversarial manipulation, and supply chain risks. A 2025 study by OpenAI, Anthropic, and Google DeepMind researchers found adaptive attacks bypassed all 12 tested LLM defenses at rates above 90 percent. NIST’s AI 100-2 E2025 report classifies prompt injection as a primary GenAI attack vector.

4. How do defense LLMs differ from commercial deployments?

Defense deployments typically require air-gapped operation on classified networks without internet access, need-to-know access controls enforced at the model level, classification marking integration, and formal accreditation processes like NGA’s A-GAIM pilot. Commercial models must be adapted for disconnected operation, which limits continuous learning and update cycles.

5. What is the current market size for defense AI and LLM applications?

Grand View Research valued the global AI-in-military market at $9.31 billion in 2024, projecting growth to $19.29 billion by 2030 at a 13.0 percent CAGR. North America held 32.8 percent market share. The DoD requested $1.8 billion specifically for AI R&D in fiscal year 2024; separately, a broader $25.2 billion allocation covers programs incorporating AI and autonomous systems in fiscal year 2025.

6. Are there successful LLM deployments in allied nations?

Yes. The UK’s GCHQ and Joint Intelligence Organisation commissioned CETaS research on LLMs for intelligence analysis. NATO integrated generative AI into Red Dragon and Sentinel Vanguard wargames. AUKUS partners conducted the first trilateral AI trial in 2023, connecting U.S., UK, and Australian drones for real-time AI model sharing. NATO’s revised 2024 AI strategy directs all member states to integrate AI through the Defence Planning Process.