AI Data Security, DSPM, RAG Permission Governance, and Enterprise Knowledge Base Security

Core Conclusions

Data security is moving from a "compliance afterthought" to the primary control plane for enterprise AI adoption. Microsoft 365 Copilot, Copilot connectors, Azure AI Search, Databricks Unity Catalog, Snowflake Horizon, AWS Bedrock Guardrails, and Google Cloud DSPM are all pushing "data permissions, labels, classification, auditing, and filtering" upstream into the AI call chain, rather than bolting security on after the model. Microsoft states explicitly that Copilot will only return organizational data the user has "at least view permission" to; connectors and Azure AI Search likewise place ACLs, permission filtering, and token-based authorization at the retrieval stage.
Once enterprise AI, RAG, and agents spread, the first thing exposed is not a "model capability" bottleneck but the "over-shared data surface" and the "ungoverned permission surface." Copilot, RAG, and agents do not define a new enterprise permission system; they amplify the misconfigured sharing and overly broad authorization already sitting in SharePoint, OneDrive, email, tickets, CRM, databases, data lakes, and knowledge bases. In their official documentation, Microsoft, Azure AI Search, Elastic, Databricks, and Snowflake all stress document-level security, row filters, column masks, ACL/DLS, ABAC, and sensitivity labels as prerequisite capabilities.
DSPM, DLP, DDR, data access governance, and RAG permission governance are not parallel sectors; in the AI era they string together into one chain. DSPM finds "where the sensitive data is, who can access it, and whether it is exposed"; DLP intercepts "exfiltration"; DDR detects "anomalous access and movement"; access governance and RAG permission control ensure "only the right people and agents see the right content." Google Cloud DSPM, IBM Guardium, Rubrik DSPM, OpenText, Thales, Broadcom, Cloudflare, and Microsoft Purview are all converging on this chain.
AI agent security is fundamentally a crossover problem of "identity security + data permissions + tool-call governance." AI agents add large numbers of machine and non-human identities (NHIs). CyberArk reports that machine identities already outnumber human identities 82 to 1, and its 2025 identity security report names AI as the leading source of new privileged identities. CrowdStrike's $740 million acquisition of SGNL places human, NHI, and AI identities directly inside a continuous identity control framework. Microsoft, Databricks, Cloudflare, and Anthropic all treat the permissions and auditing of agents, MCP, or connectors as core elements.
The first budgets to materialize are not "pure AI security stories" but four categories tightly bound to the existing data surface: Microsoft 365 / SharePoint / OneDrive permission cleanup and classification; multi-cloud / SaaS DSPM; GenAI DLP / prompt DLP; and document-level security for high-value RAG / enterprise search. These budgets connect directly to Copilots, knowledge-base assistants, customer-service assistants, developer assistants, internal search, and compliance auditing already in production.
Platform winners and AI-native challengers will coexist. The platform winners are mainly Microsoft, Google/Wiz, AWS, Snowflake, Databricks, Oracle, IBM, Palo Alto, CrowdStrike, and Cloudflare; the AI-native challengers are mainly Cyera, BigID, Sentra, Concentric AI, Securiti, Privacera, Veza, Noma Security, and Lasso Security. The former rely on distribution, identity, cloud, data platforms, and existing customer bases; the latter rely on stronger data discovery, data graphs, cross-cloud and cross-SaaS coverage, AI-native governance, and faster iteration.
Few listed companies are direct beneficiaries with financial validation already in hand. Varonis, CyberArk (acquisition by Palo Alto completed), Snowflake, MongoDB, Elastic, Trend Micro, Cloudflare, CrowdStrike, Palo Alto, Microsoft, and Google Cloud all have relatively clear product-customer-platform paths, yet many of them do not break out "AI data security revenue" as a line item; it shows up instead as platform expansion, RPO/cRPO, large customer deals, and higher product attach rates.
Companies with a "strong AI data security narrative but insufficient revenue evidence" clearly outnumber the genuinely quantifiable beneficiaries. Cloudflare's AI-SPM / prompt protection, Palo Alto's Prisma AIRS, CrowdStrike's Falcon Data Security, BigID's AI governance, and companies like Noma/Lasso/Prompt Security move fast on product, but most do not publicly break out ARR or revenue contribution. For investment purposes, "whether it is embedded in the strong distribution chain of an existing large platform" should carry more weight than release count alone.
The sectors with the greatest revenue elasticity are: multi-cloud DSPM, enterprise knowledge base permission governance, agent data access control, GenAI DLP, and data access governance. These sit closest to existing budgets and are most tightly bound to the real-world adoption of Copilot, internal search, data lakehouses, and externally connected SaaS systems. By contrast, vector database security, AI memory security, privacy-enhancing computation, and confidential computing remain proof-of-concept or specialized purchases at many enterprises.
The best long-term margins live not in scanning itself but in the "control plane." Specifically: the permission graph, label/classification engine, policy engine, retrieval authorization, access auditing, key and encryption orchestration, and a unified AI Gateway/Policy plane. These layers are inherently more software-like and platform-like, and easier to reuse across multiple use cases. Databricks Unity Catalog, Snowflake Horizon, Microsoft Purview, Oracle Deep Data Security, and Thales CipherTrust all reflect this.
The most bubble-prone valuations are: high-growth security platforms that talk "AI security" without clear revenue attribution; and the "high funding + low transparency" names among private AI-native security companies. Cyera has already risen to a $9 billion valuation in early 2026, and Databricks' Series K in 2025 valued it above $100 billion; the fundamentals of such names may be excellent, but the market has already paid for a great deal of expectation in advance.
Platform squeeze will be very pronounced. Google Cloud has already built DSPM into Security Command Center; Microsoft binds Purview, Copilot, Graph connectors, and Azure AI Search into one whole; Snowflake, Databricks, and Oracle are also unifying vector retrieval, catalog, labels, permissions, auditing, and AI Gateway into the data platform. Standalone tools that cannot deliver better graphs, precision, and remediation across cloud, SaaS, and data surfaces will be absorbed by the platforms.
Those at the highest risk of being disrupted are traditional point DLP, static data governance, and weak-graph compliance tools. Broadcom Symantec DLP, legacy email/endpoint DLP, tools that do catalog only without runtime access control, and tools that produce only passive compliance reports will all be squeezed by the platform-style unified "classification + permission + retrieval + logging + policy" control plane.
The biggest catalysts over the next 12–24 months are: production deployment of in-house Copilots and agents; product integration after Wiz is consolidated into Google Cloud; cross-selling of the AI data protection components of Palo Alto / CrowdStrike / Cloudflare; the native permissioning of Snowflake Horizon and Databricks Unity Catalog for agents/RAG; and a continued wave of large data security M&A.
The biggest risks over the next 12–24 months are: enterprise AI projects failing to reach production; customers prioritizing models and compute over governance; platform-native features falling in price or becoming free; insufficient data classification precision causing false blocks; and cross-border data and AI regulation tightening in tandem. The EU AI Act will become fully applicable on August 2, 2026 (with exceptions for some provisions), the U.S. DOJ's bulk sensitive data rule has taken effect, and China's cross-border data rules continue to tighten.

Value Chain Landscape and Demand Re-rating

The reason data security becomes the core bottleneck once enterprise AI, RAG, and agents are deployed is not that enterprises suddenly "care more about security," but that AI turns what used to be dispersed, static, low-frequency data access into high-frequency, cross-system access amplified by reasoning. Microsoft states clearly that Copilot uses a user's files, email, chats, meetings, and other context through Microsoft Graph; as long as the user has permission, Copilot will use this content as grounding data. For enterprises, this means files that were historically "shared with too many people but never actively sought out" become highly reachable data that "a single natural-language question can surface and summarize."

RAG amplifies this problem further. Microsoft 365 Copilot connectors and Azure AI Search both make "document-level permissions" a core mechanism; Databricks Unity Catalog recommends using ABAC for centralized row/column filtering; Snowflake uses row access policies, dynamic masking, and tag-based masking; Elastic provides document-level / field-level security; Oracle's 26ai introduces Deep Data Security, binding row/column/cell-level access control directly to agentic AI. This is itself an industry signal: if permission inheritance were not a hard problem, platforms would not be rolling out these controls so densely. The difficulty here is that embedding, indexing, retrieval, rerank, and tool use often happen outside the original data, so ACLs, labels, identities, group information, and sensitivity metadata must be synchronized explicitly—otherwise the vector index becomes a "second data surface" detached from source-system authorization. This judgment is an industry inference based on each platform's product design.
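
To make the "second data surface" point concrete, here is a minimal sketch of permission-aware retrieval (security trimming). It assumes each indexed chunk carries a copy of the source system's ACL as metadata. The `Chunk` class, the `trim_by_acl` function, and the principal IDs are hypothetical names for illustration, not any vendor's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """An indexed chunk carrying ACL metadata copied from the source system."""
    doc_id: str
    text: str
    allowed_principals: frozenset  # user and group IDs with at least view access
    score: float = 0.0             # similarity score from the vector search step


def trim_by_acl(candidates, user_id, group_ids, top_k=3):
    """Drop chunks the caller cannot view, then keep the top_k by score.

    Filtering runs before the ranking is truncated, so an unauthorized
    high-scoring chunk never takes a slot in the final context.
    """
    principals = {user_id, *group_ids}
    visible = [c for c in candidates if c.allowed_principals & principals]
    return sorted(visible, key=lambda c: c.score, reverse=True)[:top_k]


# Hypothetical search results, before trimming.
candidates = [
    Chunk("hr-001", "Salary bands 2026", frozenset({"grp:hr"}), 0.93),
    Chunk("kb-042", "VPN setup guide", frozenset({"grp:all-staff"}), 0.88),
    Chunk("fin-007", "Board deck draft", frozenset({"user:cfo"}), 0.85),
]

results = trim_by_acl(candidates, "user:alice", ["grp:all-staff", "grp:eng"])
print([c.doc_id for c in results])  # only the chunk alice's groups can view
```

In this sketch the two highest-scoring chunks are dropped because the caller's principals do not match their ACL metadata. The trimming is only as good as the ACL copy in the index, which is why the synchronization step matters.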

The risk introduced by AI agents is more complex than RAG. RAG is mainly "read"; agents may "read + write + call tools + call APIs + persist memory + execute autonomously." Anthropic defines MCP as an open connection standard between AI tools and data sources; Databricks positions Unity AI Gateway as a unified governance layer across LLMs and MCP; Cloudflare released Mesh to protect the AI agent lifecycle; Palo Alto folded Portkey into Prisma AIRS; CrowdStrike, through SGNL, extends continuous identity control to AI identities. The industry's product roadmap already makes the point: agent security is not an extension of traditional prompt protection, but runtime data access governance.
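
The gap between "read" and "read + write + call tools" can be sketched as a runtime policy gate in front of every tool call. This is an illustrative sketch under stated assumptions: the `POLICY` table, the `authorize` function, and the agent and tool names are hypothetical and do not reflect how any product named above implements it.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    REQUIRE_APPROVAL = "require_approval"
    DENY = "deny"


@dataclass(frozen=True)
class ToolCall:
    agent_id: str
    tool: str
    action: str    # "read" or "write"
    resource: str


# Hypothetical policy: per-agent grants keyed by (tool, action).
POLICY = {
    "agent:support-bot": {
        ("crm", "read"): Decision.ALLOW,
        ("crm", "write"): Decision.REQUIRE_APPROVAL,
    },
}

audit_log = []  # every decision is recorded, including denials


def authorize(call):
    """Default-deny check for a single agent tool call."""
    grants = POLICY.get(call.agent_id, {})
    decision = grants.get((call.tool, call.action), Decision.DENY)
    audit_log.append(
        (call.agent_id, call.tool, call.action, call.resource, decision.value)
    )
    return decision


read = authorize(ToolCall("agent:support-bot", "crm", "read", "ticket/123"))
write = authorize(ToolCall("agent:support-bot", "crm", "write", "ticket/123"))
other = authorize(ToolCall("agent:support-bot", "payroll", "read", "run/2026-01"))
print(read.value, write.value, other.value)
```

The design choice worth noting is default-deny combined with logging of every decision: an agent that has no explicit grant is blocked, and the audit trail covers denied calls as well as allowed ones.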

The table below lays out the AI data security value chain from an investment perspective, focusing not on "who can tell a story" but on who sits in the control plane, who can charge for it, and who can build a platform moat.

Value Chain Position	Segment	Core Products	AI Data Security Driver	Revenue Model	Competitive Moat	Margin Profile	Representative Companies	Listing Status	Benefit Strength	Investment Elasticity
Data sources	Documents/email/chat/tickets	SharePoint, OneDrive, Teams, email, Confluence, Jira, etc.	Over-sharing amplified by Copilot/RAG	Existing SaaS add-on security subscription	Customer data lock-in + native permission model	High platform gross margin	Microsoft, Atlassian, ServiceNow, Salesforce, Box	Listed	High	Medium
SaaS data	SaaS access governance	SaaS DSPM, SaaS DLP, SaaS posture	Shadow AI, third-party apps, external sharing	seat/tenant/usage	API coverage + behavioral data	Medium-high	Microsoft Purview, Cloudflare, Reco, Grip, DoControl	Mixed	High	High
Cloud storage	Object storage sensitive data discovery	Macie, Google DSPM, Alibaba DSC	PII/PHI/PCI in S3/GCS/OSS	usage + scan volume	Cloud-native telemetry	Medium	AWS, Google Cloud, Alibaba Cloud	Mixed	High	Medium
Data lakehouse	Catalog/labels/permissions	Unity Catalog, Horizon	RAG/AI apps connect directly to the lakehouse	Platform bundling + higher tier	Native control on the data surface	High	Databricks, Snowflake	Mixed	Very high	Very high
Databases	Row/column-level security/encryption	row policy, masking, queryable encryption	Agents accessing transaction and customer databases	edition/consumption	Kernel integration	High	Oracle, Snowflake, MongoDB, IBM	Listed	High	Medium-high
Vector databases	Retrieval filtering/endpoint ACL	vector ACL, metadata filter	Secondary exposure via embeddings	seat/usage	Difficulty of syncing with source permissions	Medium	Databricks, MongoDB, Elastic, Oracle	Listed/Private	Medium-high	High
Enterprise knowledge base	Document-level permission governance	DLS/ACL sync/semantic security trimming	A hard requirement for enterprise search and RAG	Search/security upsell	ACL inheritance + index quality	High	Azure AI Search, Elastic, Box	Listed	Very high	High
Data catalog	catalog / metadata / glossary	Horizon, Collibra, Alation, Atlan	AI needs to know "what data is available"	platform subscription	Metadata network effects	High	Snowflake, Collibra, Alation, Atlan	Mixed	Medium-high	Medium
Data lineage	lineage	Unity Catalog lineage, Snowflake external lineage	AI auditing, traceability for training/inference	Platform bundling	Metadata depth	High	Databricks, Snowflake, BigID	Mixed	Medium-high	Medium
Data classification	Sensitive data identification and labeling	Purview, Google SDP, IBM Guardium Discovery	PII/PHI/PCI/source code/contract identification	tiered / usage	Classifier precision and coverage	High	Microsoft, Google, IBM, BigID	Mixed	Very high	High
DSPM	Data discovery, exposure surface, risk scoring	Google DSPM, Cyera, BigID, Sentra, Concentric, Rubrik DSPM	"Discover first, then govern" is a prerequisite before AI projects go live	data source / TB / account	Cross-cloud, cross-SaaS graph	High	Google/Wiz, Cyera, BigID, Sentra, Concentric AI, Rubrik	Mixed	Very high	Very high
DLP	Email/endpoint/network/browser/GenAI DLP	Purview DLP, Symantec DLP, Cloudflare DLP	Prompt, copy, upload, exfiltration	seat / endpoint / traffic	Channel and inline deployment	Medium-high	Microsoft, Broadcom, Cloudflare, Zscaler	Listed	High	Medium-high
DDR	Data detection and response	Guardium DDR, Varonis / data activity analytics	Anomalous access and exfiltration detection	platform + module	Access logs and behavioral models	High	IBM, Varonis	Listed	High	High
Data access governance	entitlement graph / policy	Privacera, Immuta, Veza, Wiz CIEM+DAG	Least-privilege for agents, cross-system authorization	platform subscription	Identity-data relationship graph	High	Privacera, Immuta, Veza, Wiz	Mixed	Very high	High
RAG permission governance	permission-aware retrieval	Azure AI Search DLS, Elastic DLS, Privacera PAIG	A hard requirement for enterprise RAG	Search/platform add-on	Retrieval authorization + citation + audit	High	Microsoft, Elastic, Privacera	Mixed	Very high	Very high
Agent data access control	runtime policy / approval / logs	Prisma AIRS, Unity AI Gateway, CyberArk Secure AI Agents, Cloudflare Mesh	A gate before agents execute autonomously	Platform add-on + premium	policy plane + identity + logs	High	Palo Alto, Databricks, CyberArk, Cloudflare	Listed/Acquired	Very high	Very high
AI data governance	prompt/output/data lineage/usage policy	BigID, Securiti, Databricks, Snowflake	EU AI Act, auditing, and model accountability	platform + governance module	Regulatory mapping + policy engine	High	BigID, Securiti, Databricks, Snowflake	Mixed	High	High
Privacy and compliance	DSAR, consent, cross-border	OneTrust, Securiti, TrustArc, Transcend	AI data usability constrained by regulation	subscription	Regulatory knowledge base + workflow	Medium-high	Securiti, OneTrust, TrustArc	Private	Medium-high	Medium
Encryption and key management	KMS, tokenization, BYOK/HYOK	Thales CipherTrust, AWS KMS, MongoDB QE	"Data-in-use protection" and external key control	license + usage	Key root of trust	High	Thales, AWS, MongoDB	Mixed	Medium-high	Medium
Cloud vendor data security	native DSPM / DLP / guardrails	Google SCC DSPM, Macie, Purview	Strongest native distribution	bundle / consumption	Distribution and native telemetry	Very high	Microsoft, Google, AWS	Listed	Very high	Medium-high
AI app and agent platforms	AI gateway / observability / guardrails	Unity AI Gateway, Bedrock Guardrails, Portkey	Model-call governance	usage / request volume	Breadth of integration surface	Medium-high	Databricks, AWS, Palo Alto, Cloudflare	Mixed	High	High
Enterprise customer-side services	MSSP / consulting / managed	Managed data security, AI governance consulting	High implementation complexity	service fee + managed	Industry know-how	Medium	NTT Data, Infosys, TCS, Wipro, HCLTech	Listed	Medium	Medium

The chain above also explains how budget boundaries are shifting: budgets that used to belong to IAM, DLP, SaaS security, cloud security, data governance, and GRC are being repackaged under "AI data security." The most direct change is that customers no longer just ask "do you have DLP," but instead ask "will Copilot see content it shouldn't," "can the agent read but not write," "is the RAG permission-aware," "are prompts and outputs audited," and "does the vector index inherit source permissions."
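
The last question, whether the vector index inherits source permissions, can be checked mechanically. The sketch below assumes ACLs can be exported from both the source system and the index as plain mappings from document ID to a set of principals; `find_acl_drift` and the sample IDs are hypothetical.

```python
def find_acl_drift(source_acls, index_acls):
    """Return principals that can still reach a document through the index
    even though the source system no longer grants them access.

    Both arguments map doc_id -> set of principal IDs. A document missing
    from source_acls is treated as deleted, so nobody should see it.
    """
    drift = {}
    for doc_id, indexed in index_acls.items():
        current = source_acls.get(doc_id, set())
        stale = set(indexed) - set(current)
        if stale:
            drift[doc_id] = stale
    return drift


# Hypothetical exports: doc-1 was narrowed at the source, doc-3 was deleted.
source_acls = {
    "doc-1": {"grp:finance"},
    "doc-2": {"grp:all-staff"},
}
index_acls = {
    "doc-1": {"grp:finance", "grp:all-staff"},
    "doc-2": {"grp:all-staff"},
    "doc-3": {"user:bob"},
}

drift = find_acl_drift(source_acls, index_acls)
print(sorted(drift))
```

A non-empty result means the index is granting more than the source does, which is the over-exposure case; the opposite direction (index stricter than source) only hurts recall and is deliberately ignored here.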

Under three scenarios, the budget path looks roughly as follows:

Dimension	Conservative	Base	Aggressive
Assumption	Enterprises buy models and Copilot first, retrofit governance later	Copilot / enterprise search / light agents gradually go into production	Agents enter customer service, R&D, IT operations, and BI/Finance workflows
Enterprise AI adoption	High	High	Very high
RAG adoption	Medium	High	Very high
Agent adoption	Low	Medium	High
Change in data security budget	Reallocation within the overall security budget, little net new	A dedicated AI data security budget emerges, but still co-managed with cloud/identity/data platforms	Clear net new governance and audit budget; AI projects must come with it
Most-benefiting segments	M365 permission governance, basic DLP, S3/GCS/SaaS DSPM	DSPM, RAG permission governance, agent access control, GenAI DLP	Data access governance, identity/NHI, DDR, AI gateway, knowledge base security
Beneficiary companies	Microsoft, Google Cloud, AWS, Varonis	Microsoft, Google/Wiz, Palo Alto, CrowdStrike, Databricks, Snowflake, Cyera, BigID, Privacera	Palo Alto, CrowdStrike, the CyberArk path, Databricks, Snowflake, Cyera, Veza, Noma
Disrupted companies	Pure new-concept AI security startups	Traditional point DLP, catalog-only tools	Point tools that only do a "prompt firewall"
Key risk	AI projects delayed, budget goes to infrastructure first	Platform-native features commoditize too fast	False blocks, permission mismatches, customers unwilling to add complexity

The base scenario is the most credible investment assumption right now: AI data security will become a standalone budget pool, but it will not become a fully isolated standalone market—instead it will compete for the control surface alongside identity, cloud, security platforms, and data platforms.

Technical Architecture and Sector Breakdown

Broken down into its parts, the most valuable piece of an enterprise-grade AI data security system is not a "single detection point" but the continuous control chain that runs from discovery, classification, and permission inheritance through to auditing and response. The table below consolidates the 17 architecture layers into a single investment framework.

Architecture Layer	Problem Solved	Representative Capabilities	Long-term Moat	Risk of Being Replaced by Platform Built-in	Willingness to Pay	Notes
Data discovery layer	Find shadow data, orphan data, ROT data	Scan object storage, SaaS, lakehouse, databases	Medium-high: connector coverage, efficiency, low intrusiveness	Medium	High	The DSPM base layer; Google DSPM, IBM, Cyera, BigID, and Sentra all build around it.
Data classification layer	Identify PII/PHI/PCI/code/contracts, etc.	classifier, rules, LLM + context	High: precision, low false positives, industry templates	Medium	Very high	Google Sensitive Data Protection, Snowflake classification, Purview, and OpenText all emphasize sensitive classification.
Sensitive data identification layer	Judge data value and risk level	labels, risk scoring, sensitivity labels	High	Medium	Very high	The more labels can be reused across DLP, RAG, and auditing, the higher the value.
Data catalog and lineage layer	Audit "where data comes from and where it flows"	catalog, lineage, external lineage	High: metadata network effects	Medium	Medium-high	Snowflake Horizon/External lineage, Databricks metadata layer.
Data permission graph layer	Know "who can access what"	ACL graph, entitlement map, DAG	Very high	Low-medium	Very high	The layer most likely to form a durable moat; Veza, Wiz, Privacera, and Immuta are closest here.
Identity and NHI mapping layer	Map users, service accounts, and agents to resources	PAM, machine identity, continuous identity	Very high	Low	Very high	AI agents increase the number of NHIs, sharply raising the importance of the identity layer.
RAG permission inheritance layer	Carry source permissions through at retrieval time	ACL sync, query token, security trimming	Very high	Medium	Very high	Azure AI Search, Microsoft connectors, and Elastic DLS are the most direct examples.
Vector database permission filtering layer	Keep embedding/retrieval within authorization	endpoint ACL, metadata filter, DLS	Medium-high	Medium-high	Medium-high	Databricks already has vector endpoint ACL/filters; many pure vector databases are still weak.
Prompt / input DLP layer	Keep sensitive data out of the model, block prompt injection	PII filter, prompt protection	Medium	High	Medium-high	AWS Bedrock, Cloudflare, and Purview all do this, but it is more easily absorbed by platforms.
Output DLP layer	Prevent model results from leaking	output filter, citation, redaction	Medium	High	Medium-high	As important as input DLP, but with weaker standalone pricing power.
Agent data access audit layer	Reconstruct what the agent did	payload logs, tool logs, approval logs	Very high	Medium	High	Databricks, Anthropic, and OpenAI all provide logging and monitoring.
Data anomaly behavior detection layer	Detect anomalous access, lateral movement, sensitive data movement	DDR, UEBA, activity analytics	High	Low-medium	High	IBM Guardium DDR and Varonis are the most typical paths.
Data leak response layer	Act automatically once risk is seen	quarantine, block, revoke, ticket	Medium-high	Medium	High	Discovery without remediation is worth far less.
Encryption and key management layer	Protect data at rest/in transit/partly in use	KMS, BYOK, queryable encryption	Very high	Low	Medium-high	But it is more infrastructure; the incremental elasticity is lower than permission governance.
Compliance and audit reporting layer	Satisfy GDPR/HIPAA/PCI/FINRA/EU AI Act	audit trail, retention, policy evidence	Medium	Medium	Medium-high	Easy to commoditize, but still a must-have to close deals.
AI governance policy layer	Connect data, models, agents, and policy	AI use policy, risk register, AI governance	Very high	Medium	Medium-high	BigID, Securiti, Databricks, and Snowflake are contesting this layer.
Security operations integration layer	Connect SOC / SIEM / tickets / SOAR	APIs, logs, playbooks	Medium-high	Medium	Medium	More about platform expansion than a standalone profit pool.
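
As an illustration of the prompt / input DLP and output DLP layers in the table, the sketch below redacts two kinds of sensitive strings before text reaches a model or a user. It is a deliberately simple, regex-only example: the `PATTERNS` table and the `redact` function are hypothetical, and production classifiers rely on far richer detection than two regular expressions.

```python
import re

# Hypothetical detectors; real classifiers combine rules, context, and models.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # US SSN-like format only
}


def redact(text):
    """Replace each match with a [LABEL] placeholder.

    Returns the redacted text plus a list of (label, count) findings that
    can be written to an audit log.
    """
    findings = []
    for label, pattern in PATTERNS.items():
        text, count = pattern.subn(f"[{label}]", text)
        if count:
            findings.append((label, count))
    return text, findings


prompt = "Contact jane.doe@example.com, SSN 123-45-6789."
clean, findings = redact(prompt)
print(clean)
print(findings)
```

The same function can sit on both sides of the model call, which is one reason the table treats input and output DLP as closely related layers with limited standalone pricing power.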

Building on this architecture chain, the 30 segments in scope can be compressed into the five investment clusters most worth tracking:

Investment Cluster	Included Segments	Commercialization Stage	Revenue Elasticity	Margin Outlook	Competitive Landscape	Investment Appeal
Multi-cloud DSPM and data classification	DSPM, sensitive data discovery, SaaS data security, cloud data security, unstructured data risk	Already platformizing	Very high	High	Large platforms + AI-native startups	Highest
Data access governance and permission graph	Data access governance, permission graph, NHI/identity linkage	Heating up fast	Very high	Very high	Identity security firms + data governance firms + startups	Highest
RAG / Enterprise Search permission governance	permission-aware RAG, knowledge base security, vector database filtering, document-level security	Moving from PoC to hard requirement	Very high	High	Microsoft / Elastic / Privacera / platform built-in	Very high
GenAI DLP and agent runtime control	Prompt DLP, output protection, agent data access, memory security, action approval	Early-to-mid stage	High	Medium-high	PANW / Cloudflare / Databricks / startups	High
AI data governance and compliance	training/inference data governance, AI auditing, data sovereignty, privacy compliance	Mid-stage	Medium-high	High	BigID / Securiti / OneTrust / platform companies	High

Among these, the layer most likely to form a durable moat is the integrated control plane of "data permission graph + classification labels + retrieval authorization + log auditing"; the layer most likely to be replaced by cloud-vendor built-ins is basic scanning, baseline checks, and point prompt DLP; the layer most likely to produce "good product but hard to monetize" is AI security point products that only detect without closing the loop on remediation. This judgment is consistent with the trend of platforms like Google, Microsoft, Databricks, Snowflake, and Oracle continuing to build governance capabilities into the data plane and the AI plane.
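
The "data permission graph" at the center of that control plane answers one question: what can a given principal actually reach? A minimal sketch, assuming access is granted to principals directly or through nested group membership; the `PermissionGraph` class and all identifiers are hypothetical and not modeled on any vendor's product.

```python
from collections import defaultdict, deque


class PermissionGraph:
    """Principals (users, service accounts, agents) -> groups -> resources."""

    def __init__(self):
        self.member_of = defaultdict(set)  # principal -> groups it belongs to
        self.grants = defaultdict(set)     # principal -> directly granted resources

    def add_membership(self, principal, group):
        self.member_of[principal].add(group)

    def grant(self, principal, resource):
        self.grants[principal].add(resource)

    def effective_access(self, principal):
        """Breadth-first walk over nested memberships, collecting every grant."""
        seen, queue, resources = {principal}, deque([principal]), set()
        while queue:
            node = queue.popleft()
            resources |= self.grants.get(node, set())
            for group in self.member_of.get(node, ()):
                if group not in seen:  # guards against membership cycles
                    seen.add(group)
                    queue.append(group)
        return resources


graph = PermissionGraph()
graph.add_membership("agent:report-bot", "grp:analysts")
graph.add_membership("grp:analysts", "grp:all-staff")
graph.grant("grp:analysts", "table:sales")
graph.grant("grp:all-staff", "site:wiki")
graph.grant("user:cfo", "doc:board-deck")

print(sorted(graph.effective_access("agent:report-bot")))
```

Here the agent has no direct grants at all; everything it can reach is inherited through two levels of group membership, which is the kind of indirect exposure a flat ACL listing tends to hide.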

Company Tiering and Investment List

First, a high-density investment list. Here I tier companies as "direct beneficiary / indirect beneficiary / platform beneficiary / AI-native challenger / at risk of platform squeeze," and try to separate "product releases" from "realized revenue."

Priority Research Matrix for Listed Companies

Company	Region/Ticker	Segment	Core Products	AI/RAG/Agent Data Security Benefit Path	Financial/Commercialization Evidence	Category	Valuation Observation
Microsoft	US/MSFT	Platform beneficiary	Purview, Copilot, Graph connectors, Azure AI Search	Sits directly at the center of enterprise knowledge bases, permission models, compliance, and retrieval authorization; Copilot itself drives demand for Purview/permission governance	FY25 commercial RPO $368 billion; FY26 Q2 commercial RPO $625 billion; tight binding of Copilot/Graph/Purview.	Category A	Large scale, high certainty, medium elasticity; not cheap but not reliant on a single narrative
Alphabet / Google Cloud / Wiz	US/GOOGL	Platform beneficiary	Google Cloud DSPM, Sensitive Data Protection, Wiz	Google has already platformized cloud and AI security; Wiz provides multi-cloud data graphs and DSPM	Acquisition of Wiz completed in March 2026; Google Cloud has officially integrated DSPM into SCC.	Category A	Strong platform-integration elasticity, but security revenue still hard to break out
Palo Alto Networks	US/PANW	Platform beneficiary	Prisma AIRS, Protect AI, Portkey, identity platform	Expanding from AI model security into agent lifecycle, LLM gateway, and runtime security	Q2 FY26 revenue $2.594 billion, RPO $16 billion; completed Protect AI in 2025, plans to acquire Portkey in 2026, AIRS 3.0 targets agentic AI.	Category A/B	Strong logic, fast M&A, market expectations already elevated
CrowdStrike	US/CRWD	Platform beneficiary	Falcon Data Security, Charlotte AI, SGNL	Entering the AI data surface via endpoint + identity + data protection	FY25 revenue $3.95 billion, up 29% year over year; identity security ARR over $435 million; plans to acquire SGNL in 2026.	Category A/B	Strongly platformized, but much of the upside is already priced into the valuation
Zscaler	US/ZS	DLP / zero trust	Inline DLP, SSE, GenAI controls	Suited to browser, SaaS, upload/exfiltration, and Shadow AI scenarios	Clear product path, but no broken-out AI data security revenue disclosure found this round; overall more of a platform enhancement.	Category B	Right theme; needs clearer revenue attribution
Varonis	US/VRNS	DDR / data permission governance	Data security platform, Copilot risk governance, MDDR	The most direct beneficiary of governing SharePoint/OneDrive/email over-sharing	2025 ARR $745.4 million, SaaS ARR $638.5 million; Q1 2026 revenue and SaaS ARR guidance continue to grow.	Category A	High purity, high elasticity; valuation not low but still has fundamental support
Rubrik	US/RBRK	DSPM + data recovery	Rubrik DSPM, data recovery, Annapurna roadmap	If AI data security emphasizes "discovery + recovery + resilience," Rubrik benefits clearly	Has officially launched DSPM, but AI-related revenue not separately disclosed; still needs ongoing validation of sales mix.	Category B	Good logic, but more "platform expansion" than validated standalone revenue
Snowflake	US/SNOW	Data platform security	Horizon Catalog, row policy, masking, lineage	Sits at the core of the enterprise data lakehouse and AI data cloud	FY26 Q4 product revenue $1.23 billion, up 30% year over year; NRR 125%; 733 customers over $1 million; RPO $9.77 billion.	Category A	Strong long-term moat, but the market has partly priced in the "AI data cloud" path
MongoDB	US/MDB	Database/vector/encryption	Atlas, Vector Search, Queryable Encryption	AI apps often place operational DB + vector search together, so security capabilities can grow directly with usage	FY26 revenue $2.46 billion, up 23% year over year; Atlas up 29% year over year; over 65,200 customers.	Category A/B	Not a pure security name, but with high direct exposure to the AI data surface
Elastic	US/ESTC	Enterprise search/RAG/security	Elasticsearch, DLS/FLS, AI Assistant	Enterprise search, SOC AI assistants, and vector retrieval all need DLS/FLS	FY26 Q3 revenue $450 million, up 18% year over year; sales-led subscription $376 million, up 21% year over year; 1,660+ customers with ACV above $100K.	Category A/B	High purity in RAG permission governance, with market attention still below the large platforms
Cloudflare	US/NET	GenAI DLP / AI-SPM / agent network security	AI Gateway, AI prompt protection, AI-SPM, Mesh	Entering via network ingress, browser, SASE, and the agent network	Q1 2026 revenue $639.8 million, up 34% year over year; current RPO up 34% year over year; has released AI prompt protection, AI-SPM, and Mesh, but does not break out revenue.	Category B	Excellent product cadence, but valuation is sensitive to AI expectations
Oracle	US/ORCL	Database / agent-native data security	Oracle AI Database, AI Vector Search, Deep Data Security	The "don't copy data to an external vector database" narrative fits enterprise security demands very well	Oracle 26ai launches Deep Data Security with built-in vector search; the path is strong for finance and government/large enterprises.	Category A/B	Security upside potentially underestimated, but customer adoption needs watching
IBM	US/IBM	DDR / discovery and classification / key governance	Guardium, Guardium DDR, Key Lifecycle Manager	Directly covers the full Discover / Classify / DDR / key mgmt chain	IBM explicitly uses Guardium for data discovery, classification, DDR, and key lifecycle management, but AI data security revenue is not broken out.	Category B/C	Strong defensive profile, with elasticity below pure-security SaaS platforms
Okta	US/OKTA	Identity / access	Workforce Identity, governance	Identity governance for AI agents / apps / connectors needs a strong identity foundation	Highly relevant to AI data security, but no direct data security revenue validation found this round.	Category C	More of a "necessary foundation," not the most direct beneficiary of data security revenue
Trend Micro	Japan/4704	Cloud security + AI security platform	Trend Vision One, AI security platform	Can carry AI risk detection and link cloud and data risk	2025 enterprise ARR over $1.3 billion, large-enterprise platform ARR $467 million, Q4 enterprise net sales up 8% year over year.	Category B	An Asia-Pacific representative with a clear platformization path

Important Private Company Observation Matrix

Company	Country/Region	Segment	Core Products	Funding/Valuation	Known Commercialization Signals	Competitive Relations	Assessment
Cyera	Israel/US	DSPM	Multi-cloud data discovery, classification, access analytics, AI data security	Raised another $400 million in January 2026, valued at $9 billion.	Fast customer expansion, but ARR not public	Pressures BigID, Sentra, Google/Wiz, Rubrik	One of the AI-native DSPM names most worth tracking
BigID	US	DSPM + AI governance	Data discovery, classification, AI governance, vector DB / agent governance	Company says 2024 revenue passed $100 million.	Has a revenue and platformization base	Competes across Microsoft/Purview, Cyera, Securiti	Closest to a "platform-type AI data security startup"
Sentra	Israel/US	DSPM	Cloud-native DSPM, archive scanning, data attack surface	$50 million Series B in 2025.	Emphasizes AI-ready data protection, ARR undisclosed	Competes with Cyera, BigID, Concentric	Worth tracking, but transparency still insufficient
Concentric AI	US	DSPM / unstructured data risk	Semantic intelligence, DSPM, DLP	$45 million Series B in 2024.	Strong in unstructured data and permission semantics	Challenges Varonis, BigID, Sentra	Worth tracking in the AI knowledge base security direction
Securiti	US	AI data governance / privacy	Data+AI security, privacy ops, agent governance	Acquired by Veeam; website continues to advance Agent Commander.	Strong regulatory/privacy semantics	Competes with OneTrust, BigID, Privacera	Strong regulatory drivers, but the post-acquisition pace needs watching
Privacera	US	Data access governance / RAG	PAIG, vector DB/RAG access control	Public news shows a 2026 rebrand to Trust3 AI.	Launched vector DB / RAG access control back in 2024	Competes with Databricks/Snowflake/Immuta	A highly relevant name in RAG permission governance
Immuta	US	Data access governance	Dynamic access control, cloud data access governance	$100 million raised in 2022, $267 million total funding.	Commercially mature, but funding updates sparse in recent years	Competes with Privacera, Veza, and platform-native governance	Needs validation of whether growth re-accelerates
Veza	US	Permission graph / identity-data governance	Access graph, entitlement intelligence	$108 million Series D in 2025, valued at $808 million.	Backed by Snowflake/Atlassian/Workday Ventures	Spans identity and data governance	Very much worth tracking, could become an M&A target
Noma Security	Israel	AI/Agent security	AI app, RAG, agent runtime security	$100 million Series B in 2025, $132 million total funding.	Growing very fast but revenue undisclosed	Competes with the PANW/Cloudflare/Protect AI path	A textbook AI-native challenger
Lasso Security	Israel	GenAI / LLM security	Prompt / LLM cybersecurity	$6 million seed round in 2023; later materials show cumulative funding has increased.	Clear direction, insufficient transparency	Competes with Cloudflare/PANW/peer startups	More of a technology bet
Protect AI	US	AI security platform	Model-to-runtime AI security	Acquired by Palo Alto in 2025.	The acquisition validates the sector's value	Already folded into PANW	Already an M&A pricing anchor
OneTrust / TrustArc / Transcend	US	Privacy and compliance	DSAR, consent, policy	Funding and valuation mostly outdated or need separate checking	Relevant to AI data usage compliance, but not fully overlapping with runtime data security	Partly overlaps with Securiti / BigID	More of a compliance beneficiary, not the strongest AI security elasticity
Collibra / Alation / Atlan	Europe/US/India	Catalog and lineage	catalog, lineage, governance	Collibra valued at $5.25 billion in 2021.	Important in AI governance, but security monetization needs validation	Competes with Snowflake/Databricks platform features	Catalog still matters, but pure investment elasticity below DSPM
Reco / Grip / DoControl / Adaptive Shield	Israel/US	SaaS data security	SaaS posture / SaaS DLP / access	Funding and ARR not systematically verified this round	Benefit from SaaS + AI app expansion	Also face absorption pressure from Microsoft/Cloudflare	Worth tracking, but each needs individual validation

Looking at the investment tiers:

Category A: core direct beneficiaries of AI/RAG/Agent data security—Microsoft, Google/Wiz, Varonis, Snowflake, Databricks (private), Cyera, BigID, and the Privacera/Veza path. The common trait: they sit at the core of the data control surface and connect directly to enterprise knowledge bases, lakehouses, retrieval, permissions, and catalogs.
Category B: clear beneficiaries, but with higher valuation or platform-squeeze risk—Palo Alto, CrowdStrike, Cloudflare, MongoDB, Elastic, Oracle, Trend Micro, Sentra, and Concentric. The common trait: products are strongly correlated with demand, but AI data security may not yet be broken out as a primary financial driver.
Category C: more defensive beneficiaries—IBM, Okta, Thales, Broadcom, OpenText, and AWS. Capabilities matter, but near-term financial elasticity is not necessarily the strongest.
Category D: strong narrative, insufficient financial validation—a large number of "AI security startups" and platform add-on modules, especially companies that only do a prompt firewall, LLM scanning, or an advisory layer.
Category E: high risk of platform consolidation—traditional point DLP, catalog-only tools, weak runtime governance products, and data governance tools that lack a permission graph and remediation.

Key Listed Companies and Valuation Observations

Below are the 15 listed companies most worth continued secondary research. Because many companies do not break out AI data security revenue, the following is better treated as a "research-priority and expectation-gap list" than a simple valuation table.

Company	Sector Positioning	Commercialization Stage	Key Financial/Customer Metrics	AI Data Security Evidence	Current Market Expectation	Research Conclusion
Microsoft	Enterprise knowledge base + permission control plane	Mature, demand expanding with Copilot	Commercial RPO $368 billion in FY25, FY26 Q2 commercial RPO $625 billion; share price about $423.54, market cap about $3.15 trillion.	Purview, Copilot, Graph connectors, and Azure AI Search integrate permissions and retrieval.	The market fully recognizes its AI main line, but may not fully price in Purview's secondary benefit	High certainty, low purity, high long-term moat
Google / Wiz	Multi-cloud DSPM + CNAPP + AI security	In the platform-consolidation period	Wiz consolidated into Google Cloud in March 2026; GOOGL market cap about $4.81 trillion.	Google already natively provides DSPM and sensitive data classification.	M&A integration and business-model synergy still to be seen	High certainty, high platform-suppression power
Palo Alto Networks	AI security platform + agent runtime + identity	Rapid expansion	Q2 FY26 revenue $2.594 billion, up 15% year over year; RPO $16 billion, up 23% year over year; share price about $247.55, market cap about $176 billion.	Protect AI, Prisma AIRS 3.0, and the Portkey acquisition all point to the agentic AI lifecycle.	Expectations clearly elevated; scrutinize M&A delivery and attach rate	High elasticity, valuation running hot
CrowdStrike	identity + data protection + AI agents	Expansion	FY25 revenue $3.95 billion, up 29% year over year; identity ARR over $435 million; share price about $618.83, market cap about $155.5 billion.	Falcon Data Security, Charlotte AI, SGNL.	The market sees it as one of the core AI security platforms	High quality, high expectations, guard against valuation pullback
Zscaler	inline DLP / browser / SaaS control	Mid-to-late stage	Share price about $174.69, market cap about $27.9 billion.	A natural fit for GenAI DLP scenarios, but no broken-out AI data security revenue found in this round's materials.	Moderately high expectations	Worth tracking, needs revenue validation
Varonis	Data permission governance / DDR	Direct-benefit period	ARR $745.4 million; SaaS ARR $638.5 million; share price about $28.78, market cap about $3.33 billion.	Copilot/SharePoint over-sharing risk closely matches its product.	High purity but smaller scale, high elasticity	One of the public pure-data-security names most worth digging into
Rubrik	DSPM + recovery	Early-to-mid expansion	Share price about $64.98, market cap about $12.89 billion.	Has officially made DSPM part of the suite.	The market views it more as a recovery/resilience company	Medium-high certainty, an AI expectation gap may exist
Snowflake	Lakehouse security control plane	Validated	FY26 Q4 product revenue $1.23 billion, up 30% year over year; NRR 125%; RPO $9.77 billion; 733 customers above $1 million in trailing 12-month product revenue; share price about $164.24, market cap about $55.78 billion.	Horizon makes governance for AI a core selling point.	The market has partly priced in the AI data cloud, but security monetization has not yet fully played out	Strong long-term moat, continue to track closely
MongoDB	operational + vector + encryption	Validated	FY26 revenue $2.46 billion, up 23% year over year; Atlas up 29%; over 65,200 customers; share price about $330, market cap about $26.86 billion.	Atlas places operational and vector on the same platform; Queryable Encryption provides the "server doesn't know the plaintext" security property.	The AI data platform attribute is gradually being re-rated by the market	Medium-high certainty, medium-high elasticity
Elastic	Enterprise search/RAG security	Validated	FY26 Q3 revenue $450 million, up 18% year over year; sales-led subscription $376 million, up 21% year over year; 1,660+ customers with ACV above $100K; share price about $53.91, market cap about $5.73 billion.	DLS/FLS is directly relevant to RAG permission governance logic.	The market still underappreciates its "security + search + AI" combination	Large expectation gap
Cloudflare	AI gateway / prompt DLP / agent networking	Rapid-experimentation stage	Q1 2026 revenue $639.8 million, up 34% year over year; cRPO up 34%; share price about $201.75, market cap about $71.1 billion.	Prompt protection, AI-SPM, Mesh, and AI Gateway have all been released.	Very strong AI expectations, high near-term valuation elasticity	Good company but valuation sensitive to the narrative
Oracle	Database-built-in AI security	Take-off stage	Share price about $186.61, market cap about $543.4 billion, P/E about 33.5x.	26ai launches AI Vector Search and Deep Data Security, stressing no need to copy enterprise data to an external vector database.	The market focuses more on the cloud and database main line; AI data security may still be underestimated	A potential expectation-gap name
IBM	DDR / discovery and classification / keys	Mature	Share price about $222.75, market cap about $212.1 billion, P/E about 19.7x.	Guardium already covers discovery, classification, DDR, and key management.	More of a defensive allocation than a high-elasticity SaaS	Defensive beneficiary
Okta	Identity foundation	Mature	Share price about $87.04, market cap about $15.5 billion.	AI agents / apps / connectors expand identity and access governance demand, but its data security revenue chain is rather indirect.	Right logic, insufficient purity	Medium beneficiary, not a top-choice pure name
Trend Micro	AI security platform	Regional representative	enterprise ARR over $1.3 billion, large-enterprise platform ARR $467 million.	Has put the AI Security Platform at the core of its platform narrative.	Asia-Pacific elasticity above global market perception	Worth adding to the Asia-Pacific watchlist

Based on the table above, valuation and expectation gaps can be simplified into a rough judgment:

Expectations already fairly fully reflected: CrowdStrike, Palo Alto, Cloudflare, Databricks (private), Cyera. Their common feature is a strong platform narrative, dense funding/M&A, and high market sentiment.
Possible expectation gap remaining: Varonis, Elastic, Oracle, some of Snowflake / MongoDB's security sidelines, and non-listed permission governance paths like Veza/Privacera. Their common feature: capabilities are already critical, but the market still prices them mainly on "existing business."
Good companies but valuations too expensive: CrowdStrike, Cloudflare, and parts of Palo Alto. Current share price, market cap, and market sentiment are all high, so near-term performance depends more on expansion speed beating expectations.
Revenue growth real, valuation relatively still researchable: Varonis, Elastic, MongoDB, and parts of Snowflake. "Relatively" is the operative word here, not "cheap."

Scoring Model and Current Ranking

I adopt the suggested weights with slight simplification: AI/RAG/Agent revenue exposure 25% + platform position and customer base 20% + permission/classification/governance moat 15% + product coverage 15% + financial quality 10% + growth elasticity 10% + valuation reasonableness 5%.
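
As a rough illustration of how these weights combine into one number, here is a minimal sketch of a weighted composite score. The factor keys, the 0-10 scoring scale, and the sample scores are hypothetical placeholders chosen for illustration; they are not figures from this report, and the ranking below was not necessarily produced this way.

```python
# Weights taken from the scoring model above; together they sum to 100%.
WEIGHTS = {
    "ai_revenue_exposure": 0.25,       # AI/RAG/Agent revenue exposure
    "platform_position": 0.20,         # platform position and customer base
    "governance_moat": 0.15,           # permission/classification/governance moat
    "product_coverage": 0.15,
    "financial_quality": 0.10,
    "growth_elasticity": 0.10,
    "valuation_reasonableness": 0.05,
}


def composite_score(factor_scores: dict[str, float]) -> float:
    """Weighted sum of per-factor scores (assumed here to be on a 0-10 scale)."""
    missing = WEIGHTS.keys() - factor_scores.keys()
    if missing:
        raise ValueError(f"missing factor scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * factor_scores[name] for name in WEIGHTS)


# Hypothetical scores for an illustrative company, not real data.
example = {
    "ai_revenue_exposure": 8,
    "platform_position": 9,
    "governance_moat": 7,
    "product_coverage": 8,
    "financial_quality": 6,
    "growth_elasticity": 7,
    "valuation_reasonableness": 5,
}

print(round(composite_score(example), 2))
```

Because valuation carries only a 5% weight, a company with strong revenue exposure and platform position can rank high even when its valuation score is poor, which is consistent with treating the ranking as research priority rather than a buy list.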

Given the current materials, the overall ranking (research priority, not investment advice) is roughly as follows:

Ranking Group	Companies	Logic
Group 1	Microsoft, Google/Wiz, Snowflake, Varonis, Databricks, Palo Alto	Sit at the core of the control surface and can connect permissions/retrieval/governance/security into a platform
Group 2	CrowdStrike, MongoDB, Elastic, Oracle, Cyera, BigID	Either strong platform coverage or a larger expectation gap
Group 3	Cloudflare, Rubrik, Trend Micro, Privacera, Veza, Sentra	Right sector, but revenue attribution/scale expansion still needs tracking
Group 4	IBM, Okta, Thales, Broadcom, OpenText	Important but more defensive, not the strongest earnings elasticity
Group 5	Players that only do a prompt firewall, point LLM scan, or catalog-only	Easily built into platforms or squeezed on price

In the reverse scoring for "platform consolidation risk," the highest risk usually falls on: point DLP, point AI-SPM, products that only do prompt/output filtering, catalog-only tools, and governance vendors with a weak permission graph. The reason is that Microsoft, Google, AWS, Snowflake, Databricks, and Oracle have already pulled the key features into their platforms.

Risks, Open Questions, and Final Conclusions

The biggest risk is not that "the security demand does not exist," but the form the demand takes, who captures it, and when it shows up in the financials. From current public materials, the five most important investment judgments can be summarized as follows.

First, AI data security will become the core control plane of the enterprise AI era, but it is not an isolated market. It will integrate deeply with identity security, cloud security, data platforms, enterprise search, knowledge bases, and compliance auditing. In other words, it is more like a "control plane layer" than a permanently standalone single-product market.

Second, there are only five segments genuinely worth attention. They are: multi-cloud DSPM, data access governance and permission graph, RAG/enterprise search permission governance, GenAI DLP/agent runtime control, and AI data governance and compliance. This is the most central industry distillation of this report.

Third, the ten listed companies most worth in-depth research, ranked by "certainty × elasticity × platform position" in suggested priority order, are: Microsoft, Google/Alphabet, Palo Alto Networks, CrowdStrike, Varonis, Snowflake, MongoDB, Elastic, Oracle, Cloudflare. Among them, Microsoft/Google/Snowflake lean toward platform certainty, Varonis/Elastic/Oracle lean toward the expectation gap, and PANW/CRWD/NET lean toward high elasticity and high expectations.

Fourth, the ten private companies most worth continuous tracking are: Cyera, BigID, Sentra, Concentric AI, Securiti, Privacera, Veza, Noma Security, Immuta, Lasso Security. Among them, Cyera/BigID/Veza/Privacera have the highest strategic value; Noma/Lasso lean toward AI-native high-risk, high-payoff; and Immuta/Securiti need more validation of their growth curve.

Fifth, the five points the market most easily misunderstands are: one, AI data security is not the same as model security; two, prompt DLP is not the whole picture; three, vector databases do not naturally inherit source permissions; four, Copilot is not "secure," it "faithfully executes existing permissions"; five, what truly makes money is the control plane, not point detection.
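
The third point, that vector databases do not naturally inherit source permissions, can be made concrete with a minimal sketch. It assumes each chunk is stored with a copy of its source document's ACL at index time and that retrieval filters on the caller's groups before ranking. The `Chunk` class, the `retrieve` function, the group names, and the toy embeddings are all hypothetical; products such as Azure AI Search or Elastic DLS implement this differently, and keeping the copied ACL in sync with the source system is not covered here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    embedding: tuple[float, ...]
    # ACL copied from the source document at index time (assumption:
    # the vector store does not learn this on its own).
    allowed_groups: frozenset[str]


def dot(a, b):
    """Plain dot product, used here as a toy similarity score."""
    return sum(x * y for x, y in zip(a, b))


def retrieve(query_embedding, chunks, user_groups, top_k=3):
    """Return the top_k most similar chunks the caller is allowed to see.

    Filtering happens before ranking, so a restricted chunk can never
    reach the prompt no matter how similar it is to the query.
    """
    visible = [c for c in chunks if c.allowed_groups & user_groups]
    ranked = sorted(
        visible,
        key=lambda c: dot(query_embedding, c.embedding),
        reverse=True,
    )
    return ranked[:top_k]


index = [
    Chunk("Q3 payroll details", (0.9, 0.1), frozenset({"finance"})),
    Chunk("Holiday schedule", (0.2, 0.8), frozenset({"all-staff", "finance"})),
]

query = (1.0, 0.0)  # toy query that is closest to the payroll chunk
print([c.text for c in retrieve(query, index, {"all-staff"})])
print([c.text for c in retrieve(query, index, {"finance"})])
```

In this sketch a user in only the all-staff group never receives the payroll chunk even though it is the closest match to the query. If the `allowed_groups` field were omitted at index time, the same query would return it to everyone, which is the over-sharing amplification described above.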

Over the next 6–12 months, the metrics most worth tracking include:

Whether public companies begin to break out ARR, RPO, cRPO, and customer counts for AI data security / data protection / AI governance / identity for AI.
New permission and audit features in Microsoft Purview, Google Cloud DSPM, AWS Bedrock Guardrails, Snowflake Horizon, and Databricks Unity Catalog / AI Gateway.
Whether customers move RAG and agents from pilot to production; whether they begin to require document-level security, approval workflow, payload logging, and retention control.
Whether major M&A keeps happening on the DSPM / identity-data governance / AI gateway / RAG access control front line. Google-Wiz, PANW-Protect AI/Portkey, and CrowdStrike-SGNL have already pointed the direction.

Open Questions and Limitations

This report has tried to reconcile the tension between being "broad and comprehensive" and being "verifiable," but several types of information still have not been adequately disclosed by public companies:

Most companies do not break out AI data security revenue, and it can only be inferred from product releases, customer scenarios, platform attach rates, and RPO/ARR; "releases" must not be mistaken for "revenue."
Some traditional vendors and regional companies (especially A-shares, Hong Kong shares, Europe, Japan, Korea, and India) lack sufficient AI data security detail in this round's public materials, and are better placed on a second-round verification list than turned into strong conclusions this round.
Vector database security, agent memory security, privacy-enhancing computation, and confidential computing remain early-stage for now; they may matter technically in the near term but will not necessarily form a large revenue pool right away.

Final Conclusion

If you single out the links in the AI value chain that can genuinely form a long-term profit pool, data security, DSPM, RAG permission governance, data access governance, and enterprise knowledge base security are the cluster that most deserves attention. They determine whether enterprise AI can move from demo to production, whether Copilot and agents can touch core data, and who bears the future regulatory and audit risk.

For investment, the narrower follow-on directions most worth prioritizing converge to four main lines:

DSPM, GenAI DLP, RAG permission governance, and agent data access control. These four main lines have both clear demand and the best odds of converting into revenue, platform attach rate, RPO/cRPO, and margin improvement over the next 12–24 months.

This report is based on public information and does not constitute investment advice. Markets carry risk; invest with caution.

Mentioned Tickers

MSFT.US · GOOGL.US · PANW.US · CRWD.US · VRNS.US · SNOW.US · MDB.US · ESTC.US · NET.US · ORCL.US · ZS.US · IBM.US · OKTA.US · RBRK.US · 4704.TSE