What Your Datadog Environment Isn’t Telling You

Nick Vecellio
Co-Founder and Principal Engineer, NoBS

You’ve been using Datadog for a while now:

Agents are installed.
Tags are consistent.
Logs and traces are flowing.
Alerts fire when they should.

Everything is perfect. Well, not always.

Over time, configuration drift begins to appear.

As more engineers use the platform and new services are added, small changes accumulate. These changes are often difficult to see at the surface level and can have an impact on usability, platform security, and monitoring accuracy.

In mature environments, these issues are rarely the result of a single mistake. They emerge gradually as the platform grows. Below are three examples we commonly see that usually require some deeper investigation to uncover.

Overlapping Datadog RBAC Roles

Even if you stick with Datadog’s out-of-the-box roles, permission overlap can still occur.

In Datadog, users can be assigned multiple RBAC roles simultaneously. Permissions are additive: if one role lacks a permission but another role grants it, the user has that permission. This can lead to situations where a user has more access than intended. For example:

  • A user is assigned the Read Only role

  • The same user is also assigned the Standard role

Because Standard includes modification privileges, that user can now alter log pipelines, create custom metrics, and modify parts of the Datadog account configuration. This happens even though the expectation might have been read-only access.

In smaller environments this may not matter much. But as the number of users grows, these overlaps become difficult to track manually. Reviewing role assignments requires examining:

  • Each role’s permissions

  • Which users have which roles

  • Where permissions overlap in unintended ways

When organizations have dozens of roles and hundreds of users, identifying these conflicts inside the UI becomes extremely difficult. This is why we analyze the RBAC structure programmatically to surface permissions users hold that they were never intended to have.
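The core of that programmatic check is simple set arithmetic: because permissions are additive, a user's effective access is the union of permissions across all assigned roles, which can then be compared against the access they were expected to have. The sketch below illustrates the idea; the role and permission names are illustrative, and in a real audit the data would come from Datadog's Roles API rather than hardcoded dictionaries.

```python
# Illustrative sketch: detect users whose additive role assignments exceed a
# read-only baseline. Role and permission names are stand-ins; in practice
# this data would be pulled from Datadog's Roles API.

READ_ONLY_BASELINE = {"dashboards_read", "monitors_read", "logs_read_data"}

def effective_permissions(user_roles, role_permissions):
    """Union of permissions across all of a user's roles (additive model)."""
    perms = set()
    for role in user_roles:
        perms |= role_permissions.get(role, set())
    return perms

def excess_permissions(user_roles, role_permissions, baseline=READ_ONLY_BASELINE):
    """Permissions the user holds beyond the expected baseline."""
    return effective_permissions(user_roles, role_permissions) - baseline

role_permissions = {
    "Read Only": {"dashboards_read", "monitors_read", "logs_read_data"},
    "Standard": {"dashboards_read", "dashboards_write",
                 "monitors_write", "logs_write_pipelines"},
}

# A user assigned both roles ends up with write access despite the
# "Read Only" assignment:
extra = excess_permissions(["Read Only", "Standard"], role_permissions)
print(sorted(extra))
# → ['dashboards_write', 'logs_write_pipelines', 'monitors_write']
```

Run across every user in the org, this surfaces exactly the "read-only user who can actually edit pipelines" case described above.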

Misconfigured Datadog Log Pipelines

Log pipelines are another area where drift appears frequently. When teams start shipping logs to Datadog, usually the first step (if the logs aren't already JSON, which they should be) is to create a pipeline to process and remap the data:

  1. A user creates the pipeline

  2. Sample logs are used to test parsing

  3. A Grok pattern is written to extract fields

We’re good to go, right? At first, maybe. But applications evolve. Logging formats change. New services get added. Over time, several common problems begin to appear:

  • Logs that no longer match the original Grok parser

  • Pipelines that partially process logs

  • Redundant pipelines performing similar transformations

  • Pipelines unintentionally modifying already processed logs

As the number of pipelines grows into the dozens, understanding the full processing flow becomes extremely difficult through the UI alone.

This can lead to situations where:

  • Logs silently fail to parse

  • Dashboards miss important data fields

  • Alerts trigger inconsistently

The underlying issue isn’t always obvious unless someone performs a deep review of pipeline behavior across the entire environment.
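One way to catch silent parse failures is to replay a sample of recent logs through the same pattern a pipeline uses and measure the match rate. The sketch below uses a plain Python regex as a stand-in for a Datadog Grok rule; the log format, pattern, and 95% threshold are assumptions for illustration.

```python
import re

def parse_match_rate(pattern, samples):
    """Fraction of sample log lines matched by the parser pattern."""
    compiled = re.compile(pattern)
    if not samples:
        return 0.0
    hits = sum(1 for line in samples if compiled.match(line))
    return hits / len(samples)

# Parser written for the original format: "LEVEL service message"
LOG_PATTERN = r"^(?P<level>INFO|WARN|ERROR) (?P<service>\S+) (?P<message>.+)$"

samples = [
    "INFO checkout order placed",           # original format: matches
    "ERROR checkout payment declined",      # original format: matches
    '{"level":"info","svc":"checkout"}',    # service moved to JSON: silently unmatched
]

rate = parse_match_rate(LOG_PATTERN, samples)
print(f"match rate: {rate:.0%}")  # → match rate: 67%
if rate < 0.95:
    print("WARNING: parser no longer matches recent logs")
```

The same check, run per-pipeline across the environment, turns "logs silently fail to parse" from an invisible problem into a measurable one.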

Low Adoption of Datadog RUM Session Replay

Datadog RUM Session Replay is a frontend developer's dream. Instead of trying to reproduce a user’s issue through trial and error, developers can watch exactly what happened during the session.

Session Replay allows teams to view:

  • The user’s interaction flow

  • Console errors

  • Network requests

  • Performance issues

  • Frontend crashes

In theory, this dramatically shortens debugging cycles. But in many environments we review, Session Replay is enabled but rarely used.

While Datadog provides strong observability data, it’s not always obvious from the UI how frequently teams actually rely on specific features like replay.

By analyzing usage patterns, we can determine:

  • Which teams actively use Session Replay

  • How often it is accessed

  • Whether the data being collected aligns with real debugging workflows

This helps organizations determine whether they are getting real value from the data they collect, or simply generating additional telemetry without meaningful usage.
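As a rough illustration of the adoption analysis, the sketch below computes, per team, the share of members who have actually viewed a session replay. The record shape here is hypothetical; real data would come from Datadog audit and usage endpoints.

```python
def replay_adoption_by_team(access_events, team_members):
    """Share of each team's members who viewed at least one session replay.

    `access_events` is a list of (user, feature) tuples; this shape is a
    hypothetical stand-in for Datadog audit/usage data.
    """
    viewers = {user for user, feature in access_events if feature == "session_replay"}
    adoption = {}
    for team, members in team_members.items():
        active = sum(1 for m in members if m in viewers)
        adoption[team] = active / len(members) if members else 0.0
    return adoption

team_members = {
    "frontend": ["ana", "bo", "cy"],
    "platform": ["dee", "eli"],
}
access_events = [
    ("ana", "session_replay"),
    ("bo", "dashboards"),
    ("cy", "session_replay"),
]

for team, rate in replay_adoption_by_team(access_events, team_members).items():
    print(f"{team}: {rate:.0%}")
```

A team sitting at 0% adoption while replay data is being collected is a strong signal that the feature is generating telemetry, and cost, without delivering value.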

We recently explored another use case in Automating RUM Reporting with Datadog AI Agents.

Introducing NoBServatory

We originally called it “Observatory” as a play on looking at your observability. But if you put “No” in front of it, it kinda names itself.

Anyway, what is it?

NoBServatory is a regularly scheduled report sent directly to you with the insights listed above, plus analysis across more than 15 additional areas of your Datadog environment. Each report includes:

  • An executive summary of your account’s overall health

  • Cross-cutting themes identified across the analysis

  • Scoring that tracks the health of the environment over time

The data is generated directly from the Datadog APIs. From there, we apply a deterministic score to each module based on measurable signals in the environment. After that, specifically tuned AI agents adjust those scores based on context, raising or lowering the deterministic score where appropriate.
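The deterministic half of that scoring can be pictured as a weighted combination of normalized signals. The sketch below is illustrative only; the signal names and weights are made up and are not NoBServatory's actual scoring model.

```python
# Illustrative deterministic module score: measurable signals normalized to
# [0, 1] and combined with fixed weights. Names and weights are hypothetical.

def module_score(signals, weights):
    """Weighted 0-100 score from normalized signals (each in [0, 1])."""
    total_weight = sum(weights.values())
    raw = sum(signals[name] * w for name, w in weights.items())
    return round(100 * raw / total_weight, 1)

weights = {
    "roles_without_overlap": 0.5,
    "pipelines_parsing_cleanly": 0.3,
    "monitors_recently_triaged": 0.2,
}
signals = {
    "roles_without_overlap": 0.8,
    "pipelines_parsing_cleanly": 0.9,
    "monitors_recently_triaged": 0.5,
}

print(module_score(signals, weights))  # → 77.0
```

The AI-agent pass then nudges a score like this up or down where context warrants, e.g. an unparsed pipeline that handles only debug logs matters less than one feeding security detections.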

The end result is a report containing the hard facts about your Datadog environment, including the things you probably didn't know to look for.

The report includes insights into areas such as:

  • SIEM health

  • User adoption

  • Datadog Teams usage and adoption

  • Monitor fatigue analysis

…and plenty more.

These insights help teams understand what is happening inside their observability platform beyond what dashboards show.

Built With Security in Mind

NoBServatory was designed with security as a primary requirement. The environment hosting the product is compliant with:

  • PCI

  • SOC 2

  • HIPAA (with BAA support)

  • ISO 27001

  • FedRAMP

  • NIST

All customer data is isolated in separated execution environments, and data is stored only in memory in ephemeral environments. For organizations with stricter requirements, subscriptions can also be configured to use AWS-hosted AI models, so your data never leaves the VPC.

FAQ: Datadog Environment Health & NoBServatory

Last updated: 2026-03-09

What is Datadog configuration drift?

Datadog configuration drift occurs when a monitoring environment gradually diverges from its original configuration over time. As new services are added and additional engineers make changes, roles, monitors, pipelines, and dashboards can evolve in ways that are difficult to track.

These small changes can accumulate and impact security permissions, monitoring accuracy, and the usability of the platform.

How do you audit a Datadog environment?

Auditing a Datadog environment typically involves reviewing several areas of the platform, including:

  • RBAC roles and user permissions

  • Monitor configuration and alert noise

  • Log pipeline processing and parsing rules

  • Feature adoption across teams

  • SIEM configuration and detection coverage

Because large environments can contain hundreds of users and dozens of pipelines or monitors, many organizations rely on API-driven analysis to identify patterns that are difficult to see directly in the Datadog UI.

Why do Datadog RBAC roles overlap?

Datadog allows users to be assigned multiple roles at the same time. Permissions are additive, meaning that if one role grants a permission, the user has it even when another assigned role does not.

This can lead to situations where users unintentionally receive broader access than intended, particularly when multiple roles are assigned across teams or projects.

What causes problems in Datadog log pipelines?

Log pipeline issues often appear as environments grow and logging formats evolve.

Common causes include:

  • Log formats changing over time

  • Grok parsers no longer matching incoming logs

  • Duplicate or overlapping pipelines

  • Pipelines modifying already processed logs

When dozens of pipelines exist in an environment, it can become difficult to understand how logs are being processed end-to-end.

What is Datadog RUM Session Replay used for?

Datadog RUM Session Replay allows frontend engineers to view a recording of a user’s experience inside an application.

Session Replay helps teams understand issues by showing:

  • User interaction flows

  • Console errors

  • Network requests

  • Frontend crashes and performance issues

Instead of trying to reproduce bugs manually, developers can see exactly what happened during the user session.

What is NoBServatory?

NoBServatory is a reporting system designed to analyze the health of Datadog environments.

It uses the Datadog APIs to collect configuration and usage data and produces a regularly scheduled report that highlights platform health, configuration drift, and feature adoption trends.

The report combines deterministic scoring with AI-assisted analysis to identify patterns that would otherwise require extensive manual investigation.

How often should a Datadog environment be reviewed?

Most organizations benefit from reviewing their observability environment on a regular cadence.

As teams scale and more services are added, configuration drift can appear gradually across permissions, pipelines, monitors, and usage patterns.

Regular reviews help ensure the environment remains secure, efficient, and aligned with how engineering teams actually use the platform.

See What Your Datadog Environment Isn’t Telling You

Most Datadog environments accumulate small configuration changes over time. Individually, these changes seem harmless. Collectively, they can impact observability accuracy, platform usability, and security posture. NoBServatory helps surface those issues before they become larger problems.

To learn more, contact: sales@nobs.tech.
