Every Dependency Checked Before the Code Is Written

Eran Yahav, CTO of Tabnine


A developer asks an AI coding agent to add PDF export to a reporting service. The agent picks a library. It writes the integration code — import statements, wrapper functions, error handling, tests. The developer reviews it, approves it, opens a PR.

CI fails. The library has a known CVE. Or its license is GPL and the project is Apache-2.0. Or the organization removed it from the approved registry three months ago after an incident.

The developer now throws away the agent’s work. Finds a compliant alternative. Asks the agent to rewrite. Reviews again. Opens another PR. CI passes. Two cycles of work for one feature. The agent was fast. The process was slow.

This is normal. It happens every day in teams using AI coding agents. It will get worse.

The post-hoc problem

Supply chain security tooling today operates after the fact. SCA scanners run in CI. Dependabot opens PRs when vulnerabilities appear. License checkers flag violations at build time or PR review. Manual review catches the rest — sometimes.

This model worked when humans wrote code. A developer choosing a dependency has context: they know (or can check) the team’s approved list, the license constraints, the recent security advisories. They make a judgment before writing the code. The scanner in CI is a safety net, not the primary control.

AI coding agents do not have this context. An agent choosing a dependency picks whatever its training data and the current prompt suggest. It has no awareness of the organization’s approved package registry. It does not know your license policy. It does not check whether the library it selected was flagged in a CVE advisory last week. It optimizes for functional correctness — does this library solve the task? — and ignores supply chain constraints entirely.

The safety net becomes the primary control. And safety nets that bear primary load fail differently than safety nets that catch edge cases.

Scale makes post-hoc untenable

The arithmetic is straightforward. If developers write code manually and introduce dependencies at human pace, the churn from CI rejections is manageable. A few cycles per week, across a team, absorbed into normal workflow.

AI agents change the rate. In the teams we observe, an agent-assisted developer touches 3 to 5 times as many dependencies per week as a developer working without AI assistance. At a 200-person engineering org, that is the difference between 50 dependency reviews per week and roughly 200. The CI rejection pipeline was not designed for 200.

Post-hoc scanning at this volume produces one of two outcomes. Either the rejection rate stays constant (the same proportion of violations is flagged) and the absolute number of rework cycles grows proportionally, creating a drag that offsets the productivity gain of the agent. Or teams reduce friction by weakening the scanner’s rules, accepting more risk to maintain throughput. Neither outcome is acceptable.
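To make the first outcome concrete, here is a minimal arithmetic sketch. The review counts (50 and 200 per week) come from the figures above; the rejection rate of 10 per hundred is a purely hypothetical number chosen for illustration, not a measurement.

```python
def weekly_rework_cycles(dependency_reviews: int, rejected_per_hundred: int) -> float:
    """Expected CI rejections per week at a fixed rejection rate."""
    return dependency_reviews * rejected_per_hundred / 100


# 50 and 200 are the weekly review counts from the text.
# The rejection rate is an assumption made only for this example.
HYPOTHETICAL_REJECTED_PER_HUNDRED = 10

before = weekly_rework_cycles(50, HYPOTHETICAL_REJECTED_PER_HUNDRED)
after = weekly_rework_cycles(200, HYPOTHETICAL_REJECTED_PER_HUNDRED)

# With a constant rate, rework grows by the same factor as review volume.
growth = after / before
```

Whatever the real rate is, it cancels out of the ratio: rework scales with volume unless the rate itself drops.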

The feedback loop is also wrong. A CI rejection tells the developer the dependency is bad. It does not tell the agent. The agent has no mechanism to learn from the rejection. The next time it faces a similar task, it may propose the same dependency, or a different one with the same problem. There is no closing of the loop at the point of generation.

Checking dependencies before code exists

The approach we are building — and I want to be direct that this is engineering direction, not a shipping feature today — moves the supply chain check to the moment the agent decides to use a dependency. Before it writes the import statement. Before it generates the integration code. Before any human reviews anything.

The mechanism is conceptually simple. The agent, at the point where it would select a dependency, queries the organization’s constraints:

Approved registry check. Is this package in the organization’s approved registry? If the organization maintains an internal Artifactory, Nexus, or similar curated registry, only packages in that registry are available to the agent. Packages outside it do not exist from the agent’s perspective.

License policy enforcement. Does this package’s license comply with the project’s license policy? If the project is Apache-2.0, a GPL dependency is rejected before the agent writes a line of code. The policy is not a suggestion. It is a constraint on generation.

CVE posture check. Does this package version have known vulnerabilities above the organization’s severity threshold? If the organization’s policy rejects packages with critical CVEs, the agent cannot select a version with a critical CVE. It must find a clean version or an alternative package.

No-reintroduction guarantee. If a dependency is rejected, it cannot be reintroduced through a different code path in the same generation session. The agent cannot work around a rejected library by importing it indirectly, wrapping it in a utility, or vendoring it. The rejection is binding.

The output: the agent only generates code using dependencies that have already passed the organization’s supply chain policies. The PR that arrives for review is clean by construction. CI scanners still run — defense in depth — but they are back to being a safety net, not the primary gate.
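The four checks can be sketched as a single gate that the agent consults before committing to a package. This is a minimal illustration under stated assumptions, not Tabnine’s implementation: the names `Package`, `Policy`, and `GenerationSession`, the severity scale, and the in-memory sets are all hypothetical stand-ins for real registry, license, and vulnerability lookups.

```python
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple

# Severity scale, least to most severe (illustrative ordering).
SEVERITIES = ("none", "low", "medium", "high", "critical")


@dataclass(frozen=True)
class Package:
    name: str
    version: str
    license: str             # SPDX identifier such as "Apache-2.0"
    worst_cve: str = "none"  # highest known CVE severity for this version


@dataclass
class Policy:
    approved: Set[str]           # package names in the approved registry
    allowed_licenses: Set[str]   # licenses compatible with the project
    reject_cve_at_or_above: str = "critical"


@dataclass
class GenerationSession:
    policy: Policy
    # (name, version) pairs rejected so far in this session.
    rejected: Set[Tuple[str, str]] = field(default_factory=set)

    def _violation(self, pkg: Package) -> Optional[str]:
        """Return the first policy violation for pkg, or None if it is clean."""
        if pkg.name not in self.policy.approved:
            return "not in approved registry"
        if pkg.license not in self.policy.allowed_licenses:
            return "license " + pkg.license + " not permitted"
        threshold = SEVERITIES.index(self.policy.reject_cve_at_or_above)
        if SEVERITIES.index(pkg.worst_cve) >= threshold:
            return "known CVE at severity " + pkg.worst_cve
        return None

    def check(self, pkg: Package) -> Tuple[bool, str]:
        """Gate a dependency choice; a rejection stays binding for the session."""
        key = (pkg.name, pkg.version)
        if key in self.rejected:
            return False, "already rejected in this session"
        reason = self._violation(pkg)
        if reason is not None:
            self.rejected.add(key)
            return False, reason
        return True, "ok"
```

Rejections are keyed by name and version so that a clean version of the same package can still pass. The session set only illustrates that a rejection is remembered; detecting a vendored or wrapped copy of a rejected library is a harder problem that this sketch does not attempt.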

Honest status

This is planned work. We are building the architecture and working through the engineering challenges described below. It is not shipping today. I am writing about it because the problem is real and growing, the direction is technically sound, and I want to be transparent about where we are heading.

The foundation is already in place. Tabnine’s governance-at-generation layer — the system that enforces organizational policies before code is produced — is the infrastructure this feature builds on. The supply chain check is a specific policy type within that framework: a constraint on which external packages the agent may use, evaluated at the same point where other governance rules are evaluated.

The hardest open problem is transitive dependency resolution at generation-time latency. Fully resolving a dependency tree for a nontrivial package can take seconds, which is too slow for interactive generation. We are exploring approaches including pre-computed resolution caches, incremental updates from registry mirrors, and partial resolution with conservative fallback policies. This is an active engineering problem, not a solved one.
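One of the approaches mentioned, a pre-computed cache with a conservative fallback, might look roughly like the sketch below. Everything here is an assumption made for illustration: the `ResolutionCache` class, the freshness window, and the choice to treat unknown or stale entries as "do not use" are one possible design, not a description of what we have built.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ResolutionCache:
    """Pre-computed compliance verdicts keyed by (name, version). Hypothetical."""

    def __init__(self, max_age_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._entries: Dict[Tuple[str, str], Tuple[bool, float]] = {}
        self._max_age = max_age_seconds
        self._clock = clock  # injectable so freshness can be tested

    def put(self, name: str, version: str, compliant: bool) -> None:
        """Store a verdict produced offline by a full tree resolution."""
        self._entries[(name, version)] = (compliant, self._clock())

    def lookup(self, name: str, version: str) -> Optional[bool]:
        """Return the cached verdict, or None if it is missing or stale."""
        entry = self._entries.get((name, version))
        if entry is None:
            return None
        compliant, stored_at = entry
        if self._clock() - stored_at > self._max_age:
            return None
        return compliant


def may_use(cache: ResolutionCache, name: str, version: str) -> bool:
    """Conservative fallback: only an explicit, fresh 'compliant' verdict passes."""
    return cache.lookup(name, version) is True
```

The lookup is a dictionary read, so it fits inside an interactive latency budget. The cost moves to keeping the cache fresh, and the conservative fallback means a cold cache blocks packages that would have been acceptable.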

Why this has to happen at generation, not at review

A reasonable objection: why not just make the CI scanner faster? Or give the agent a tool to check dependencies before committing?

Speed is not the issue. The issue is where in the workflow the constraint is enforced.

When the constraint is at CI, the agent generates code without awareness of it. Code is written, reviewed, and then discarded. The cost is not just the CI cycle — it is the developer’s attention, the review time, the context-switching. At agent-driven volume, this cost compounds.

When the constraint is at generation, code that violates supply chain policy is never produced. There is no review of non-compliant code. There is no rework. The constraint is invisible to the developer because it is already satisfied.

This is the same shift-left argument that the industry made for testing, for security scanning, for type checking. Move the constraint to the earliest point where it can be enforced. In AI-generated code, the earliest point is the moment the agent makes a dependency decision — before the code exists.

What this requires technically

This is not trivial to build. A few of the engineering challenges we are working through:

Dependency resolution at generation time. The agent must be able to query package registries, license databases, and vulnerability databases with low enough latency that generation is not noticeably slower. This means local caching, incremental updates, and efficient lookup against potentially large registries.

Policy representation. License policies and CVE severity thresholds must be expressible as machine-readable rules that the governance layer can evaluate at generation time. This connects to the governance-at-generation architecture described in earlier posts in this series. The policy format must be simple enough for security teams to author and audit, yet expressive enough to capture real-world constraints.
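As a rough illustration of what a machine-readable policy could look like, the sketch below uses a small JSON document and a strict loader. The key names (`allowed_licenses`, `reject_cve_at_or_above`, `approved_registry_only`) and the severity scale are hypothetical; the actual policy format is not defined in this post.

```python
import json

# Hypothetical policy document a security team might author and audit.
POLICY_JSON = """
{
  "allowed_licenses": ["Apache-2.0", "MIT", "BSD-3-Clause"],
  "reject_cve_at_or_above": "critical",
  "approved_registry_only": true
}
"""

SEVERITIES = ("none", "low", "medium", "high", "critical")
REQUIRED_KEYS = {"allowed_licenses", "reject_cve_at_or_above", "approved_registry_only"}


def load_policy(text: str) -> dict:
    """Parse and validate a policy; unknown or missing keys are errors."""
    raw = json.loads(text)
    unknown = set(raw) - REQUIRED_KEYS
    missing = REQUIRED_KEYS - set(raw)
    if unknown or missing:
        raise ValueError(f"bad policy keys: unknown={sorted(unknown)} missing={sorted(missing)}")
    if raw["reject_cve_at_or_above"] not in SEVERITIES:
        raise ValueError("unknown severity: " + str(raw["reject_cve_at_or_above"]))
    return {
        "allowed_licenses": frozenset(raw["allowed_licenses"]),
        "reject_cve_at_or_above": raw["reject_cve_at_or_above"],
        "approved_registry_only": bool(raw["approved_registry_only"]),
    }


def cve_allowed(policy: dict, severity: str) -> bool:
    """True if a CVE of this severity falls below the rejection threshold."""
    return SEVERITIES.index(severity) < SEVERITIES.index(policy["reject_cve_at_or_above"])
```

Rejecting unknown keys is a deliberate choice in this sketch: a typo in a policy file should fail loudly rather than silently disable a rule.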

Transitive dependencies. Checking the direct dependency is necessary but insufficient. A package with an Apache-2.0 license that depends on a GPL package still introduces a GPL obligation. The check must resolve the transitive dependency tree — or at least the relevant slice of it — at generation time.
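A minimal sketch of the transitive check, assuming the dependency graph and license data are already available as plain dictionaries (in practice they would come from registry metadata). The function name and package names are hypothetical, and the license test is a simple allow-list lookup; real license compatibility is more nuanced than set membership.

```python
from collections import deque
from typing import Dict, List, Optional, Set


def find_license_violation(root: str,
                           deps: Dict[str, List[str]],
                           licenses: Dict[str, str],
                           allowed: Set[str]) -> Optional[List[str]]:
    """Breadth-first walk of the dependency tree.

    Returns the path from root to the nearest package whose license is
    disallowed or unknown, or None if the whole tree is clean.
    """
    queue = deque([[root]])
    seen = {root}  # guards against cycles and repeated subtrees
    while queue:
        path = queue.popleft()
        pkg = path[-1]
        # A missing license entry yields None, which is never allowed.
        if licenses.get(pkg) not in allowed:
            return path
        for child in deps.get(pkg, ()):
            if child not in seen:
                seen.add(child)
                queue.append(path + [child])
    return None


# Hypothetical tree: an Apache-2.0 package that pulls in GPL code two levels down.
deps = {"report-pdf": ["layout-core"], "layout-core": ["glyph-gpl"], "glyph-gpl": []}
licenses = {"report-pdf": "Apache-2.0", "layout-core": "MIT", "glyph-gpl": "GPL-3.0"}
violation = find_license_violation("report-pdf", deps, licenses, {"Apache-2.0", "MIT"})
```

Returning the full path rather than a boolean lets the agent, or a human auditing the decision, see which intermediate package introduced the obligation.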

Agent integration. The check must be part of the agent’s decision process, not a post-processing step. This means the governance layer must intercept the agent’s dependency selection, not filter its output. The distinction matters: output filtering allows non-compliant code to be generated and then discarded, which wastes compute and introduces timing issues. Interception prevents non-compliant code from being generated at all.
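The difference between interception and output filtering can be shown with two small functions that differ only in where the check runs. This is a toy model under stated assumptions: `generate` stands in for the expensive code-generation step, `is_compliant` for the policy check, and the package names are hypothetical.

```python
from typing import Callable, Iterable, Optional, Tuple


def generate_with_interception(candidates: Iterable[str],
                               is_compliant: Callable[[str], bool],
                               generate: Callable[[str], str]
                               ) -> Tuple[Optional[str], Optional[str]]:
    """Check first; generate code only for a dependency that passed."""
    for name in candidates:
        if is_compliant(name):
            return name, generate(name)
    return None, None


def generate_with_output_filter(candidates: Iterable[str],
                                is_compliant: Callable[[str], bool],
                                generate: Callable[[str], str]
                                ) -> Tuple[Optional[str], Optional[str]]:
    """Generate first; discard the result if the dependency fails the check."""
    for name in candidates:
        code = generate(name)  # work is spent before compliance is known
        if is_compliant(name):
            return name, code
    return None, None


def count_generations(strategy) -> int:
    """Run a strategy on a fixed scenario and count calls to generate."""
    calls = []

    def generate(name: str) -> str:
        calls.append(name)
        return "import " + name.replace("-", "_")

    strategy(["gpl-pdf", "cve-pdf", "clean-pdf"], lambda n: n == "clean-pdf", generate)
    return len(calls)
```

In this scenario both strategies end up with the same package, but the filtering version generates code for every rejected candidate along the way. That discarded work is the waste the interception design avoids.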

The supply chain argument for generation-time governance

The broader point extends beyond dependencies to every supply chain decision an AI agent makes. Dependencies are the most concrete case — they have registries, licenses, CVEs, and well-defined policies. But the same pattern applies to container base images, API endpoints, configuration values, and any external resource the agent references in generated code.

The principle is the same: if the organization has a policy about an external resource, that policy should be enforced before the agent generates code that depends on that resource. Not after. Not at review. Not in CI. At generation.

Post-hoc supply chain security was adequate when humans were the primary authors of code. Humans choose dependencies deliberately and infrequently. Agents choose them rapidly and without organizational context.
