Let me state the conclusion first. As long as security inspection is "something a person runs on their machine when they remember to," it will eventually be forgotten. It produces value only once it becomes a mechanism: CI runs automatically the moment a PR opens, and only high-confidence vulnerabilities block the merge. This article shows how to build that mechanism with GitHub Actions and SARIF, using actual YAML and commands.
But let me be honest about one more thing up front. A CI gate is the "mechanization" of security, not a substitute for design review or manual audit. A tool mechanically stands guard over "whether you've stepped on the common traps"; it doesn't prove that your authorization logic is "correct." With that boundary as the premise, this article covers a design that incorporates the automatable inspections into continuous operation without breaking CI with false positives.
1. Why "stop it at the PR" — the economics of shift left
A vulnerability's cost to fix soars the later it's found.
| When it's found | State | Cost to fix |
|---|---|---|
| During coding (on your machine) | not committed yet | almost zero. Just rewrite |
| PR review (CI) | before merge. The diff is small | low. Can fix while the context is fresh |
| After production release | deployed, affects data | high. Investigation, patch, sometimes incident response |
| Discovered as an incident | leakage/abuse is progressing | the worst. Includes loss of trust |
"Shift left" is the philosophy of crushing the problem as far left (= early in development) as possible, the upper part of this table. The essence of putting security inspection into CI is here. Don't search later for a vulnerability that went to production, but "don't let it merge in the first place" at the PR stage.
What's important is that this doesn't depend on human discipline. A rule like "don't forget to scan before committing" will be broken on a busy day. A CI gate, on the other hand, can't be forgotten: open a PR and it runs automatically, and if it's red (and the check is required by branch protection), the merge button can't be pressed. Moving security from "be careful" to "enforce by structure" is the first value of a CI gate.
This idea isn't limited to security; it's continuous with quality control of AI-generated code in general. The approach of bundling types, tests, and security into CI is organized in the design of quality gates for AI-driven development. This article makes the "security inspection" gear of that design concrete, using SARIF and GitHub Actions.
The overall map of what to crush with automation and what to protect by design is summarized in the Next.js × Supabase application-security complete guide. This article is the implementation edition of the "permanently station horizontal control / injection-class detection in CI" part mentioned there.
2. What SARIF is — the common language of "inspection results"
When you run security inspection in CI, the first problem you hit is that the output format differs per tool. One tool emits JSON, another text, another JUnit XML. With that, you can't display results uniformly in GitHub's UI, and you end up rewriting the integration each time you switch tools.
This is where SARIF (Static Analysis Results Interchange Format) helps. SARIF is a JSON-based standard format for expressing static-analysis results, standardized by OASIS (OASIS SARIF v2.1.0). Whatever the SAST tool, if it emits its results in SARIF, the receiving side (GitHub, etc.) can handle the contents uniformly.
The minimal structure of SARIF is roughly this.
// Skeleton of SARIF v2.1.0 (a rough type sketch). The actual file is generated by the tool
type Sarif = {
  version: "2.1.0";
  runs: Array<{
    tool: { driver: { name: string; rules: Array<{ id: string }> } };
    results: Array<{
      ruleId: string; // which inspection rule was hit
      level: "error" | "warning" | "note"; // severity → decides red/yellow in CI
      message: { text: string }; // explanation for humans
      locations: Array<{ // which file, which line
        physicalLocation: {
          artifactLocation: { uri: string };
          region: { startLine: number };
        };
      }>;
    }>;
  }>;
};
There are three points.
- `ruleId`: which inspection it caught on. You can aggregate/suppress per rule on GitHub.
- `level`: `error` / `warning` / `note`. This becomes the basis for what to make red in CI and what to keep yellow (warning). The "block only the high-confidence" described later is the design of how to assign this.
- `locations`: the file and line number. Because of this, GitHub can put per-line annotations on the PR diff.
In other words, SARIF is "the common language of inspection results." Even if you switch tools, the CI integration stays the same and the display to the UI stays the same. Being able to loosely couple the inspection logic and the display/operation is the biggest advantage of using a standard format.
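To make the role of `level` concrete, here is a minimal TypeScript sketch that reads an already-parsed SARIF log and counts results per level. The names `countByLevel` and `gateFails`, the sample rule ids, and the policy "only `error` fails the build" are assumptions for illustration; they are not part of SARIF or of any specific tool.

```typescript
// A subset of SARIF v2.1.0, just enough for this sketch.
type Level = "error" | "warning" | "note" | "none";

interface SarifResult {
  ruleId: string;
  level?: Level; // assumption: a result without a level is counted as "warning"
  message: { text: string };
}

interface SarifLog {
  version: "2.1.0";
  runs: Array<{ results: SarifResult[] }>;
}

// Count results per level across all runs in the log.
function countByLevel(log: SarifLog): Record<Level, number> {
  const counts: Record<Level, number> = { error: 0, warning: 0, note: 0, none: 0 };
  for (const run of log.runs) {
    for (const result of run.results) {
      const level: Level = result.level ?? "warning";
      counts[level] += 1;
    }
  }
  return counts;
}

// Hypothetical gate policy: only "error" results make CI red.
function gateFails(log: SarifLog): boolean {
  return countByLevel(log).error > 0;
}

// Hand-written sample log; real logs are generated by the scanner.
const sampleLog: SarifLog = {
  version: "2.1.0",
  runs: [
    {
      results: [
        { ruleId: "example/sql-injection", level: "error", message: { text: "tainted input reaches a query" } },
        { ruleId: "example/missing-owner-check", level: "warning", message: { text: "no ownership check found" } },
        { ruleId: "example/unleveled", message: { text: "level omitted by the tool" } },
      ],
    },
  ],
};

console.log(countByLevel(sampleLog), gateFails(sampleLog));
```

Keeping the counting separate from the pass/fail policy mirrors the point of this section: the format carries the facts, and the decision about what turns CI red is a design choice layered on top.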
3. Putting it on GitHub code scanning — diff annotations and the Security tab
Once you can spit out SARIF, next is feeding it to GitHub. GitHub has a feature called code scanning, and when you upload SARIF, the results appear in two places (GitHub code scanning docs).
- The Security tab — all the repo's alerts are listed and can be managed by rule and by severity. You can "dismiss" a known finding or track recurrence.
- The PR diff (Files changed) — inline annotations are added on the detected lines. A reviewer can see "this change introduced this vulnerability" on the spot, while reading the code.
This experience of "annotations appear on the PR diff" is decisive. The security finding appears not on a separate dashboard or weekly report but right next to the code you're reviewing. You can fix it with the minimal diff while the context to fix is fresh. Code scanning realizes section 1's shift left at the UI level.
What GitHub receives is, after all, SARIF, so there's no lock-in to a specific tool here either. As long as you can spit out SARIF, a self-made tool or OSS rides the same rail.
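As a sketch of what "a self-made tool rides the same rail" can look like, the following TypeScript function wraps a tool's own findings in the SARIF skeleton from section 2. `Finding`, `toSarif`, the tool name `my-checker`, and the sample file paths are hypothetical; a real tool would likely fill in more fields than this sketch does.

```typescript
type Level = "error" | "warning" | "note";

// Hypothetical in-house finding shape.
interface Finding {
  rule: string;
  level: Level;
  file: string; // path relative to the repository root
  line: number; // 1-based line number
  text: string;
}

// Wrap findings in a minimal SARIF v2.1.0 log (one run, one tool).
function toSarif(toolName: string, findings: Finding[]) {
  const allRules = findings.map((f) => f.rule);
  // De-duplicate rule ids while keeping first-seen order.
  const ruleIds = allRules.filter((id, index) => allRules.indexOf(id) === index);
  return {
    version: "2.1.0" as const,
    runs: [
      {
        tool: { driver: { name: toolName, rules: ruleIds.map((id) => ({ id })) } },
        results: findings.map((f) => ({
          ruleId: f.rule,
          level: f.level,
          message: { text: f.text },
          locations: [
            {
              physicalLocation: {
                artifactLocation: { uri: f.file },
                region: { startLine: f.line },
              },
            },
          ],
        })),
      },
    ],
  };
}

const sarif = toSarif("my-checker", [
  { rule: "owner-check", level: "warning", file: "app/api/invoices/route.ts", line: 12, text: "query without ownership check" },
  { rule: "owner-check", level: "error", file: "app/api/notes/route.ts", line: 30, text: "query without ownership check" },
]);

// This JSON string is what would be written to the file passed as sarif_file.
const sarifJson = JSON.stringify(sarif, null, 2);
console.log(sarifJson.length > 0);
```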
4. Actual YAML — run SAST in GitHub Actions and upload SARIF
Here's the center of the implementation. Aegis, the OSS security scanner I publish, ships with a GitHub Action, so you can incorporate it into CI with a single `uses` line. Aegis detects Next.js × Supabase injection-class issues (tainted input → dangerous sink) and RLS/authorization weaknesses, and outputs the result in SARIF.
Placing the following in `.github/workflows/security.yml` makes the scan run on every push and PR and sends the result to code scanning.
# .github/workflows/security.yml
name: Security
on: [push, pull_request]
jobs:
  aegis:
    runs-on: ubuntu-latest
    permissions:
      contents: read # minimum privilege, only to read the code
      security-events: write # required to upload SARIF to the Security tab
    steps:
      - uses: actions/checkout@v4
      - uses: tomodahinata/aegis@main
        with:
          severity: HIGH # promote only HIGH and above to "red (block)"
      - uses: github/codeql-action/upload-sarif@v3
        if: always() # always upload the result, even if the scan step fails
        with:
          sarif_file: aegis.sarif
It's short, but four design judgments are packed in. Let me unpack them in order.
4-1. Fix permissions at least privilege
Narrow the privileges given to the workflow to only what's needed.
- `contents: read`: to read the repo's code. Writing is unnecessary, so `read`.
- `security-events: write`: the mandatory privilege to upload SARIF and create code-scanning alerts. Without it, `upload-sarif` fails.
The reason not to widen write access here (e.g., not attaching `contents: write`) is that the CI workflow itself is an attack surface. In a supply-chain attack, a workflow with excessive privileges is used as a stepping stone. The principle of least privilege (an idea OWASP's Application Security Verification Standard consistently preaches) applies not only to the app's code but also to the CI configuration itself.
4-2. "Always upload the result" with if: always()
The upload-sarif step has if: always() attached. This is so that even if the previous scan step fails (exit 1) with "vulnerability present," the SARIF upload runs.
Without it, the worst sequence occurs: "a HIGH vulnerability was found and the scan went red" → "the job stops and the upload is skipped" → "the finding you most want to see never reaches code scanning." Blocking and visualizing a finding are separate concerns, and `always()` guarantees the latter.
4-3. severity: HIGH is "the threshold of blocking"
`with: severity: HIGH` is the entrance to "block only the high-confidence," the core of this article. It specifies that only detections with HIGH-or-above confidence/severity make CI red (block the merge). Detections at MEDIUM and below are, as described later, basically still written to SARIF so that they're visible in code scanning, but they don't stop the build (they stay warnings).
Why this threshold design is decisive is detailed in the next section.
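The threshold idea can be sketched as an ordered comparison. This illustrates the concept only and is not Aegis's actual implementation; the severity names other than HIGH and MEDIUM, and the functions `meetsThreshold` and `exitCodeFor`, are assumptions.

```typescript
// Assumed severity scale, lowest to highest. Only HIGH and MEDIUM appear in the article.
const SEVERITY_ORDER = ["LOW", "MEDIUM", "HIGH", "CRITICAL"] as const;
type Severity = (typeof SEVERITY_ORDER)[number];

interface Detection {
  ruleId: string;
  severity: Severity;
}

// True when the detection is at or above the configured threshold.
function meetsThreshold(severity: Severity, threshold: Severity): boolean {
  return SEVERITY_ORDER.indexOf(severity) >= SEVERITY_ORDER.indexOf(threshold);
}

// A scan step would exit non-zero (red) only if something meets the threshold.
// Everything else is still reported, just without failing the job.
function exitCodeFor(detections: Detection[], threshold: Severity): 0 | 1 {
  return detections.some((d) => meetsThreshold(d.severity, threshold)) ? 1 : 0;
}

const detections: Detection[] = [
  { ruleId: "example/owner-check", severity: "MEDIUM" },
  { ruleId: "example/sql-injection", severity: "HIGH" },
];

console.log(exitCodeFor(detections, "HIGH"), exitCodeFor(detections, "CRITICAL"));
```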
4-4. The Action reference is tomodahinata/aegis@main
uses: tomodahinata/aegis@main is the correct reference for the Aegis Action. There's no tag like v1 yet, so use @main. If you want to pin the version for production, instead of @main, pinning by commit SHA (tomodahinata/aegis@<commit-sha>) is the safest from a supply-chain standpoint (a tag or branch can move, but a SHA is immutable).
5. The noise problem — "break CI with false positives" and the gate is ignored
This is the point where the most teams fail in CI security. It's not a matter of technical difficulty but a problem of operational dynamics.
Suppose, right after putting in a security gate, the tool spits out a large number of findings. Suppose most of them are false positives — findings that can't actually be exploited, or that are contextually fine. What happens then?
- The PR goes red every time. And most of the reasons are "findings that don't need fixing."
- Developers stop trusting the red, thinking "noise again."
- Eventually someone, "just to pass it," sets the gate to `continue-on-error` or disables the job entirely.
- The security gate dies. Real vulnerabilities start slipping through with it.
This isn't a hypothetical story but the typical failure curve of CI-security introduction. "A gate that's too strict is definitely circumvented." Same as the boy who cried wolf — if false alarms continue, the real alarm is ignored too.
The principle derived from here is one.
Limit the detections that block CI to those with an extremely low false-positive rate. Keep the rest to "show as a warning" and don't stop the build. The trust in the gate is cultivated only by the experience "if red comes out, it's really bad."
The problem is "how to sift out only the detections with a low false-positive rate." High severity ≠ high confidence. A static-analysis "suspicion" of "this might be SQL injection" is serious but uncertain. How to crush this uncertainty — that's the next section's SAST×DAST correlation.
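One rough way to check whether a rule has earned the right to block is to look at how its past findings were triaged. The sketch below assumes you keep a triage history (`TriageRecord`, hypothetical) and lets a rule block only when enough samples exist and its observed false-positive rate is under a limit you choose. This is a simpler, different technique from the SAST×DAST correlation of section 6, and the limits in the example are arbitrary.

```typescript
// Hypothetical record of how a past finding was judged by a human.
interface TriageRecord {
  ruleId: string;
  verdict: "true_positive" | "false_positive";
}

interface RuleStats {
  total: number;
  falsePositives: number;
}

// Aggregate triage verdicts per rule.
function statsByRule(history: TriageRecord[]): Record<string, RuleStats> {
  const stats: Record<string, RuleStats> = {};
  for (const record of history) {
    const entry = stats[record.ruleId] ?? { total: 0, falsePositives: 0 };
    entry.total += 1;
    if (record.verdict === "false_positive") entry.falsePositives += 1;
    stats[record.ruleId] = entry;
  }
  return stats;
}

// Rules allowed to block: enough samples, and a false-positive rate at or below the limit.
function blockableRules(history: TriageRecord[], maxFalsePositiveRate: number, minSamples: number): string[] {
  const stats = statsByRule(history);
  return Object.keys(stats)
    .filter((ruleId) => {
      const s = stats[ruleId];
      if (!s) return false;
      return s.total >= minSamples && s.falsePositives / s.total <= maxFalsePositiveRate;
    })
    .sort();
}

// Helper to build sample history for the demonstration.
function repeat(ruleId: string, verdict: TriageRecord["verdict"], n: number): TriageRecord[] {
  const out: TriageRecord[] = [];
  for (let i = 0; i < n; i++) out.push({ ruleId, verdict });
  return out;
}

const history: TriageRecord[] = [
  ...repeat("rule-a", "true_positive", 10),
  ...repeat("rule-b", "true_positive", 2),
  ...repeat("rule-b", "false_positive", 8),
  ...repeat("rule-c", "true_positive", 1),
];

console.log(blockableRules(history, 0.05, 5));
```

Rules that fall outside the blockable set would stay at warning level, which matches the principle stated above: red should only ever mean "really bad."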
6. Block only the high-confidence — upgrade the static "suspicion" to a dynamic "confirmation"
The key to realizing "block only the high-confidence" is to double the source of confidence. It's hard to assert "this is certain" with a single static analysis. So upgrade based on whether the static-analysis (SAST) suspicion could actually be reproduced with a dynamic probe (DAST).
6-1. The two-stage confidence model
SAST (static analysis) : follow the code's data flow and suspect "tainted input → dangerous sink"
↓ suspicion (potential)
DAST (dynamic probe) : actually hit your own app and confirm whether that suspicion reproduces
↓ reproduced
confirmed : static × dynamic match → block only this in CI
- SAST "suspects." With intra-function taint analysis, it mechanically picks up patterns like "`params.id` reaches a DB query without an ownership check," detailed in the second guide. This is fast and comprehensive, but false positives also come out depending on context (e.g., the query is actually narrowed elsewhere).
- DAST "reproduces." Against an app you own, prepare two identities (A and B) and run a safe, non-destructive probe, such as hitting B's resource while in A's session. If 200 is returned and you can see someone else's data, that suspicion is confirmed at runtime.
And only those where the SAST suspicion and the DAST reproduction match are upgraded to confirmed-exploitable, SARIF's level: "error" = the CI-block target. A static suspicion that couldn't be reproduced is kept to warning/note, put on code scanning but not stopping the build.
This is the substance of "block only the high-confidence." Make the basis of blocking not the height of severity but the fact "it could be dynamically reproduced."
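The upgrade rule can be sketched in a few lines of TypeScript. This is a conceptual illustration under stated assumptions, not Aegis's internals: `Suspicion`, `ProbeOutcome`, and `correlate` are hypothetical names, and matching static and dynamic results by route string is an assumption made to keep the example small.

```typescript
type Level = "error" | "warning";

// A static-analysis suspicion (hypothetical shape).
interface Suspicion {
  ruleId: string;
  route: string; // the endpoint the suspicious code serves
  file: string;
  line: number;
}

// The outcome of a dynamic probe against that route (hypothetical shape).
interface ProbeOutcome {
  route: string;
  reproduced: boolean; // e.g. identity A could read identity B's resource
}

interface CorrelatedFinding extends Suspicion {
  confirmed: boolean;
  level: Level;
}

// Upgrade a suspicion to "error" only when a probe reproduced it on the same route.
function correlate(suspicions: Suspicion[], probes: ProbeOutcome[]): CorrelatedFinding[] {
  const reproducedRoutes = probes.filter((p) => p.reproduced).map((p) => p.route);
  return suspicions.map((s): CorrelatedFinding => {
    const confirmed = reproducedRoutes.indexOf(s.route) !== -1;
    const level: Level = confirmed ? "error" : "warning";
    return { ...s, confirmed, level };
  });
}

const findings = correlate(
  [
    { ruleId: "owner-check", route: "/api/invoices/[id]", file: "app/api/invoices/[id]/route.ts", line: 14 },
    { ruleId: "owner-check", route: "/api/notes/[id]", file: "app/api/notes/[id]/route.ts", line: 9 },
    { ruleId: "owner-check", route: "/api/tags/[id]", file: "app/api/tags/[id]/route.ts", line: 21 },
  ],
  [
    { route: "/api/invoices/[id]", reproduced: true },
    { route: "/api/notes/[id]", reproduced: false },
  ],
);

console.log(findings.map((f) => `${f.route}: ${f.level}`));
```

Note that a suspicion with no probe result at all also stays at warning here; the sketch treats "not reproduced" and "not probed" the same way, which errs on the side of not blocking.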
6-2. Local correlation — confirm by hand before putting it into CI
Before incorporating it into CI, it's good to first experience this correlation on your machine. Aegis provides the scan and the probe as separate commands.
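# 1. Static analysis: "suspect" tainted input → dangerous sink / queries without an ownership check
npx @aegiskit/cli scan
# 2. Dynamic probe: actually hit your own app and confirm whether the suspicion "reproduces"
#    --correlate cross-checks against the static results and upgrades the reproduced ones to confirmed
npx @aegiskit/cli probe http://localhost:3000 --correlate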
Adding `--correlate` marks, among the scan's suspicions, those reproduced by the probe as confirmed. Experiencing "only the confirmed is red" on your own machine makes the meaning of red click once you put it into CI with `severity: HIGH`.
The ethical boundary of probing. Run DAST only against an app you own/manage. An unauthorized probe of someone else's service is indistinguishable from an attack and becomes a legal problem too. Aegis's probe is non-destructive (read-centric, doesn't break data) by design, but the target must always be limited to your own staging/local.
7. Phased introduction — don't block from day one (warn → block)
Even if you design for "block only the high-confidence," enabling blocking from day one is a bad move. An existing codebase almost always has accumulated past findings (the baseline). Blocking on all of them turns even unrelated people's PRs red and puts you on the "the gate dies" curve of section 5.
The correct order is to start with warn and raise to block once trust is cultivated.
Stage 1: warning only (don't stop the build)
Start from the state "visualize the result but don't block." Attach `continue-on-error: true` to the scan step so that findings never fail the job, and make sure the SARIF upload still runs.
# Stage 1: warn only. Findings go to the Security tab and diff annotations, but CI doesn't go red
- uses: tomodahinata/aegis@main
  with:
    severity: HIGH
  continue-on-error: true # ← don't fail the job even on detections (observation period)
- uses: github/codeql-action/upload-sarif@v3
  if: always()
  with:
    sarif_file: aegis.sarif
During this period, the team learns "what kind of findings come out" and "which are real and which are noise." Existing confirmed findings are fixed in PRs during this time, or formally recorded as "dismissed" in code scanning.
Stage 2: upgrade only the high-confidence to block
Once the baseline is cleared and the noise tendency is grasped, remove continue-on-error and enable blocking.
# Stage 2: only the confirmed (HIGH and dynamically reproduced) stop the build
- uses: tomodahinata/aegis@main
  with:
    severity: HIGH
  # continue-on-error removed → a HIGH detection fails the job = merge blocked
- uses: github/codeql-action/upload-sarif@v3
  if: always()
  with:
    sarif_file: aegis.sarif
To finish, specify the Security check as "required for merge" with GitHub's branch protection rules. With this, the state "a red PR structurally can't be merged" is complete. Start with warning and upgrade only the confirmed to block — this order is the only realistic path to making a gate that isn't circumvented.
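During stage 1 it helps to separate findings that already existed from those a change newly introduces, so you can watch the baseline shrink before switching to block. A minimal sketch, assuming a hypothetical fingerprint built from rule, file, and message (line numbers are left out because they shift as code moves). GitHub code scanning does its own alert tracking; this only illustrates the idea, and `ScanFinding`, `fingerprint`, and `newSinceBaseline` are made-up names.

```typescript
// Hypothetical minimal finding shape for baseline comparison.
interface ScanFinding {
  ruleId: string;
  uri: string;
  text: string;
}

// Assumed fingerprint: stable across line shifts, but not across file renames.
function fingerprint(f: ScanFinding): string {
  return [f.ruleId, f.uri, f.text].join("|");
}

// Findings present now that were not in the recorded baseline.
function newSinceBaseline(current: ScanFinding[], baseline: ScanFinding[]): ScanFinding[] {
  const known: Record<string, boolean> = {};
  for (const f of baseline) known[fingerprint(f)] = true;
  return current.filter((f) => known[fingerprint(f)] !== true);
}

const baseline: ScanFinding[] = [
  { ruleId: "owner-check", uri: "app/api/invoices/route.ts", text: "query without ownership check" },
];

const current: ScanFinding[] = [
  { ruleId: "owner-check", uri: "app/api/invoices/route.ts", text: "query without ownership check" },
  { ruleId: "owner-check", uri: "app/api/notes/route.ts", text: "query without ownership check" },
];

console.log(newSinceBaseline(current, baseline).map((f) => f.uri));
```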
8. Keyless operation (OIDC) — don't put a long-lived secret in CI
In discussing CI security, you can't avoid touching CI's own secret management. A workflow that protects the gate becoming a leak route for keys is putting the cart before the horse.
The Aegis Action examples so far need no authentication to an external service (just read the code and output SARIF). security-events: write is also covered by GITHUB_TOKEN's automatic privileges, and a long-lived secret is unnecessary. This itself is a good design.
On the other hand, when you need to access external services (cloud, container registry, etc.) from CI, the iron rule is not to put a long-lived access key in the repo's Secrets. Instead, use OIDC (OpenID Connect) to issue a short-lived token, valid only during the job's execution, and authenticate with that.
# Example of obtaining a short-lived token with OIDC (when cloud integration is needed)
permissions:
  contents: read
  id-token: write # ← required to issue the OIDC token. Don't put long-lived keys in Secrets
`id-token: write` is the privilege that lets the workflow obtain a short-lived token proving its own identity. Combined with a cloud-side trust setting (accept only a specific repo / specific branch), the result is a state where a long-lived key that would be painful to leak doesn't exist in the first place. This, too, is least privilege applied thoroughly, together with the shortest possible validity period.
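The cloud-side trust setting boils down to checking claims inside the token. The sketch below shows that check in TypeScript over claims that are assumed to be already signature-verified. The claim names `iss`, `repository`, and `ref` and the issuer URL are the ones GitHub's OIDC tokens use; `TrustPolicy`, `isTrusted`, and the repository name `example-org/example-app` are hypothetical, and signature verification (which the cloud provider performs) is deliberately left out.

```typescript
// Subset of the claims in a GitHub Actions OIDC token.
interface GitHubOidcClaims {
  iss: string; // token issuer
  repository: string; // "owner/repo"
  ref: string; // e.g. "refs/heads/main"
}

// Hypothetical trust policy configured on the receiving (cloud) side.
interface TrustPolicy {
  issuer: string;
  repository: string;
  allowedRefs: string[];
}

// Accept the token only if issuer, repository, and ref all match the policy.
// Assumes the token's signature and expiry were verified before this point.
function isTrusted(claims: GitHubOidcClaims, policy: TrustPolicy): boolean {
  return (
    claims.iss === policy.issuer &&
    claims.repository === policy.repository &&
    policy.allowedRefs.indexOf(claims.ref) !== -1
  );
}

const policy: TrustPolicy = {
  issuer: "https://token.actions.githubusercontent.com",
  repository: "example-org/example-app",
  allowedRefs: ["refs/heads/main"],
};

const fromMain: GitHubOidcClaims = {
  iss: "https://token.actions.githubusercontent.com",
  repository: "example-org/example-app",
  ref: "refs/heads/main",
};

const fromFeatureBranch: GitHubOidcClaims = { ...fromMain, ref: "refs/heads/feature-x" };

console.log(isTrusted(fromMain, policy), isTrusted(fromFeatureBranch, policy));
```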
The boundary design of environment variables and secrets itself — how far to put in code, what to keep secret, when to validate at startup — is off the subject of this article, so it's summarized separately (see "the typed env boundary" of the Next.js × Supabase application-security complete guide).
9. RLS/authorization regression in CI too — add SQL verification to the correlation
In addition to injection-class detection (SAST) and the dynamic probe (DAST), Supabase's RLS/authorization design itself can be verified in CI. This is the third axis of "correlation."
Aegis's scan reads supabase/migrations/**.sql and detects structural weaknesses like RLS-disabled tables, unconditional permits of using (true), missing WITH CHECK, SECURITY DEFINER functions without a fixed search_path, and excessive GRANT to the anon role, and includes these in SARIF too. The systematic detection viewpoints of RLS misconfiguration are summarized in detail in detecting and auditing RLS misconfigurations.
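To give a feel for what a structural check on migrations involves, here is a deliberately naive TypeScript sketch that flags two of the patterns above with regular expressions. It is not how Aegis works internally (that isn't described here); `findRlsWeaknesses` and the issue labels are made up for illustration, and a regex approach like this will miss quoted identifiers, SQL comments, and many edge cases.

```typescript
// Naive scan of migration SQL for two structural weaknesses:
//  - a table created without a matching "enable row level security"
//  - a policy whose condition is an unconditional "using (true)"
function findRlsWeaknesses(sql: string): string[] {
  const issues: string[] = [];
  const text = sql.toLowerCase();

  const createTable = /create\s+table\s+(?:if\s+not\s+exists\s+)?([a-z_][a-z0-9_.]*)/g;
  let match: RegExpExecArray | null = createTable.exec(text);
  while (match !== null) {
    const table = String(match[1]);
    // Escape the dot in schema-qualified names before building the pattern.
    const escaped = table.replace(/\./g, "\\.");
    const enablesRls = new RegExp(
      "alter\\s+table\\s+" + escaped + "\\s+enable\\s+row\\s+level\\s+security",
    );
    if (!enablesRls.test(text)) issues.push("rls-disabled:" + table);
    match = createTable.exec(text);
  }

  if (/using\s*\(\s*true\s*\)/.test(text)) issues.push("policy-using-true");
  return issues;
}

// Hand-written sample migration for the demonstration.
const migration = `
  create table public.invoices (id uuid primary key, user_id uuid not null);
  alter table public.invoices enable row level security;
  create policy "read all" on public.invoices for select using (true);

  create table public.notes (id uuid primary key, user_id uuid not null);
`;

console.log(findRlsWeaknesses(migration));
```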
To further prevent regression, write RLS regression tests with pgTAP and continuously prove in CI that "other people's rows can't be seen." This is a separate lineage from SARIF (the red/green of tests), but it's worth bundling in the same CI pipeline.
-- pgTAP: regression-test that another user's JWT can't see someone else's rows
-- (user-a-uuid / user-b-uuid are placeholders; substitute real UUIDs from your test data)
begin;
select plan(1);
set local role authenticated;
set local request.jwt.claims to '{"sub":"user-b-uuid"}';
select is_empty(
  $$ select * from invoices where user_id = 'user-a-uuid' $$,
  'user B cannot read user A invoices'
);
select * from finish();
rollback;
Thus, run the four in the same CI: SAST (code), SQL verification (RLS design), DAST (runtime reproduction), and pgTAP (regression), and upgrade only the confirmed to block. This is the completed form of CI security. Injection and authorization flaws repeatedly rank near the top of the OWASP Top 10 precisely because they are the mistakes that slip in most routinely. That's exactly why there's value in "CI stands guard" rather than "a person is careful."
10. The honest scope — a CI gate isn't a "substitute" for an audit
This is the point I most want to emphasize in this article. A CI gate doesn't replace design review or manual audit.
What can be permanently stationed in CI is essentially "patterns that can be mechanically judged" — whether tainted input reaches a dangerous sink, whether RLS is disabled, whether an ownership check is syntactically missing. These are valuable inspections, but what they see is, after all, the "form" of the code and SQL.
On the other hand, "is this authorization logic correct as a business rule," "is the basis of this tenant belonging (app_metadata or user_metadata) valid," "does this state transition allow an impossible order" — such judgments can only be made by a human who understands your data model and the business's meaning. No matter how green the CI gate is, it only means "you haven't stepped on the common traps," not "authorization is correct."
No tool proves the "correctness" of authorization. The data-flow analysis described here is intra-function (intraprocedural) and misses flows that cross modules or frameworks. A dynamic probe also only shows "it didn't reproduce on the paths I tried," not "safe." A CI gate is the "mechanization" of continuous operation; it complements human review and threat modeling and doesn't replace them.
Grasping this line, the correct role of a CI gate becomes visible. Concentrate humans' precious review time not on pointing out boring holes machines can crush, but on the genuinely-hard design judgments. Erasing noise and making only the confirmed red is exactly for that.
Note that the area beyond the "per-PR scan" handled here, namely continuous CI authorization checks (constant monitoring of a team's repos, or continuous verification of authorization design), is planned for the higher plan "Aegis Team," which is in preparation. If you're interested, you can pre-register on the waitlist. Even with the current OSS version (`npx @aegiskit/cli scan`), you can assemble this article's CI gate starting today.
11. CI-introduction checklist
The minimal confirmation items for assembling a CI security gate as a "gate that doesn't die."
- The scan runs on both PR and push (`on: [push, pull_request]`)
- `permissions` is the least privilege of `contents: read` + `security-events: write`
- SARIF upload has `if: always()` attached, so the finding is uploaded even on red
- The result appears in the code-scanning Security tab and the PR diff annotations
- Start from stage 1 (warn), clear the baseline, then upgrade to block
- What blocks is only the confirmed (high-confidence, dynamically reproduced); keep noise to warning
- After upgrading to block, specify the `Security` check as required for merge with branch protection
- If external authentication is needed from CI, use OIDC (short-lived token); don't put a long-lived key in Secrets
- The Action reference is `tomodahinata/aegis@main` (pin by commit SHA if needed)
- RLS regression is bundled in the same CI with a pgTAP regression test
- The team understands that green CI ≠ safe (design review is separate)
- The team understands that green CI ≠ safe (design review is separate)
From a client's or team lead's viewpoint, the two most effective questions are "Is the security check in CI?" and "When red comes out, is it trusted as 'really bad'?" If the answer to the latter is "well, there's a lot of noise…," the gate is effectively dead.
Summary: stop by mechanism, make only the confirmed red
Let me organize the key points.
- Security inspection has meaning only once it's put into CI at the PR stage. Don't search after production, but stop before merge (shift left). The source of value is not "human discipline" but "enforcement by structure."
- SARIF is the standard format for static-analysis results. If you unify to it, the result rides on the GitHub code-scanning Security tab and the per-line annotations of the PR diff, and the CI integration is unchanged even if you switch tools.
- The implementation in GitHub Actions is short. The skeleton is the least privilege of `permissions: security-events: write` + `contents: read`, SARIF upload with `if: always()`, and threshold specification with `severity`. The Action reference is `tomodahinata/aegis@main`.
- The biggest enemy of a CI gate is false positives. A noise-filled gate is definitely circumvented. So "block only the high-confidence": make red only the static suspicions that could be reproduced by a dynamic probe (SAST×DAST correlation).
- Introduce by starting with warn and upgrading only the confirmed to block. If external authentication is needed from CI, eliminate long-lived secrets with OIDC (keyless).
- Honestly, a CI gate isn't a substitute for design review or manual audit. A tool doesn't prove the "correctness" of authorization. This is the "mechanization" of continuous operation and complements human judgment.
Building fast with AI is, in itself, the right call. What you need next is a mechanism to firm up what you built fast so that it's safe and doesn't leak. As a first step, try assembling a CI gate with Aegis (free OSS, `npx @aegiskit/cli scan`). Once you've experienced only the confirmed going red, security changes from "something to be careful about" to "something protected by structure."
Even so, if you need the correctness of the authorization/RLS design itself, or continuous hardening of an existing app, that's an area machines can't cover. I led full-stack development and the payment-reliability layer at a serverless payment platform, and designed CI/CD and verification gates in actual operation. How far to firm up with automation and where human review is needed: I take that on, line-drawing included, as a security audit.
Frequently asked questions (FAQ)
Q. What should I start from first?
A. Place the YAML of section 4 in .github/workflows/security.yml and start from warning only with continue-on-error: true (stage 1). This makes findings appear on PRs. Once you grasp the noise tendency, remove continue-on-error and make it required for merge with branch protection.
Q. Do I need to write SARIF myself?
A. No. SARIF is auto-generated by the tool. What you write is only the YAML that passes sarif_file to upload-sarif. It's worth understanding SARIF's structure, but you don't hand-write it.
Q. Why not block on all detections?
A. As in section 5, because if CI goes red frequently with false positives, the gate isn't trusted and is definitely circumvented. Limit blocking to only the confirmed (high-confidence, dynamically reproduced) and visualize the rest as warnings. This is the only realistic design to "keep the gate alive."
Q. If CI is green, is it already safe?
A. No. What the CI gate sees is the "form" of the code and SQL, not the "correctness" of authorization. Green only means "you haven't stepped on the common traps." The validity of business rules and tenant belonging can only be judged by human design review. A CI gate complements an audit and doesn't replace it.
Q. Should I do it even for personal development or small scale?
A. Rather, the smaller the scale, the higher the cost-performance. The YAML is a dozen-plus lines, and being OSS it's zero cost. If you start from warning with continue-on-error, you can introduce it without stopping existing PRs. The reality is that an app quickly built with AI has all the more value in letting CI stand guard.
References
- GitHub Docs — Code scanning (SARIF upload, the Security tab, diff annotations)
- OASIS — SARIF v2.1.0 (the standard spec of the static-analysis-results interchange format)
- OWASP — Application Security Verification Standard (ASVS, security measured by verifiability)
- OWASP — Top 10 (the most frequent web-app risks)