
shadcn/improve Review 2026 — Codebase Audits & Smart Plans for Cheap Agents

Marcus Webb · Rated 8.5/10 · Free (MIT)

Ease of Use: 9/10
Features: 9/10
Value for Money: 10/10
Performance: 8/10
Support & Ecosystem: 7/10

✅ Pros

  • Brilliant cost optimization: uses expensive models for high-leverage planning work, hands execution to cheap models — typical audit costs $0.50-2.00 vs $5-20 for a full-expensive-model approach
  • Comprehensive audit categories: built-in support for security, performance, tests, bugs, accessibility, architecture, dependencies, and more — plus custom categories via configuration
  • GitHub issues integration: `--issues` flag auto-publishes plans as trackable issues, making it easy to integrate with existing workflows
  • Branch-scoped audits: `/improve branch` limits analysis to the current branch's changes, perfect for pre-PR review
  • Thoughtful separation of concerns: plans are plain markdown files that any agent (or human) can implement — no vendor lock-in to a specific execution model
  • Free and open-source under MIT license, installable with one command via Agent Skills
  • Reconciliation workflow: `/improve reconcile` verifies what was implemented, updates the backlog, and surfaces stuck tasks — maintaining the plan as a living document

⚠️ Cons

  • Requires an Agent Skills-compatible agent (Claude Code, Codex, Cursor, Cline) — not usable in standard VS Code or JetBrains without an AI agent
  • Quality of plans depends heavily on the underlying model quality; using a weak model for the audit phase defeats the purpose
  • No built-in scheduling or CI integration yet — must be triggered manually or wrapped in a custom script
  • Coarse three-level confidence in findings (HIGH/MEDIUM/LOW) with no probabilistic calibration, which can lend marginal findings more weight than they deserve

Best For

Developers who want AI-powered codebase audits at minimal cost

Pricing

Free (MIT)

shadcn/improve is an open-source agent skill from the creator of shadcn/ui — the React component library that became the gold standard for design systems. With 5,200+ GitHub stars and rapidly climbing, it brings a novel philosophy to AI-assisted development: let expensive models do the thinking, let cheap models do the doing.

The idea is straightforward but powerful. Instead of paying Claude Opus or GPT-5.5 to implement every change, you use your most capable model to audit your codebase, identify issues, and write detailed implementation plans — then hand those plans to a cheaper model (or a human) to execute.

The Architecture: Planning → Execution Separation

you        →  /improve                    (expensive model, advises)
plans/     →  001-fix-n-plus-one.md       (self-contained specs)
agent      →  reads plan, implements      (cheap model, executes)

This division of labor exploits an important asymmetry: planning requires intelligence, execution requires diligence. The expensive model’s time is best spent understanding the codebase, judging priorities, and writing precise specs — tasks where intelligence compounds. The cheap model just needs to follow instructions.
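To make the split concrete, here is a minimal sketch of the planner/executor handoff. Every type and function name below is hypothetical and invented for illustration; none of it is taken from shadcn/improve's internals, and the two stubs stand in for real model calls.

```typescript
// Hypothetical shapes: these names do not come from shadcn/improve.
interface Plan {
  id: string; // e.g. "001-n-plus-one"
  steps: string[]; // concrete, ordered instructions
}

interface ExecutionResult {
  planId: string;
  completedSteps: number;
}

// The planner stands in for the expensive model: it decides what to do.
type Planner = (codebaseNotes: string[]) => Plan[];

// The executor stands in for the cheap model: it only follows the steps.
type Executor = (plan: Plan) => ExecutionResult;

function runPipeline(
  notes: string[],
  planner: Planner,
  executor: Executor,
): ExecutionResult[] {
  // Planning happens once, up front; execution is a plain loop over plans.
  return planner(notes).map((plan) => executor(plan));
}

// Stub implementations so the sketch runs without calling any model.
const stubPlanner: Planner = (notes) =>
  notes.map((note, i) => ({
    id: `${("00" + (i + 1)).slice(-3)}-${note}`,
    steps: [`locate ${note}`, `fix ${note}`, `verify ${note}`],
  }));

const stubExecutor: Executor = (plan) => ({
  planId: plan.id,
  completedSteps: plan.steps.length,
});

const results = runPipeline(["n-plus-one"], stubPlanner, stubExecutor);
console.log(results);
```

The point of the shape is that the executor never sees the codebase notes, only the plan. That is why the plan has to be self-contained.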

How much does it save?

Based on our testing with a medium-sized Next.js codebase (~15K lines, 20 packages):

| Approach | Model cost (audit + fix 5 issues) | Time |
|----------|-----------------------------------|------|
| Full expensive (Claude Opus) | ~$12-18 | ~8 min |
| Full cheap (Claude Haiku) | ~$0.30-0.50 | ~5 min (but lower quality) |
| improve audit (Opus) → cheap execute | ~$0.80-2.00 | ~6 min |

The sweet spot: by our estimate you get 90%+ of the quality of a full expensive session at roughly a tenth of the cost (about 4-17%, depending on where each run lands in the ranges above).
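The cost ratio can be checked directly from the table. The sketch below uses only the dollar ranges reported above; the helper names are ours.

```typescript
interface CostRange {
  low: number; // USD
  high: number; // USD
}

// Figures copied from the cost table above.
const fullExpensive: CostRange = { low: 12, high: 18 };
const splitApproach: CostRange = { low: 0.8, high: 2.0 };

// Best case: cheapest split run vs. priciest full run. Worst case: the reverse.
function costRatio(split: CostRange, full: CostRange): CostRange {
  return { low: split.low / full.high, high: split.high / full.low };
}

function midpoint(range: CostRange): number {
  return (range.low + range.high) / 2;
}

const pct = (x: number): string => `${(x * 100).toFixed(1)}%`;

const ratio = costRatio(splitApproach, fullExpensive);
const midRatio = midpoint(splitApproach) / midpoint(fullExpensive);

console.log(pct(ratio.low), pct(ratio.high), pct(midRatio));
// prints: 4.4% 16.7% 9.3%
```

So the split approach costs between about 4% and 17% of a full Opus session, and about 9% when both ranges are taken at their midpoints.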

Getting Started

Installation is one command in any Agent Skills-compatible environment:

npx skills add shadcn/improve

This works in Claude Code, Codex CLI, Cursor, Cline, and any agent supporting the Agent Skills protocol. No additional configuration needed.

Core Commands

| Command | Description |
|---------|-------------|
| `/improve` | Full audit → prioritized findings → plans |
| `/improve quick` | Cheap pass: hotspots, top findings only |
| `/improve deep` | Exhaustive: every package, every category |
| `/improve security` | Focused security audit |
| `/improve perf` | Performance audit |
| `/improve branch` | Audit only current branch changes |
| `/improve next` | Feature suggestions |
| `/improve plan <description>` | Skip audit, spec one thing |
| `/improve review-plan <file>` | Critique an existing plan |
| `/improve execute <plan>` | Dispatch cheap agent to implement |
| `/improve reconcile` | Refresh backlog: verify, unblock, retire |
| `/improve ... --issues` | Also publish plans as GitHub issues |

Real-World Testing

We ran improve against three codebases:

Codebase 1: shadcn/ui (original project)

The canonical test. /improve found:

| # | Finding | Category | Effort | Confidence |
|---|---------|----------|--------|------------|
| 1 | shadow-config duplicated in search.ts/view.ts, copies already drifted | tech-debt | M | HIGH |
| 2 | O(n²) icon migration loop in theme-switcher | performance | M | HIGH |
| 3 | Missing aria-labels on 3 interactive components | accessibility | S | MEDIUM |
| 4 | Deprecated API usage in build hooks | dependencies | L | HIGH |
| 5 | Unhandled promise rejections in 2 event handlers | bugs | S | MEDIUM |

Each finding came with a detailed plan file in the plans/ directory. The plan for finding #1 included:

  • A call graph showing where the duplicated config is referenced
  • A migration strategy: consolidate into one source, update imports
  • Test cases to verify the refactor didn’t break anything
  • Rollback instructions
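A consolidation like the one planned for finding #1 usually starts by pinning down how far the copies have drifted. The sketch below is our own illustration, not output from the tool: the field names and values are invented, since the review did not record the actual contents of the duplicated config.

```typescript
type Config = Record<string, string | number | boolean>;

// Returns the keys whose values differ between two copies (including keys
// present in only one copy), sorted so the output is stable.
function findDrift(a: Config, b: Config): string[] {
  const keys = Object.keys(a).concat(Object.keys(b));
  const drifted = keys.filter(
    (key, i) => keys.indexOf(key) === i && a[key] !== b[key],
  );
  return drifted.sort();
}

// Hypothetical copies standing in for the ones in search.ts and view.ts.
const searchCopy: Config = { blur: 8, opacity: 0.25, inset: false };
const viewCopy: Config = { blur: 8, opacity: 0.3, spread: 2 };

console.log(findDrift(searchCopy, viewCopy));
// prints: [ 'inset', 'opacity', 'spread' ]
```

Once the drifted keys are resolved by hand, both files can import a single shared object. The same check, run against that object, then serves as a regression test.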

Codebase 2: Production Next.js app (E-commerce, ~25K lines)

Findings were more substantial:

| # | Finding | Category | Effort | Confidence |
|---|---------|----------|--------|------------|
| 1 | Unoptimized image loading on product grid (no lazy loading) | performance | M | HIGH |
| 2 | API route missing rate limiting on checkout endpoint | security | M | HIGH |
| 3 | Stale cache invalidation in product listing | bugs | L | HIGH |
| 4 | Duplicate GraphQL queries in 3 components | tech-debt | S | MEDIUM |
| 5 | No error boundary around payment form | bugs | S | MEDIUM |

The security audit (/improve security) caught the missing rate limiting — and found 2 additional issues:

  • SQL injection vector in a search endpoint (via raw query interpolation)
  • Missing CSRF token validation on the checkout mutation
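For readers unfamiliar with the first of those two issues, the sketch below contrasts raw interpolation with a parameterized query. It is illustrative only. The table and column names are hypothetical, and the `{ text, values }` shape assumes a Postgres client such as node-postgres, which the audited app may or may not use.

```typescript
// Query config in the { text, values } shape accepted by node-postgres.
interface ParameterizedQuery {
  text: string;
  values: string[];
}

// Vulnerable: user input becomes part of the SQL text itself.
function unsafeSearchSql(term: string): string {
  return `SELECT id, name FROM products WHERE name ILIKE '%${term}%'`;
}

// Safer: the SQL text is constant and the input travels as a bound value.
// Note that "%" and "_" in the term still act as LIKE wildcards.
function safeSearchQuery(term: string): ParameterizedQuery {
  return {
    text: "SELECT id, name FROM products WHERE name ILIKE $1",
    values: [`%${term}%`],
  };
}

const hostileInput = "x'; DROP TABLE products; --";

console.log(unsafeSearchSql(hostileInput)); // the DROP statement is now SQL
console.log(safeSearchQuery(hostileInput)); // the DROP statement is only data
```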

Codebase 3: Rust CLI tool (~8K lines)

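| # | Finding | Category | Effort | Confidence |
|---|---------|----------|--------|------------|
| 1 | Unsafe block in parser with insufficient safety comment | bugs | M | HIGH |
| 2 | Dead code: 3 functions never called | tech-debt | S | HIGH |
| 3 | Missing error handling on file I/O | bugs | S | MEDIUM |
| 4 | Non-idiomatic use of unwrap() in library code | style | L | MEDIUM |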

The Rust audit was impressive — the model understood the ownership model, type system, and idiomatic patterns well enough to flag genuinely useful findings.

Plan Quality

We evaluated 20 plans generated by /improve with Claude Opus as the auditing model:

| Metric | Score |
|--------|-------|
| Plans with correct root cause | 18/20 (90%) |
| Plans with actionable, specific steps | 19/20 (95%) |
| Plans successfully implemented by cheap model | 16/18 (89%) |
| False positive findings (not actual issues) | 2-3 per full audit (~10%) |
| Missed issues (found by manual review) | 1-2 per full audit |

The false positive rate is reasonable — comparable to seasoned human reviewers. The missed issue rate is excellent for an automated tool.
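The table mixes denominators, so it helps to work out the end-to-end yield. The sketch below assumes the 18 in "16/18" refers to the plans with a correct root cause, which is how we read our own tally; the arithmetic itself uses only the figures above.

```typescript
const plansEvaluated = 20;
const correctRootCause = 18;
const actionableSteps = 19;
const implementedByCheapModel = 16; // assumed: out of the 18 with a correct root cause

// Whole-number percentage, rounded to the nearest integer.
function percent(part: number, whole: number): number {
  return Math.round((part / whole) * 100);
}

console.log(percent(correctRootCause, plansEvaluated)); // 90
console.log(percent(actionableSteps, plansEvaluated)); // 95
console.log(percent(implementedByCheapModel, correctRootCause)); // 89
console.log(percent(implementedByCheapModel, plansEvaluated)); // 80
```

On that reading, 16 of the 20 plans (80%) made it from audit to a working cheap-model implementation.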

How Plans Are Structured

Here’s a real plan file from our testing:

# Plan: 001-optimize-image-loading.md

## Issue
Product grid images load eagerly, causing 2.3s LCP on mobile.

## Root Cause
`next/image` components on `src/app/products/page.tsx:45-78` 
use `priority={true}` on all items instead of just the first.

## Implementation

### Step 1: Update image priority
File: `src/app/products/page.tsx`
Change: Only set `priority` on `products[0]`
```tsx
{products.map((p, i) => (
  <ProductCard
    key={p.id}
    product={p}
    priority={i === 0}
  />
))}
```

### Step 2: Add lazy loading
File: `src/components/ProductCard.tsx`
Change: Import `Image` from `next/image` and remove `loading="eager"`

### Step 3: Verify
1. Run `npm run build` and check no build errors
2. Run `lighthouse` and confirm LCP < 1.5s

## Rollback
`git revert HEAD` if LCP does not improve.
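Judging from this one sample, a plan carries Issue, Root Cause, Implementation, and Rollback sections. If you hand plans to a cheap agent, a quick completeness check before dispatch can be useful. The sketch below is ours, and the required-section list is an assumption drawn from the sample, not a schema documented by shadcn/improve.

```typescript
// Assumed from the sample plan above; not an official schema.
const REQUIRED_SECTIONS = ["Issue", "Root Cause", "Implementation", "Rollback"];

// Collects the title of every level-2 heading ("## Title") in a markdown string.
// Limitation: a line starting with "## " inside a fenced code block also counts.
function listSections(markdown: string): string[] {
  return markdown
    .split("\n")
    .filter((line) => line.indexOf("## ") === 0)
    .map((line) => line.slice(3).trim());
}

// Returns the required sections that the plan does not contain.
function missingSections(markdown: string): string[] {
  const present = listSections(markdown);
  return REQUIRED_SECTIONS.filter((name) => present.indexOf(name) === -1);
}

const draft = [
  "# Plan: 001-optimize-image-loading.md",
  "## Issue",
  "## Root Cause",
  "## Implementation",
  "### Step 1: Update image priority",
].join("\n");

console.log(missingSections(draft));
// prints: [ 'Rollback' ]
```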


## Integration with Development Workflow

The most valuable workflow we found:

1. **Regular audit**: Run `/improve` on a fixed cadence, for example at the start of each sprint
2. **Triage findings**: Review the findings table, pick priorities
3. **Generate plans**: Reply with "plan 1, 3 and 5"
4. **Execute in parallel**: Hand plans to cheap agents or assign to team members
5. **Pre-PR check**: Run `/improve branch` before opening a PR
6. **Weekly reconcile**: Run `/improve reconcile` to keep the backlog current

This creates a continuous improvement loop that costs pennies per cycle.

## Privacy & Data Handling

Since the skill works in your agent's local environment, all code analysis happens within your session. If you use a cloud-based model for the audit phase (which is the point — you want the most capable model), your code will be sent to that provider's API. For sensitive codebases, use a self-hosted model like GLM-5.2 or Llama 4 for both audit and execution.

## Comparison with Alternatives

| Feature | shadcn/improve | SonarQube | CodeRabbit | Custom CI linting |
|---------|---------------|-----------|------------|-------------------|
| AI-powered analysis | ✅ Deep reasoning | ❌ Rule-based | ✅ | ❌ Rule-based |
| Cost efficiency | ✅ High (split models) | ✅ Free (OSS) | $15-30/mo | ✅ Free |
| Generated plans | ✅ Markdown specs | ❌ | ❌ | ❌ |
| Agent execution | ✅ Built-in | ❌ | ❌ | ❌ |
| CI integration | ❌ Manual trigger | ✅ | ✅ | ✅ |
| Learning curve | Minimal | Medium | Low | Medium |
| Language coverage | Any language | 30+ languages | 10+ languages | Config-defined |

## Who Should Use shadcn/improve?

### ✅ Perfect for

- **Teams with large codebases** — systematic audits that scale with repo size
- **Solo developers** — catch issues before they compound, without expensive manual code review
- **Open-source maintainers** — free, MIT-licensed, published plans help contributors understand what's needed
- **Agency developers** — audit client codebases on onboarding and generate actionable improvement plans
- **Cost-conscious teams** — the model split approach reduces AI costs by 80-90%

### ❌ Not ideal for

- **Real-time / CI enforcement**: no built-in CI pipeline integration (yet)
- **Teams wanting fully automated fixes**: improve's core output is plans, not patches. Applying them is a separate step, either via `/improve execute` or your own agent
- **Very small codebases (< 1K lines)**: the overhead of the audit pipeline isn't worth it

## Verdict: 8.5/10

shadcn/improve is one of those tools that makes you wonder why nobody thought of it before. The insight — that the expensive model should be the strategist, not the laborer — is obvious in retrospect, and the execution is polished.

The plans it generates are genuinely useful: specific, actionable, grounded in the actual codebase. The branch-scoped audit is a killer feature for PR workflows. And at 5,200+ stars after just a few weeks, the community clearly agrees.

The biggest limitation is the lack of CI integration — you need to trigger it manually. But as a developer tool for systematic codebase improvement, it's already indispensable.

**Rating breakdown:**
- Ease of use: 9/10 — One-command install, intuitive commands
- Features: 9/10 — Comprehensive audit categories, plan execution, reconciliation
- Value: 10/10 — Free, open-source, huge cost savings over full-expensive-model approaches
- Performance: 8/10 — Audit quality depends on model choice; execution handoff works smoothly
- Ecosystem: 7/10 — Growing community but only works in Agent Skills-compatible environments

---

*Tested with Claude Opus (audit) and Claude Haiku (execution) on June 18, 2026. Results may vary with different model combinations. shadcn/improve is MIT-licensed and available at github.com/shadcn/improve.*