Adding AI Features to Your SaaS in 30 Minutes - A Practical Guide
Add AI features to your SaaS fast. Pre-configured API routes for OpenAI, Anthropic, and Google with credit-based usage tracking and streaming responses.
My first AI feature took 3 days. I spent 6 hours reading OpenAI docs, 4 hours figuring out streaming, another 4 hours debugging edge cases where the stream would just... stop. Then I realized I had no way to track how much each user was consuming, so I spent another full day building a credit system from scratch.
The second time I added AI to a SaaS, it took 30 minutes. Same feature complexity. Same provider. The difference was infrastructure.
When I built ResumeFast, the AI resume generation pipeline was the core product feature. But I didn't spend days on API route setup, streaming configuration, or usage tracking. All of that was already wired up in OmniKit. I just wrote the prompts and the UI.
Key Takeaways:
- OmniKit ships with pre-configured API routes for OpenAI, Anthropic, and Google -- swap providers by changing one env variable
- The built-in credit system tracks AI usage per user and enforces limits tied to their subscription plan
- Streaming responses work out of the box with the Vercel AI SDK, including error handling and abort support
- Rate limiting on AI endpoints prevents runaway costs from abuse or bugs
The Problem: AI Integration Has Too Many Moving Parts
Adding a basic AI feature to a SaaS sounds simple. Call an API, get a response, show it to the user. But production AI integration actually involves:
- API route setup -- Server-side routes that securely call AI providers without exposing keys
- Streaming -- Users expect real-time token-by-token output, not a 15-second loading spinner
- Multiple providers -- OpenAI for GPT, Anthropic for Claude, Google for Gemini. Each has different SDKs, auth patterns, and response formats
- Usage tracking -- You need to know how many tokens each user consumed so you can bill them
- Credit enforcement -- Check if the user has enough credits before making the API call, not after
- Rate limiting -- One user shouldn't be able to drain your OpenAI budget with a script
- Error handling -- API timeouts, rate limits from the provider, malformed responses, network failures
Each of these is solvable. But solving all of them from scratch takes days, not minutes. And that's time you could spend on the AI features that actually differentiate your product.
How OmniKit Handles It
OmniKit comes with a complete AI integration layer built on the Vercel AI SDK. The SDK provides a unified interface across providers, so you write your feature once and swap models by changing configuration.
Pre-Configured Provider Setup
Three providers are configured out of the box. You just add your API keys:
# .env.local
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_GENERATIVE_AI_API_KEY=AIza...
The provider configuration lives in a single file:
// lib/ai/providers.ts
import { createOpenAI } from "@ai-sdk/openai";
import { createAnthropic } from "@ai-sdk/anthropic";
import { createGoogleGenerativeAI } from "@ai-sdk/google";
export const openai = createOpenAI({
apiKey: process.env.OPENAI_API_KEY,
});
export const anthropic = createAnthropic({
apiKey: process.env.ANTHROPIC_API_KEY,
});
export const google = createGoogleGenerativeAI({
apiKey: process.env.GOOGLE_GENERATIVE_AI_API_KEY,
});
// Switch your default model in one place
export const defaultModel = openai("gpt-4o");
Want to switch from OpenAI to Anthropic globally? Change one line. Want to use different models for different features (cheap model for summarization, powerful model for code generation)? Import the provider you want per route.
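The key takeaways mention swapping providers with an env variable, while the snippet above hardcodes the default. If you want the env-driven version, it only takes a few extra lines. Here's a sketch -- AI_DEFAULT_MODEL is an assumed variable name for illustration, not something OmniKit defines:
// lib/ai/providers.ts (sketch -- AI_DEFAULT_MODEL is an assumed env variable)
const modelId = process.env.AI_DEFAULT_MODEL ?? "openai:gpt-4o";
const [provider, name] = modelId.split(":");
export const defaultModel =
  provider === "anthropic" ? anthropic(name)
  : provider === "google" ? google(name)
  : openai(name);
With something like this in place, setting AI_DEFAULT_MODEL=anthropic:claude-sonnet-4-20250514 swaps the whole app to another provider without touching code.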
Your First AI Route in 5 Minutes
Here's a complete API route that handles streaming, authentication, and credit checking:
// app/api/ai/generate/route.ts
import { streamText } from "ai";
import { defaultModel } from "@/lib/ai/providers";
import { auth } from "@/lib/auth";
import { checkCredits, deductCredits } from "@/lib/credits";
export async function POST(req: Request) {
const session = await auth();
if (!session?.user) {
return new Response("Unauthorized", { status: 401 });
}
const { prompt, system } = await req.json();
// Check credits before calling the API
const hasCredits = await checkCredits(session.user.id, "ai-generation");
if (!hasCredits) {
return Response.json(
{ error: "Insufficient credits. Please upgrade your plan." },
{ status: 402 }
);
}
const result = streamText({
model: defaultModel,
system: system || "You are a helpful assistant.",
prompt,
onFinish: async ({ usage }) => {
// Deduct credits based on actual token usage
await deductCredits(session.user.id, "ai-generation", usage.totalTokens);
},
});
return result.toDataStreamResponse();
}
That's it. Authentication, credit checks, streaming, and usage tracking in ~30 lines. The onFinish callback fires after the stream completes, so you deduct credits based on actual consumption -- not an estimate.
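The key takeaways also mention abort support. The route above doesn't show it, but streamText accepts an abortSignal option, so forwarding the incoming request's signal stops generation (and token spend) when the client cancels or disconnects. A sketch of the same call with that one addition -- worth verifying how onFinish and credit deduction behave for aborted streams in your SDK version:
const result = streamText({
  model: defaultModel,
  system: system || "You are a helpful assistant.",
  prompt,
  abortSignal: req.signal, // stop generating (and paying) if the client goes away
  onFinish: async ({ usage }) => {
    await deductCredits(session.user.id, "ai-generation", usage.totalTokens);
  },
});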
The Client Side
On the frontend, the useChat hook from the Vercel AI SDK handles the streaming UI:
// components/ai-chat.tsx
"use client";
import { useChat } from "@ai-sdk/react";
export function AIChat() {
const { messages, input, handleInputChange, handleSubmit, isLoading, error } =
useChat({
api: "/api/ai/generate",
onError: (err) => {
if (err.message.includes("402")) {
// Show upgrade modal
}
},
});
return (
<div className="flex flex-col gap-4">
{messages.map((message) => (
<div
key={message.id}
className={
message.role === "user"
? "bg-muted rounded-lg p-3 self-end"
: "bg-background border rounded-lg p-3 self-start"
}
>
{message.content}
</div>
))}
<form onSubmit={handleSubmit} className="flex gap-2">
<input
value={input}
onChange={handleInputChange}
placeholder="Ask anything..."
className="flex-1 border rounded-lg px-3 py-2"
disabled={isLoading}
/>
<button
type="submit"
disabled={isLoading}
className="bg-primary text-primary-foreground px-4 py-2 rounded-lg"
>
Send
</button>
</form>
</div>
);
}
The useChat hook handles streaming token display, message history, loading states, and error handling. You get a ChatGPT-like experience with no manual WebSocket or EventSource management.
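One detail to watch: useChat posts a messages array, while the generate route earlier reads a single prompt from the body. For prompt-style routes, the SDK's useCompletion hook is the closer fit -- it posts { prompt } and streams back a single completion. A minimal sketch:
// components/ai-completion.tsx (sketch for the prompt-based route above)
"use client";
import { useCompletion } from "@ai-sdk/react";
export function AICompletion() {
  const { completion, input, handleInputChange, handleSubmit, isLoading } =
    useCompletion({ api: "/api/ai/generate" });
  return (
    <div className="flex flex-col gap-2">
      <form onSubmit={handleSubmit} className="flex gap-2">
        <input
          value={input}
          onChange={handleInputChange}
          placeholder="Describe what to generate..."
          disabled={isLoading}
          className="flex-1 border rounded-lg px-3 py-2"
        />
        <button type="submit" disabled={isLoading}>Generate</button>
      </form>
      {/* Streamed tokens appear here as they arrive */}
      <p className="whitespace-pre-wrap">{completion}</p>
    </div>
  );
}
Pick whichever hook matches your route's request shape; both handle the streaming plumbing for you.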
The Credit System: Pay for What You Use
This is where most tutorials stop and real products begin. Your users need a way to pay for AI usage, and you need a way to enforce limits.
OmniKit's credit system ties directly into Stripe subscriptions. Each plan defines how many AI credits the user gets per billing period:
// config/plans.ts
export const plans = {
free: {
name: "Free",
aiCredits: 1000, // ~10 basic generations
features: ["Basic AI generation", "5 projects"],
},
pro: {
name: "Pro",
price: 19,
aiCredits: 50000, // ~500 generations
features: ["All AI models", "Unlimited projects", "Priority support"],
},
business: {
name: "Business",
price: 49,
aiCredits: 200000, // ~2000 generations
features: ["Custom models", "API access", "Team sharing", "Analytics"],
},
};
The credit check and deduction functions work against a simple database table.
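The table itself isn't shown in the snippet below, so here's a minimal sketch of what a Drizzle schema for it could look like -- the table and column names line up with the helpers, but the exact types are assumptions, not OmniKit's actual schema:
// lib/db/schema.ts (sketch -- adjust column types to your own setup)
import { pgTable, text, integer, timestamp } from "drizzle-orm/pg-core";
export const userCredits = pgTable("user_credits", {
  userId: text("user_id").primaryKey(),
  remaining: integer("remaining").notNull().default(0),
  totalUsed: integer("total_used").notNull().default(0),
  resetAt: timestamp("reset_at"), // refreshed when the billing period renews
});
With a table like that in place, the check and deduct helpers are short: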
// lib/credits.ts
import { db } from "@/lib/db";
import { userCredits } from "@/lib/db/schema";
import { eq, sql } from "drizzle-orm";
export async function checkCredits(
userId: string,
operation: string
): Promise<boolean> {
const credits = await db.query.userCredits.findFirst({
where: eq(userCredits.userId, userId),
});
if (!credits) return false;
// Each operation type can cost different amounts
const cost = getOperationCost(operation);
return credits.remaining >= cost;
}
export async function deductCredits(
userId: string,
operation: string,
tokensUsed: number
) {
const cost = Math.ceil(tokensUsed / 10); // 10 tokens = 1 credit
await db
.update(userCredits)
.set({
remaining: sql`${userCredits.remaining} - ${cost}`,
totalUsed: sql`${userCredits.totalUsed} + ${cost}`,
})
.where(eq(userCredits.userId, userId));
}
function getOperationCost(operation: string): number {
const costs: Record<string, number> = {
"ai-generation": 10,
"ai-summarization": 5,
"ai-chat": 3,
};
return costs[operation] || 10;
}
When a Stripe webhook fires for a new subscription or renewal, the credit balance resets automatically. When a user upgrades mid-cycle, the difference is prorated. All of this is handled by existing OmniKit infrastructure -- you just define the credit amounts per plan.
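To make the reset step concrete, here's a sketch of what it can look like inside a Stripe webhook handler -- resetCreditsForPlan and the way you derive the plan key from the subscription are illustrative, not OmniKit's actual internals:
// app/api/webhooks/stripe/route.ts (sketch -- helper names are illustrative)
import { plans } from "@/config/plans";
import { db } from "@/lib/db";
import { userCredits } from "@/lib/db/schema";
import { eq } from "drizzle-orm";
async function resetCreditsForPlan(userId: string, planKey: keyof typeof plans) {
  // Top the balance back up to the plan's allowance for the new billing period
  await db
    .update(userCredits)
    .set({ remaining: plans[planKey].aiCredits })
    .where(eq(userCredits.userId, userId));
}
// ...inside the handler, after verifying the webhook signature:
// case "invoice.paid":
//   await resetCreditsForPlan(userId, planKey); // planKey derived from the subscription's price ID
//   break;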
I wrote about the true cost of building SaaS from scratch before. Implementing a credit system alone -- with proper database tracking, Stripe integration, proration, and enforcement -- is easily a 20-30 hour project. Here it's configuration.
Rate Limiting AI Endpoints
AI API calls are expensive. A single GPT-4o request can cost $0.01-0.10 depending on context length. If someone scripts 10,000 requests against your endpoint, that's $100-1,000 on your OpenAI bill.
I covered rate limiting in depth already, but AI endpoints need special treatment. You want tighter limits and per-user tracking:
// lib/ai/rate-limit.ts
import { Ratelimit } from "@upstash/ratelimit";
import { Redis } from "@upstash/redis";
const redis = Redis.fromEnv();
// AI endpoints get stricter limits than regular API routes
export const aiRateLimit = new Ratelimit({
redis,
limiter: Ratelimit.slidingWindow(20, "1 m"), // 20 AI requests per minute
prefix: "ratelimit:ai",
});
// Even stricter for expensive models
export const premiumModelRateLimit = new Ratelimit({
redis,
limiter: Ratelimit.slidingWindow(5, "1 m"), // 5 requests per minute
prefix: "ratelimit:ai-premium",
});
Then in your route, check the rate limit before the credit check -- a rate limit check is a single fast Redis lookup, while a credit check hits your database:
// In your API route
const { success } = await aiRateLimit.limit(`user:${session.user.id}`);
if (!success) {
return Response.json(
{ error: "Too many requests. Please slow down." },
{ status: 429 }
);
}
This layered approach -- rate limiting at the request level, credits at the usage level -- means you're protected from both abuse and accidental cost overruns.
Practical AI Features You Can Build
Once the infrastructure is in place, here are features you can add in minutes, not days.
Content Generation
Blog post drafts, product descriptions, email copy. The most common AI SaaS feature:
const result = streamText({
model: openai("gpt-4o-mini"), // Use the cheaper model for bulk content
system: `You are a copywriter. Write in a conversational,
direct tone. No fluff. No corporate speak.`,
prompt: `Write a product description for: ${productName}.
Key features: ${features.join(", ")}`,
});
Summarization
Turn long documents, feedback threads, or support tickets into concise summaries:
const result = await generateText({
model: anthropic("claude-sonnet-4-20250514"),
system: "Summarize the following into 3-5 bullet points. Be specific.",
prompt: documentText,
});
Note the generateText function instead of streamText -- for summaries, you typically want the full response before displaying it.
Chat with Context
Give the AI access to your product's data for contextual conversations:
const result = streamText({
model: defaultModel,
system: `You are a support assistant for ${productName}.
Use the following documentation to answer questions.
If you don't know the answer, say so.
Documentation:
${relevantDocs}`,
messages: conversationHistory,
});Smart Categorization
Automatically tag, categorize, or route incoming data:
import { generateObject } from "ai";
import { z } from "zod";
const result = await generateObject({
model: openai("gpt-4o-mini"),
schema: z.object({
category: z.enum(["bug", "feature", "question", "praise"]),
priority: z.enum(["low", "medium", "high", "critical"]),
summary: z.string(),
}),
prompt: `Categorize this customer feedback: "${feedbackText}"`,
});
// result.object is fully typed: { category: "bug", priority: "high", summary: "..." }
The generateObject function from the Vercel AI SDK returns structured, typed data. No JSON parsing. No prompt engineering to get the right format. It uses the schema to constrain the model's output.
Switching Providers Without Rewriting Code
This is the part that saves you when OpenAI has an outage at 3 AM, or when Anthropic releases a model that's better for your use case.
Because the Vercel AI SDK abstracts the provider interface, switching models is a one-line change:
// Before: OpenAI
const result = streamText({
model: openai("gpt-4o"),
prompt: userPrompt,
});
// After: Anthropic (same code, different model)
const result = streamText({
model: anthropic("claude-sonnet-4-20250514"),
prompt: userPrompt,
});
// Or: Google (still the same code)
const result = streamText({
model: google("gemini-2.0-flash"),
prompt: userPrompt,
});
Same streaming behavior. Same error handling. Same credit deduction. The only thing that changes is which model processes the request.
I've used this in practice when OpenAI's API had degraded performance for a few hours. Switched the default model to Claude, pushed the env var change, and users didn't notice. No code deploy. No rewrite. Just a config change.
What 30 Minutes Actually Looks Like
Here's the actual timeline when you start from OmniKit:
Minutes 0-5: Add your API keys to .env.local. Pick your default model.
Minutes 5-15: Create your AI API route. Define the system prompt for your specific use case. Set the credit cost for the operation.
Minutes 15-25: Build the frontend component. Wire up useChat or call the API with fetch. Add a loading state and error handling.
Minutes 25-30: Test it. Adjust the prompt. Watch credits deduct in the database. Ship it.
Compare this to how I launched 2 profitable SaaS products in 3 weeks. The AI features in ResumeFast -- which are the core product -- took about an afternoon. The prompts and UI took longer than the infrastructure, which is exactly how it should be.
Common Mistakes to Avoid
1. Calling the AI API from the client. Your API key will be exposed. Always route through a server-side API route. OmniKit's routes are all server-side by default.
2. Not setting spending limits. OpenAI and Anthropic both let you set monthly spending caps in their dashboards. Set them. A bug in your code or a spike in traffic can empty your account overnight.
3. Deducting credits before the response completes. If the AI call fails after you've already deducted, the user loses credits for nothing. Use the onFinish callback to deduct only on success.
4. Using the most expensive model for everything. GPT-4o-mini and Claude Haiku are dramatically cheaper and perfectly fine for categorization, summarization, and simple generation tasks. Reserve the flagship models for complex reasoning.
5. Ignoring rate limiting. I said it before but it bears repeating. One bad actor or one buggy client-side retry loop can cost you hundreds of dollars in API fees. Rate limit everything.
The Bottom Line
AI features are table stakes for SaaS in 2026. Your users expect them. Your competitors have them. The question isn't whether to add AI -- it's how fast you can ship it.
The gap between "AI is cool" and "AI is in production" used to be weeks of integration work. Streaming, credits, rate limiting, error handling, provider abstraction -- each one is a rabbit hole.
With OmniKit, that gap is 30 minutes. You spend your time on prompts, UX, and the product decisions that actually matter. Not on plumbing.
The founders who ship AI features this week will capture users that the founders shipping "next quarter" never will. In the vibe coding era, speed is the product.
Questions about adding AI to your SaaS? Reach out at raman@omnikit.dev or join the Discord.