AI Builds the Tooling. AI Doesn't Run the Tooling.

The popular move right now is to put AI in the loop of everything. Wire a model into every task, let agents run the business, and lately, buy a little machine for your desk to run a model locally around the clock. I went the other way, and it is the single best operational decision I have made in the last year.

Here is the line I keep coming back to: AI builds the tooling. AI doesn't run the tooling.

That distinction is the whole thing, so let me be precise about it. I use AI heavily to design, write, and continuously refine the tooling that runs my business. Some of it I built once and never touched again. Most of it I iterate on constantly as I learn what the business actually needs. And the building itself is mostly thinking. The time goes into architecting and planning: knowing my own operation intimately, being creative about how to model it, and having enough technical understanding to steer the work. This is not point-an-agent-at-it-and-walk-away. AI stays very much employed here, on the building. What it seldom does is run inside the tooling, and the few edge cases where it does are deliberate, which I get to below. The jobs themselves are ordinary code on a schedule, and they call no model to do their daily work. The cost of AI lands at build and refinement time, not on every run, so the work compounds without a meter running.

Diagram: AI builds and refines the tooling on my laptop, the tooling runs deterministically on an operations server with a daily cache layer, and I read Slack rollups in a few minutes a day. Build and refine with AI. Run without it. Read the summary, make the decisions.

The trap most people are walking into

If you let AI run your operations live, you are renting intelligence on every single heartbeat. An agent that checks your finances every hour bills you tokens twenty-four times a day, whether or not anything changed. Multiply that across support, sales, billing, and reporting, and you have built a business with a meter running on every function, all day, every day. It feels modern. It gets expensive quickly.
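To make the meter concrete, here is a back-of-envelope sketch of the arithmetic. The token count per check and the price per million tokens are placeholder assumptions chosen for illustration, not real figures from any provider or invoice.

```python
def monthly_token_cost(runs_per_day: int, tokens_per_run: int,
                       usd_per_million_tokens: float, days: int = 30) -> float:
    """Estimate the monthly cost of a job that calls a model on every run."""
    total_tokens = runs_per_day * tokens_per_run * days
    return total_tokens / 1_000_000 * usd_per_million_tokens


# Assumed inputs: an hourly check, 10,000 tokens per check, $5 per million tokens.
hourly_agent = monthly_token_cost(24, 10_000, 5.0)

# A deterministic script makes no model calls at runtime, so its token cost is zero.
plain_script = monthly_token_cost(24, 0, 5.0)

print(f"hourly agent: ${hourly_agent:.2f}/month, plain script: ${plain_script:.2f}/month")
```

The absolute numbers matter less than the shape: the agent's cost scales with how often it runs, and that cost repeats for every function you wire a model into.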

The alternative is to spend the intelligence at build time and compile it down into something dumb and deterministic that runs on its own. AI is the factory that builds and keeps upgrading the machines. It is not a worker you pay to stand on the line every shift.

Decompose before you automate

The unglamorous part comes first, and it is the part the "just point an agent at it" crowd skips. Before automating anything, I mapped my business into the functional domains that can and should run on their own: Support, Sales and CRM, Dunning (recovering failed payments), Win-Back, Trials, and Financials. Market research and web performance sit alongside those.

Each domain gets its own small, purpose-built job. Not one big brain trying to do everything. A handful of focused, boring scripts that each do one thing well. That decomposition is the actual work. The automation is easy once you have named the pieces.

Diagram: The operations server by functional domain: Support, Sales and CRM, Dunning, Win-Back, Trials, Financials, Market Research, and Web Performance, each a scheduled job, most running with no AI at runtime. Each operational domain is its own small job on the server.

A server far from my desk

Here is where I split hard from the local-mini trend. I took the whole operation off my own network and pushed it onto a server far away, a small VPS that is always on. AI wrote the agents on my laptop. I push them to that box, and they run there.

A mini humming in your office dies when your power blips and is only as reachable as your office network. My operations server runs whether I am at my desk, on a flight, or asleep. It is reachable from anywhere in the world, and it gives me the redundancy and availability I actually need. The intelligence happens at build and refinement time, on my laptop. What is left to run is execution, and execution wants uptime, not a GPU on your shelf.

What it connects to, and what it replaced

The server does not reinvent the systems that actually run the business. It talks straight to them through their APIs. Stripe for payments and subscriptions. QuickBooks for estimates, invoices, and reconciliation. Help Scout for the support queue. Mailgun and Mailchimp for email. Slack for reporting. Those are the systems of record, and I keep every one of them.

What I dropped is the expensive layer that used to sit on top of them. The clearest example is Baremetrics. It is a genuinely good product, and it did a lot in one place: it read my Stripe data and turned it into MRR, churn, and trend reporting, and it ran a flow for recovering failed payments. But underneath, all of that is just reading and acting on an API I already have access to. My server talks to Stripe directly, so it does that same reporting and that same recovery itself, on a schedule, for nothing.
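As a rough illustration of how little logic sits underneath a headline metric like MRR, here is a minimal sketch. It works on plain dictionaries with hypothetical field names (`status`, `amount_cents`, `interval`); this is not Stripe's schema, and a real job would first map the billing data into whatever shape it prefers.

```python
def monthly_recurring_revenue(subscriptions: list[dict]) -> float:
    """Sum active subscriptions, normalising yearly plans to a monthly amount."""
    months_per_interval = {"month": 1, "year": 12}
    total_cents = 0.0
    for sub in subscriptions:
        if sub["status"] != "active":
            continue  # canceled or past-due subscriptions do not count toward MRR here
        total_cents += sub["amount_cents"] / months_per_interval[sub["interval"]]
    return round(total_cents / 100, 2)


# Hypothetical snapshot of three subscriptions.
snapshot = [
    {"status": "active", "amount_cents": 4900, "interval": "month"},
    {"status": "active", "amount_cents": 60000, "interval": "year"},
    {"status": "canceled", "amount_cents": 4900, "interval": "month"},
]
mrr = monthly_recurring_revenue(snapshot)
print(f"MRR: ${mrr:.2f}")
```

Churn and trend reporting follow the same pattern: compare today's snapshot with an earlier one and count what changed.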

What replacing it really bought me was not the saved subscription fee, though that helps. It freed up hours of time and labor. Pulling reports, watching a dashboard, chasing failed payments by hand: that is work that now simply happens, without me or anyone else spending a day on it.

The diagram below shows this through the financials lens, because that is where the savings were clearest. The other functions look a little different. Sales, support, and the rest each connect to their own systems and retire their own tools. The pattern holds across all of them: integrate with the source of record, and drop the layer whose only job was to read it.

That is the quiet cost story most people miss. That kind of tool bills monthly, forever, and the bill grows as you grow. The replacement is a small server and code I own. I did not lose a single capability. I moved it in-house, and instead of just billing me it keeps getting better.

Slack is my dashboard

The jobs run on a schedule. The only thing on that server that "decides" anything is cron, deciding when to run. When a job finishes, it posts a short rollup to Slack: recovery results to one channel, renewals and overdue invoices to another, trial signups and expiries to a third.
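A rollup like that can be very small. The sketch below assumes a Slack incoming webhook, which accepts a JSON body with a `text` field. The job name, the counts, and the function names are made up for illustration.

```python
import json
import urllib.request


def build_rollup(job_name: str, counts: dict[str, int]) -> str:
    """Format a short plain-text summary of one job run."""
    lines = [f"{job_name} finished"]
    lines += [f"- {label}: {value}" for label, value in counts.items()]
    return "\n".join(lines)


def post_rollup(webhook_url: str, text: str) -> None:
    """Send the summary to a Slack incoming webhook (URL supplied by the caller)."""
    body = json.dumps({"text": text}).encode("utf-8")
    request = urllib.request.Request(
        webhook_url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        response.read()


# Illustrative numbers only; a real job would count its own results.
summary = build_rollup("dunning", {"retried": 12, "recovered": 9, "still failing": 3})
print(summary)
```

A scheduled job would end by calling `post_rollup` with the webhook URL for its channel. Nothing in this path calls a model.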

I read the summaries. I do not babysit the work. The whole operation reports to me in about five minutes a day. That is the payoff of deterministic tooling: it is quiet, it is predictable, and it tells me only what I need to know to steer.

Give me the baby, not the labor.

I have said that for years. The tooling is how it finally became true across the whole operation.

The cache layer I rebuild every day

This is the piece I think most people miss, and it is what keeps the cost near zero. My jobs do not call a live API or a model every time they need context. They read a cache layer that gets rebuilt on a schedule. The billing jobs read a cached snapshot of customer accounts that refreshes daily, not on every run. Replies get classified and stored in a small local database, again with no model in the loop. The intelligence is pre-computed and parked where the cheap, dumb jobs can reach it. When AI does get involved, it reads that cache instead of reasoning from scratch, which keeps it both cheaper and more accurate. And when the dumbed-down scripts cannot answer something with high fidelity, they do not guess. They escalate to me. I make the call, and then we fold that decision back into the cache so the next run handles it on its own. It is an iterative loop, and every pass through it makes the system a little more capable.
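The escalate-and-fold-back part of that loop can be sketched with nothing more than SQLite. The table layout, the situation labels, and the actions below are hypothetical, and the sketch covers only the loop itself, not the daily snapshot rebuild.

```python
import sqlite3


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open the decision cache, creating its single table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS decisions ("
        "situation TEXT PRIMARY KEY, action TEXT NOT NULL)"
    )
    return conn


def fold_back(conn: sqlite3.Connection, situation: str, action: str) -> None:
    """Record a human decision so later runs can apply it without asking."""
    conn.execute(
        "INSERT OR REPLACE INTO decisions (situation, action) VALUES (?, ?)",
        (situation, action),
    )
    conn.commit()


def run_job(conn: sqlite3.Connection, situations: list[str]):
    """Apply cached decisions; anything unknown is escalated, never guessed."""
    handled, escalated = [], []
    for situation in situations:
        row = conn.execute(
            "SELECT action FROM decisions WHERE situation = ?", (situation,)
        ).fetchone()
        if row is None:
            escalated.append(situation)
        else:
            handled.append((situation, row[0]))
    return handled, escalated


cache = open_cache()
fold_back(cache, "card_expired", "send_update_card_email")

incoming = ["card_expired", "disputed_charge"]
first_handled, first_escalated = run_job(cache, incoming)

# The human decides the escalated case once, and the decision is folded back in.
fold_back(cache, "disputed_charge", "pause_dunning")
second_handled, second_escalated = run_job(cache, incoming)
```

On the first run one case is escalated. On the second run, after the decision is stored, both are handled with no human and no model involved.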

Diagram: The iterative loop: a scheduled job reads the daily cache, checks its confidence, resolves about 80 percent automatically, and escalates the low-fidelity 20 percent to me; I decide and we fold the decision back into the cache so the next run is smarter. The low-confidence cases come to me; my decisions get folded back in.

Support: six months of building an engine, one case at a time

For the last six months I have done something deliberately slow. Every support ticket gets reviewed by me and scoped with the agent. Not skimmed, not rubber-stamped. Reviewed. That patient, manual work built a cache of roughly a thousand cases, and that cache is now the basis of my entire support agent.

The thing you learn when you actually study your own support is that tickets are not infinite. They bundle. When you read enough of them, the noise resolves into a handful of categories. Maybe it is five buckets, maybe it is ten, the number does not matter. What matters is that once you can see the buckets, you can build specific logic for each one. That logic is repeatable, and in practice it cleanly handles around eighty percent of everything that comes in. Every incoming ticket also gets graded for confidence against that history. Anything the system scores above ninety percent confidence is answered automatically. Everything below that bar stays in its bucket for a human to confirm, and the genuine edge cases, the ones no bucket fits, are the small slice that comes to me every time.
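The routing rule in that paragraph fits in a few lines. The sketch below assumes each ticket has already been assigned a bucket (or none) and a confidence score between 0 and 1; how that score is computed is out of scope here, and the names are illustrative.

```python
from dataclasses import dataclass
from typing import Optional

AUTO_ANSWER_THRESHOLD = 0.90  # "above ninety percent confidence"


@dataclass
class GradedTicket:
    bucket: Optional[str]  # None means no bucket fits: a genuine edge case
    confidence: float      # 0.0 to 1.0, graded against reviewed history


def route(ticket: GradedTicket) -> str:
    """Decide what happens to a graded ticket."""
    if ticket.bucket is None:
        return "escalate_to_owner"
    if ticket.confidence > AUTO_ANSWER_THRESHOLD:
        return "auto_answer"
    return "hold_for_human_confirmation"


print(route(GradedTicket("billing", 0.95)))
print(route(GradedTicket("billing", 0.72)))
print(route(GradedTicket(None, 0.99)))
```

Note that the edge-case check comes first: a ticket with no matching bucket goes to a person no matter how confident the score looks.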

That does two things, and both of them matter.

The first is customer experience. Each bucket gets a quality, scoped response that was shaped by a thousand real cases and approved by me. Compare that to what most people are shipping right now: a raw model pointed at a ticket, responding blindly, confidently, and often badly. We have all received those answers. They are fast and they are useless. A scoped system built on your own reviewed history does the opposite. It gives a good answer because it has seen the good answer a hundred times.

The second is durability. The engine compounds. Every day I review the edge cases, the small slice that did not fit a bucket, and I fold them back in. Each day's review pays a dividend the next day, because today's edge case becomes tomorrow's handled case. The system gets a little smarter and a little more complete every single day, and none of that improvement evaporates. It is banked.

After enough of this, you end up with something quietly remarkable. A support engine that never gets tired, always runs, and never charges you to do it. The human did the thinking, keeps doing the thinking on the hard ten or twenty percent, and the machine handles the rest tirelessly. That is the whole bet, and support is where it has paid off most clearly.

Where I do let AI run, and why it stays cheap

I am not an absolutist about this. A few things genuinely need a model at runtime, and there the discipline is choosing the right model for the job rather than reaching for the most expensive one by reflex.

Two examples. Drafting a support reply needs judgment, so a capable model runs there, on demand, and only behind my review before anything goes out. Nothing is sent autonomously. Separately, my weekly competitor analysis needs to chew through a lot of content, so it runs on a deliberately cheaper model, because that is mass output where a smaller model is the correct, cost-aware call.

Knowing which tasks collapse into deterministic code and which actually need a brain each time is the entire skill. Most collapse. The few that do not get the cheapest model that can still do the job well.
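One way to keep that choice deliberate is to write it down as a small policy table instead of deciding per call. This is a sketch with hypothetical task names and tier labels, not real model identifiers. The review flag on competitor analysis is an assumption, based on it being internal output that does not reach a customer.

```python
# Tasks not listed here default to plain deterministic code with no model at all.
NO_MODEL = {"model_tier": "none", "needs_human_review": False}

TASK_POLICY = {
    # Judgment-heavy and customer-facing: capable model, always behind review.
    "support_reply_draft": {"model_tier": "capable", "needs_human_review": True},
    # High-volume and internal: cheaper model (review flag is an assumption).
    "competitor_analysis": {"model_tier": "cheap", "needs_human_review": False},
}


def plan_for(task: str) -> dict:
    """Look up how a task should run; unknown tasks get no model."""
    return TASK_POLICY.get(task, NO_MODEL)


print(plan_for("support_reply_draft"))
print(plan_for("dunning_retry"))
```

Making "no model" the default means a new task has to earn its way into the table.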

The human is still the brain

The system handles orchestration. It does not handle judgment. I am still very much in the operation, and that is by design, not a limitation I am working to remove.

The server can run the jobs, build the caches, and report the results. It cannot decide which customer situation deserves an exception, which trend in the competitor data actually matters, or what the business should do next. That is me. The tooling is the nervous system. I am the brain. Anyone selling you a business that runs itself with no human in it is selling you a business that will eventually do something stupid at full speed with no one watching.

If AI disappeared tomorrow

Here is the part I like most. If every AI tool I use vanished overnight, my operations would not stop. The tooling is already in place. It is ordinary code that I own, running on a server I control. The jobs would keep firing on schedule, the caches would keep rebuilding, the Slack rollups would keep landing.

Better still, it is portable. The same agents can be repointed at a new goal, a new project, a new end game, with the heavy thinking already baked in. I did not build a dependency on AI. I used AI to build infrastructure that outlives it.

What it actually changed

I want to be plain about the outcome, because this is the part that matters to anyone running a business. This strategy made me both more productive and more profitable, a combination that rarely arrives together. I do not need a stack of excess services anymore, and I do not need excess labor. The repeatable work that used to justify another subscription or another hire is just running, quietly, on a small server. Costs went down, output went up, and the gap keeps widening because the system improves every day. The investment also pays for itself twice over. The time it frees up goes straight into building the next solution, and what I have built is not a one-off: it is a framework I can drop into another business, pointed at a new set of problems. That is the real return.

How to start, if you are running a startup

You do not need a fleet or a budget to apply this. The shape is simple:

1. List your functional domains. Write down the parts of your business that repeat: support, billing, follow-ups, reporting. You automate domains, not vague intentions.
2. Automate the deterministic 90 percent. Most of the work in each domain has no judgment in it. Have AI write a plain script for that part, and have it run on a schedule.
3. Build a cache, not a live dependency. Log your knowledge and state where the cheap jobs can read it. Your own reviewed history is the most valuable, and cheapest, context you have.
4. Reserve AI for judgment, and pick the model on purpose. Strong model where it needs to be smart, cheaper model where it needs to be voluminous, and a human in front of anything that reaches a customer.
5. Push it off your machine and your network. Put it on a small always-on server in the cloud so it works when you do not. The cloud is a powerful place. There is a reason enterprises do not run on a server in a closet, and you do not want to learn that lesson the hard way.

The goal was never to hand the business to AI. The goal was to use AI to build tooling that quietly runs the repeatable parts, so the scarce resource, my attention, goes to the decisions only a human should make.