My AI agent went quiet last week.
It runs on a Hostinger VPS that I rent for about the price of a kopi a day. Its job is to read logs, check on a few sites I run, and handle small bits of business plumbing while I sleep. Ordinary stuff. Useful stuff.
Then it stopped responding.
The chain of events will be familiar to anyone who has built on top of cloud LLMs. One model provider hit a daily cap. The next one’s auth token expired. A third demanded a refresh I could not do without first SSHing into the box. The agent did not crash. It just sat there, polite and useless, waiting for a brain that never showed up.
The Hidden Bill
If you have ever bought office furniture on a corporate plan, you understand the difference between capex and opex. Capex is the desk. Opex is the air-con bill, every month, forever.
Most AI tooling today runs on opex. You rent the brain. The vendor sets the meter. When they raise prices or change their rate limits, you find out the hard way. When their service hiccups, your agent goes quiet.
For a hobby project that is fine. For something doing real work, it is a quiet liability that nobody puts on a slide.
Putting The Brain In The Box
So I tried something different. I deployed Ollama on the same VPS that runs the agent and pulled down a small open model. Gemma in this case, but the choice barely matters. The point is the model now lives next to the agent, on hardware I am already paying for.
The VPS is not a powerful machine. Two virtual cores, eight gigs of RAM, a hundred gigs of disk. You will not be training anything on it. But for the routine work the agent actually does (classifying messages, summarising logs, drafting short replies), a small local model is enough.
No more rate limits at 11pm. No more auth tokens to refresh. No more meter ticking in the background.
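To make that concrete, here is a minimal sketch of one of those routine jobs, classifying a message, done against the local model. It assumes Ollama is listening on its default port (11434) and that the model was pulled under the tag `gemma:2b`. The label set and the function names are made up for illustration, so swap in whatever your own agent needs.

```python
import json
import urllib.request

# Assumptions: Ollama's default local port and a small Gemma tag.
# Change MODEL to whatever tag you actually pulled.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "gemma:2b"
LABELS = ("urgent", "routine", "spam")  # hypothetical label set


def build_payload(message: str) -> dict:
    """Build a non-streaming generate request that asks for a one-word label."""
    prompt = (
        "Classify the message as one of: " + ", ".join(LABELS) + ".\n"
        "Answer with the label only.\n\nMessage: " + message
    )
    return {"model": MODEL, "prompt": prompt, "stream": False}


def parse_label(raw: str) -> str:
    """Map free-form model output onto a known label, defaulting to 'routine'."""
    text = raw.strip().lower()
    for label in LABELS:
        if label in text:
            return label
    return "routine"


def classify(message: str, timeout: float = 60.0) -> str:
    """Send the message to the local model and return one of LABELS."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(message)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    # With "stream": False, Ollama returns the full text in the "response" field.
    return parse_label(body.get("response", ""))
```

Small models do not always follow "answer with the label only", which is why `parse_label` searches the reply for a known label instead of trusting it verbatim. The generous timeout is deliberate: on two virtual cores, a reply can take a while.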
What This Means If You Run A Business
Many SMEs in Singapore are now putting AI somewhere into their workflow. Often that means signing up for a service, pasting in an API key, and quietly increasing their dependency on a vendor most of their staff have never heard of.
That is fine for now. Vendors are still subsidising usage to win market share. But the same script has played out before, with cloud storage, with email, with online ads. You start free. You end up locked in.
Running a small model on your own server will not replace the big ones. It is not supposed to. But it changes the question. Instead of "which vendor do we trust?", it becomes "which work belongs on our own machine, and which work needs to leave it?"
That is a more grown-up question. It is the kind of question you ask once your AI work starts to matter.
The Practical Bit
If you are curious to try this, the rough recipe is short. Get a VPS; eight gigs of RAM is enough to start. Install Docker. Pull the Ollama image and start it. Pull a small model that fits in memory. Point your agent or app at the local endpoint instead of the cloud one. Many frameworks let you swap providers with a line or two of config.
It will not feel as snappy as the frontier models. Replies are slower. Reasoning is shallower. But for a lot of jobs, that is a fair trade for not having a stranger’s outage become your problem.
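If you want to keep a cloud model for the harder jobs, one possible arrangement is to try it first and let the local model be the floor. The sketch below is an illustration of that idea, not a description of how my agent is wired. The backends are plain functions, and the two stand-ins at the bottom are hypothetical, there only to show the behaviour.

```python
from typing import Callable, Sequence, Tuple

# A backend is a (name, function) pair; the function takes a prompt
# and returns a reply, or raises if the provider is unavailable.
Backend = Tuple[str, Callable[[str], str]]


def ask_with_fallback(prompt: str, backends: Sequence[Backend]) -> Tuple[str, str]:
    """Try each backend in order; return (name, reply) from the first that works."""
    errors = []
    for name, call in backends:
        try:
            return name, call(prompt)
        except Exception as exc:  # daily cap, expired token, network outage
            errors.append(f"{name}: {exc}")
    # Fail loudly instead of sitting there waiting for a brain.
    raise RuntimeError("all backends failed: " + "; ".join(errors))


# Hypothetical stand-ins for real provider calls.
def capped_cloud(prompt: str) -> str:
    raise RuntimeError("daily cap reached")


def local_model(prompt: str) -> str:
    return "summary: " + prompt[:20]
```

Calling `ask_with_fallback("disk at 91%", [("cloud", capped_cloud), ("local", local_model)])` skips the capped provider and answers from the local one. If every backend fails, the caller gets an exception it can log or alert on, which is still better than silence.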
My agent is back online. The brain is in the box now. And the box is mine.