GPUs are insanely expensive these days. With token costs rising as well, I have even switched to running a local LLM through Claude Code to keep costs down.

But there are times when my local setup just does not have enough power, and I need something that can fill that gap without me having to rent a raw VM and set everything up from scratch every time.

Running models locally is great, until it isn't

Memory, memory, and then some more memory

Lightning AI is a cloud workspace that lives in your browser. You open it, and you get a persistent development environment with storage, an IDE, and access to GPU compute when you need it. The keyword there is persistent. Unlike a raw GPU VM that you spin up, use, and lose your work on, a Studio keeps your files and environment intact between sessions. You set things up once and pick up exactly where you left off.

The GPU options are also pretty versatile under the free tier. A T4 GPU at 0.19 credits per hour is the entry point. It has 16GB of VRAM, which is enough to run a quantized 7B or 13B model comfortably for inference. If you have been running those sizes locally, this is roughly the same experience, except it is not touching your machine at all. Step up to an L4 GPU at 0.48 credits per hour, and you get 24GB of VRAM, which puts you in 30B model territory once the weights are quantized to around 4 bits.
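To get a feel for which model sizes fit on which card, here is a rough back-of-the-envelope sketch. It only counts the memory for the weights and then pads it with a flat 20% allowance for the KV cache and runtime overhead. That 20% figure and the function names are assumptions for illustration, not numbers from Lightning AI, and real usage depends on context length and the inference runtime.

```python
def estimate_weight_vram_gb(params_billion: float, bits_per_weight: int) -> float:
    """Memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    bytes_per_weight = bits_per_weight / 8
    # billions of parameters * bytes per parameter = gigabytes
    return params_billion * bytes_per_weight


def fits(params_billion: float, bits_per_weight: int, vram_gb: float,
         overhead_fraction: float = 0.2) -> bool:
    """True if weights plus an assumed overhead allowance fit in vram_gb."""
    needed = estimate_weight_vram_gb(params_billion, bits_per_weight)
    return needed * (1 + overhead_fraction) <= vram_gb


# VRAM sizes quoted in the article for the T4 and L4.
for name, vram in [("T4", 16), ("L4", 24)]:
    for size in (7, 13, 30):
        for bits in (16, 8, 4):
            verdict = "fits" if fits(size, bits, vram) else "too big"
            print(f"{name} ({vram}GB): {size}B at {bits}-bit -> {verdict}")
```

Under these assumptions, a 13B model at 16-bit precision needs about 26GB for weights alone and will not fit on a T4, while the same model at 4-bit needs about 6.5GB and fits easily.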

At the top end of the free tier, you can get an RTX PRO 6000 with a staggering 96GB of VRAM as well. To put that in perspective, that is enough to load models that would require a Mac Studio just to fit in memory. Most people will never need that, but it is there when a project calls for it.

The practical upside is that you are not paying for compute you are not using. You develop on a free CPU Studio, and when you actually need the GPU, you switch to it and use the 15 free monthly credits (or pay for more, if needed).
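As a quick sanity check on how far those free credits go, here is the arithmetic as a small sketch. It assumes credits are consumed linearly at the hourly rates quoted above and that nothing else (storage, for example) draws from the same balance, which may not match the actual billing rules.

```python
FREE_MONTHLY_CREDITS = 15.0

# Hourly rates in credits, as quoted in the article.
RATES_PER_HOUR = {"T4": 0.19, "L4": 0.48}


def gpu_hours(credits: float, rate_per_hour: float) -> float:
    """Hours of GPU time a credit balance buys at a flat hourly rate."""
    return credits / rate_per_hour


for gpu, rate in RATES_PER_HOUR.items():
    hours = gpu_hours(FREE_MONTHLY_CREDITS, rate)
    print(f"{gpu}: about {hours:.1f} hours per month")
```

That works out to roughly 79 hours on a T4 or about 31 hours on an L4 each month, if the free credits go entirely to GPU time.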

You can do some cool stuff with it

Test before deployment

hermes-agent running on a macbook
Raghav Sethi/MakeUseOf

Lightning AI also has a template library, which makes it much easier to use than a plain GPU rental service where you host your own models. There are pre-built environments for things you'd otherwise spend an afternoon setting up yourself.

There is an OpenClaw template where you just need to clone the environment into your studio to get OpenClaw running in the cloud with no Mac Mini or dedicated machine. The whole thing runs in the cloud for you, and it also makes for a fantastic testing ground for your setups. Since it's isolated from your networks as well, you can use it for sanity checks before deploying on your own environments.

I actually prefer Hermes agent over OpenClaw. If you don't know what it is, Hermes is basically an AI agent that can learn how you like things done over time and get faster at recurring tasks. And guess what? There is a template for that, too.

Beyond those two, the template library covers a lot of other ground. There are ready-to-go setups for turning speech into text, hosting your own image generation tool, building a chatbot that can read and answer questions from your own documents, and deploying any model as a private API endpoint you can call from other apps. Most of these would take a few hours to set up from scratch. The templates get you there in minutes.
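To make the "private API endpoint you can call from other apps" idea concrete, here is a minimal sketch of what the calling side might look like using only the Python standard library. Everything specific in it is hypothetical: the URL, the bearer-token header, and the `{"prompt": ...}` request body are placeholders, and the real endpoint defines its own address, authentication, and schema.

```python
import json
import urllib.request


def build_generate_request(endpoint_url: str, api_key: str,
                           prompt: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request to a model endpoint.

    The payload shape and bearer-token auth are assumptions for illustration.
    """
    payload = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; replace them with your deployment's real details.
request = build_generate_request(
    "https://example.invalid/predict", "YOUR_API_KEY", "Summarize this document."
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request, timeout=30)
```

Building the request separately from sending it keeps the sketch runnable without a live endpoint, and it lets you inspect exactly what would go over the wire.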

For inference, you get a lot of open-weight models, like the Gemma 4 or MiniMax 2.5 families. You can even use your credits towards closed SOTA models like Opus 4.8, or even some legacy models like GPT-4o.

There is a learning curve, and it's a steep one

You'll need to know how to code

An agent.md file open in Lightning AI
Screenshot by Raghav - NAR

None of this is click-and-go in the way that using Claude or ChatGPT is. The templates get you started faster than setting things up from scratch, but you still need to know what you are looking at once the environment loads. If you have never written a line of code before, Lightning AI is going to be difficult, even if you are using AI to help you through it.

This is not a criticism of the platform. It is just what it is. Lightning AI is built for people who are already doing AI development and want a better environment to do it in, not for people who want to run a chatbot without touching any code.

If you are in the first group, though, it is a pretty strong option. The free tier is generous enough to properly evaluate it. The GPU access is straightforward once you are set up. And having a persistent cloud environment that connects to your local IDE via SSH means you are not locked into working in a browser if you do not want to be. You get the compute without giving up the workflow you already have.

There are other options as well

One more thing worth mentioning. If you are mainly looking to run inference on open-weight models rather than host your own setup, NVIDIA Build is worth checking out before you commit to anything.

It has a solid free tier, gives you access to the latest open-weight models without any setup, and is a much lighter option if hosting your own endpoint is not actually what you need.

Lightning AI is a cloud platform that gives you a persistent GPU-powered workspace in your browser for building, training, and deploying AI models, without any infrastructure setup.