Renting your very own GPU from DigitalOcean

Peter Krupa October 23, 2025

If you want to play with some bigger large language models (LLMs) but you’re not ready to spend thousands of dollars on your own rig, one option is to rent a GPU from a place like DigitalOcean. In this article, I’ll walk you through doing that. For this project, we’ll be using:

- A DigitalOcean GPU droplet
- Ollama, to serve the models
- Tailscale, to make the droplet easy to reach from your own machine

We’re also going to use:

- doctl, the DigitalOcean CLI, to spin the droplet up and down from the terminal
- Open WebUI and the Kilo Code VS Code extension as front ends

Step 1: Get a GPU Droplet

There are a number of places you can rent a virtual machine with a GPU, but DigitalOcean is the cloud provider I found to be the easiest and most cost-effective for small-scale use cases. DigitalOcean’s VMs are called “droplets.” First, you have to request access to GPU droplets, as I guess they don’t rent them to just anyone.

Then, select the OS. I’m using the one that comes pre-loaded with all the drivers needed to run a GPU:

Select the OS

Pick your machine. This will depend on which model you’ll be running. Since it’s just for a single user, the only criterion is a GPU with enough VRAM to fit the entire model, plus a little headroom for the context (KV cache). We want to run llama3.3:70b, which is 43 GB, so the RTX 6000 Ada with 48 GB of VRAM is enough. As of this writing, this droplet rents for US$1.57/hr.

Select the GPU

The last thing you’ll want to do is add an SSH key to make it easy to access your droplet from a terminal, then pick a name. I’m going with gpu-inference-droplet because I want it to be easy to remember. Then hit CREATE!!

IMPORTANT: DigitalOcean charges by the hour, and you will be charged for as long as your droplet exists. To stop the meter running, you have to destroy the droplet, not just stop it.

Step 2: Set up your droplet

Once DigitalOcean has finished creating your droplet, you can copy the droplet’s IP address from the dashboard and use that to SSH into your VM and start setting it up.

$ ssh root@<droplet IP address>

The first thing I like to do is install Tailscale. Assuming you already have Tailscale running on your own machine, this will put the VM on the same virtual private network (your tailnet) and make it much easier to connect to it by name instead of by IP address. From your shell session, run:

$ curl -fsSL https://tailscale.com/install.sh | sh

… and then follow the authentication instructions to add the VM to your tailnet (if you need more details on how Tailscale works and how to set up a tailnet, go here). Now if I want to SSH into the VM, I just do ssh root@gpu-inference-droplet and I don’t have to worry about looking up the IP address every time.
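
If the install script doesn’t kick off the login flow on its own, you can start it manually; Tailscale prints a URL to open in your browser to approve the device. (This assumes MagicDNS is enabled on your tailnet, which is what lets you use the hostname gpu-inference-droplet later instead of an IP address.)

$ tailscale up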

Next, you want to install Ollama as the server for our LLMs. Again, pretty straightforward; just run:

$ curl -fsSL https://ollama.com/install.sh | sh

Pick some models to install from Ollama’s library. For now, I’m going to go with llama3.3:70b (43 GB), so:

$ ollama pull llama3.3:70b
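
Once the pull finishes, a couple of quick sanity checks can’t hurt: ollama list should show the model you just downloaded, and nvidia-smi should confirm the driver sees the GPU (both are stock commands on this image, nothing specific to this setup).

$ ollama list
$ nvidia-smi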

Now everything should be all up and running! Exit your SSH session back to your workstation and run the following to test:

$ curl -v http://gpu-inference-droplet:11434/api/tags

That should return a list of available models that looks something like this, which means your Ollama server is up and listening and you can reach the endpoint:

Test your endpoint
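
One caveat, depending on your Ollama version and configuration: if that curl from your workstation hangs or is refused, the server may only be listening on localhost. A common fix, assuming that’s the cause, is to set OLLAMA_HOST=0.0.0.0 for the systemd service back on the droplet and restart it:

$ systemctl edit ollama.service
# in the override that opens, add under [Service]: Environment="OLLAMA_HOST=0.0.0.0"
$ systemctl restart ollama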

Step 3: Make it easy to launch

As long as your GPU droplet exists, DigitalOcean is going to charge you for it. So you’ll want to destroy it when you’re not using it, or it can end up costing you hundreds or thousands of dollars a month. But destroying it means you’d have to redo all this setup!

Fortunately, there’s a solution.

All you have to do is take a snapshot of your GPU droplet. (This can take a while to save, as it’s a pretty large volume.) Then the next time you want to spin up your GPU for inference, just select that snapshot when you’re picking a machine, and all your settings and downloaded models will be ready to go!

Spin up from a snapshot

If you want to make it even faster, you can install the DigitalOcean CLI doctl, and then you can spin up your custom GPU droplet from a snapshot programmatically in seconds with a single line of bash. Follow these instructions to install doctl and authenticate.
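
Authentication is a one-time step: doctl auth init will prompt you for an API token, which you generate in the API section of the DigitalOcean control panel.

$ doctl auth init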

Then you can run a command that looks like this:

$ doctl compute droplet create gpu-inference-droplet \
--image 000000000 \
--size gpu-6000adax1-48gb \
--region tor1 \
--vpc-uuid <your UUID here> \
--tag-names '' \
--ssh-keys '0000000'

This command has a few values you’ll need to collect before you can run it. First, you can name the droplet whatever you want; I’m naming mine gpu-inference-droplet to keep it consistent. Next, the --image flag should be the ID number of the snapshot you saved of your droplet. To find it, run:

$ doctl compute snapshot list

… then copy the ID number in the left-hand column. The --size flag is whatever GPU droplet size you’re using. To see a list of options, run:

$ doctl compute size list | grep "gpu"

For --region, I use tor1 because GPU droplets aren’t available in all regions, and as of this writing, the Toronto datacenter seems to have the most options. Next, --vpc-uuid is the UUID of the VPC (virtual private cloud) network the droplet should join, not your account ID; you can list yours by running $ doctl vpcs list, or drop the flag entirely to use the region’s default VPC.

You can use the --tag-names flag to tag your droplet with whatever labels you want, or leave it empty. Lastly, to add an SSH key to your droplet, use the --ssh-keys flag with the ID number of an SSH key you’ve added to your DigitalOcean account. To see a list of your SSH keys, run:

$ doctl compute ssh-key list
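
If you haven’t added a key to your account yet, you can import an existing public key from your workstation; the key name and file path below are just examples, so swap in your own.

$ doctl compute ssh-key import my-key --public-key-file ~/.ssh/id_ed25519.pub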

After you spin up your droplet, check its health to make sure everything is running:

$ doctl compute droplet list
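
Once the droplet shows as active, you can also confirm that the GPU and the Ollama service came back with the snapshot. This assumes the droplet rejoins your tailnet under the same hostname; if it doesn’t, use the IP address from the droplet list instead.

$ ssh root@gpu-inference-droplet 'nvidia-smi && systemctl is-active ollama'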

Then when you’re done using your LLM, be sure to destroy the droplet with:

$ doctl compute droplet delete gpu-inference-droplet
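
Two optional extras: the delete command normally asks for confirmation, and --force skips the prompt, which is handy if you script the teardown. And if you pulled new models during the session, you can take a fresh snapshot first so they’re baked into the image for next time (get the droplet ID from doctl compute droplet list; the snapshot name is whatever you like).

$ doctl compute droplet-action snapshot <droplet ID> --snapshot-name gpu-inference-base
$ doctl compute droplet delete gpu-inference-droplet --force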

Save these commands to a text file somewhere; then, anytime you want to spin up your own private, powerful LLM, copy and paste them into the terminal and you’re up and running in seconds!
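
Here’s a minimal sketch of what that file could look like as a script. The snapshot and SSH key IDs are placeholders you’d fill in from the doctl commands above, and the --wait flag just blocks until the droplet is ready.

#!/usr/bin/env bash
# spin-up.sh -- example helper; fill in your own IDs
set -euo pipefail

SNAPSHOT_ID="000000000"   # from: doctl compute snapshot list
SSH_KEY_ID="0000000"      # from: doctl compute ssh-key list

# Create the droplet from the snapshot and wait until it's active
doctl compute droplet create gpu-inference-droplet \
  --image "$SNAPSHOT_ID" \
  --size gpu-6000adax1-48gb \
  --region tor1 \
  --ssh-keys "$SSH_KEY_ID" \
  --wait

# Show the new droplet so you can confirm it's up
doctl compute droplet list --format Name,PublicIPv4,Status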

Step 4: y tho??

Setting up an Ollama server is neat, but to make it useful, you have to have some kind of front end that handles the API requests and responses. There are lots of options out there, but here I’ll discuss two: Open WebUI and Kilo Code.
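
(If you’re curious what those front ends are wrapping, you can poke the Ollama API directly from your workstation. Something like the request below should return a single JSON completion, with stream set to false so the whole answer comes back at once; the hostname and model are the ones from earlier.)

$ curl http://gpu-inference-droplet:11434/api/generate \
    -d '{"model": "llama3.3:70b", "prompt": "Why is the sky blue?", "stream": false}'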

Open WebUI

The first is Open WebUI, which provides a basic ChatGPT-style interface. Installing it is beyond the scope of this piece, but you can go here for instructions. Once you have it running, open the settings and click on Connections. This is where you can add and remove endpoints for your LLM server.

Showing your connections

Click on the little plus symbol and add your Ollama server’s URL, which in this setup is http://gpu-inference-droplet:11434 (11434 is the default port used by Ollama):

Add a connection

Now your connection is added, and Open WebUI has all your models!

Connection added

Select one from the dropdown menu to get started.

Open WebUI model selection dropdown menu

Kilo Code

I don’t think anyone would argue that a model small enough to run on a single GPU is as good as Anthropic’s Claude models when it comes to coding assistance. But the Chinese open-weight models keep getting better and faster, so who knows!

The Qwen family of models from Alibaba is one example, so let’s try it out. SSH into your GPU droplet and pull qwen3-coder:30b, a mid-sized, long-context model that comes in at a svelte 19 GB:

$ ollama pull qwen3-coder:30b

Next, you’ll have to increase the default context window. Coding assistants need a pretty big context window to hold your files and the conversation, and the default Ollama assigns (a few thousand tokens) is much too small. Run the following to create a version of qwen3-coder:30b with a 32k-token default context window:

$ ollama create qwen3-coder:30b-32k -f - <<'EOF'
FROM qwen3-coder:30b
PARAMETER num_ctx 32768
EOF

Then you can run ollama rm qwen3-coder:30b to remove the original tag if you like. Since the new model shares its underlying weights with the original, this tidies up your model list more than it frees disk space.
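
To confirm the override took, ollama show prints the model’s details; num_ctx should appear in the parameters section set to 32768 (assuming a reasonably recent Ollama version).

$ ollama show qwen3-coder:30b-32k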

I use the Kilo Code extension in VS Code. Once you have it installed, open it from the sidebar, go to the settings, and hit the little “+” next to Configuration Profile, name your new configuration profile (“droplet”), and add the details for your Ollama endpoint:

Adding a configuration profile to Kilo Code

Once the models load, select whichever one you’re using and hit “Save”.

And that’s basically it!

Back in the coding assistant sidebar, select from the little menu what kind of help you’d like (Architect, Code, Ask, Debug, Orchestrator) and prompt away. And as always, when you’re done with your coding session, remember to open your terminal and run $ doctl compute droplet delete gpu-inference-droplet to destroy your droplet.

Final thoughts

With open-weight models getting better, inference getting faster and more efficient, and new AI-related breakthroughs happening all the time, my expectation is that local inference will become more viable, especially as LLM technology matures and people have a better idea of what they actually want from it.

Who knows, maybe someday we’ll all have our own AI boxes at home! For now, though, renting a GPU offers an accessible option for people who only need one once in a while, or who want to try out local inference before they invest in expensive equipment.

If you have any questions or need to get in touch, you can always contact me on Mastodon or LinkedIn. Hope this helps someone!