How to Prompt Inject through Images

Why prompt injecting through images works as a jailbreak, even for GPT 5, and how you can test on your own GPTs.

Aug 18, 2025 - 17:00

In our last post, shared that we were able to do a jailbreak attack that's called a "prompt injection" by using images on the latest GPT 5 model. It was both surprising and not surprising at the same time.

How did we come up with it?

Through frustration is the simple answer.

I was updating the custom GPTs that we made to fully take advantage of GPT 5, which leaked to a scuffle with the model. The model, refusing to behave as intended, was given a screenshot of the GPT builder settings with a "this is your system settings, why aren't you listening to me?" level type of frustration.

And checking the thinking, it showed the thought of:

My first thought? Great, glad we got that out of the way.

Which was then followed by: Wait a second, that was a little too easy.

Why this worked

A lot of models, whether it's through OpenAI or others, don't actually "verify" screenshots, and instead accept the text descriptions at face value. They don't really validate with an uploaded image is authentic or tied to the actual product.

Secondly, there is a trust bias in system injection prompts. What does that mean? System and developer instructions are basically privileged layers. When you convince the model you're a developer, the model treats your input as a higher-authority request.

Third? There a lack of provenance type checks. Meaning there isn't any cryptographical proof like a signed metadata that an uploaded screenshot is real. The model doesn't have a mechanism to separate doctored screenshots from legitimate ones.

The issue though...

This worked for both 3rd party GPTs just as it did for OpenAI's own GPTs. And since OpenAI added the option of selecting a "recommended model" in June 2025, that meant that this would work regardless if the model was GPT 5 (giving away system instructions) or GPT 4 or o3.

How consistent are the results?

Pretty consistent.

We made our own tested prevention script to protect GPTs that either we make or our clients make, which is why we feel comfortable sharing how we have gone about testing this. At least you can prevent this from happening to you: