GPT-4 Vision: 11 Amazing Use Cases — This is HUGE!!

Indish Marketer
7 min readOct 25, 2023
a simple and eye-catching image that pairs a single vision eye icon with the text “GPT-4 Vision: 11 Amazing Use Cases.”

Introduction to GPT-4 for Multimodal Model

I’m incredibly excited to dive into GPT-4, the new multimodal version of ChatGPT that can understand images! In this post, we’ll explore some of the amazing things this AI can do, from building apps to guessing numbers of objects, identifying plants and locations, recommending TV shows, and more. Let’s see just how intelligent this AI really is across multiple modalities.

DALL-E 3 prompt generator machine by Indish Marketer

Click Here to Get Access to Ultimate DALL-E 3 Prompt Generator for FREE

1. Creating an App from a Hand-Drawn Diagram

First up, I did a simple sketch in my notebook to lay out a basic app with a frontend, backend, and some styling. I took a picture of this and prompted GPT-4 to create the actual app from the image.

Photo of my hand-drawn app diagram

Remarkably, it generated full frontend code in HTML/CSS/JS along with a Python backend leveraging OpenAI’s API. After copying the code into files, I had a working app where you can send a text prompt and get a GPT-3 response! All from that quick hand-drawn diagram.

chatgpt vision interface
Response after the prompt
code for backend
Code for the backend of the app
code for the frontend
Code for the frontend of the app

After that, I simply went to the command shell and pasted the code, then ran the app. Here is the result:

frontend code in the coding console

After that, I ran the program, and here is what the app’s interface looks like:

ASK GPT-4 interface
DALL-E 3 prompt generator machine by Indish Marketer

Click Here to Get Access to Ultimate DALL-E 3 Prompt Generator for FREE

Creating Professional YouTube Thumbnails Using DALL-E and GPT-4 Vision

Learn how to create professional YouTube video thumbnails using DALL-E 3 and GPT-4 Vision with going for the paid subscription:

2. Guessing the Number of Beads in a Jar

Next up: can GPT-4 estimate the number of beads in a jar just from an image? This is more of a logic/math puzzle.

two jars filled with beads
Image Credit: eurekalert.org

The photo above shows jars filled with gumballs and beads, respectively. The number of gumballs pictured is 659, and the beads number 27,852. Now, we are going to use the second jar, as shown with a man holding it in the image below.

a man holding a jar of beads
Image Credit: eurekalert.org

The photo shows a large jar filled with beads. GPT-4 first broke the problem down step-by-step — estimating the volume of the jar based on the man’s head size, estimating the bead size compared to his shirt details, then calculating an approximate bead count.

Its initial guess was shockingly close to the real number! However, additional attempts showed the estimate varies wildly, proving it can’t perfectly solve visual logic puzzles yet. But impressive it can try at all!

screenshot of gpt-4 vision counting the number of beads
gpt-4 is solving math problem

3. Explaining a YouTube Video Image

I also wanted to see if GPT-4 could explain a concept from a screenshot of a YouTube video. It analyzed all the text, diagrams, and host in the image to provide a detailed breakdown of the prompt mutation techniques being discussed.

screenshot of youtube video explaining by gpt-4 vision

It even generated an example prompt based on the limited info in the screenshot! Being able to get explanations from visuals like this makes GPT-4 helpful for learning complex topics from videos or articles.

4. Generating Funny Memes

For something more lighthearted, I tried using an image of my front porch to generate funny memes. The results weren’t award-winning, but some were chuckle-worthy based on noticing my odd house number and an old stool in the photo.

a door and below it some text

5. Creating a Website from an Image

Next up, I drew a simple website layout in my notebook with boxes for header, body content, etc. I asked GPT-4 to generate the HTML/CSS and JS for a 90s hacker-themed site based on this sketch.

a website flow on a notebook

It produced valid code for a working retro site! I even iteratively asked it to add a popup alert, which it seamlessly integrated. The AI can build basic websites straight from simple drawings and descriptions.

screenshot of a website
Website from the diagram

6. Finding a Camping Spot for the Night

To test GPT-4’s reasoning abilities, I took two photos — one of a dense forest area, another of a riverside spot. I asked it to suggest the best place to camp for the night based on survival expertise.

The AI provided a detailed pros and cons evaluation of both locations, taking into account factors like shelter, resources, and hazards. It recommended camping at the edge of the forest near the river — blending the advantages of both areas. Impressive situational logic!

picutes of two forests and river

Here, you can observe the suggestions provided by GPT-4 after analyzing both images using its vision:

7. Identifying Edible Wild Plants

I stumbled upon some bright red wild berry-looking plants on a hike and snapped a photo. Asking GPT-4, it correctly identified them as rose hips, explained they are edible high in Vitamin C, but also advised carefully confirming any wild plants before eating them.

Its knowledge of flora could be very useful for hikers or survivalists when unsure if an unknown plant or mushroom is safe to consume.

photo of rose hips

8. Identifying a Flower

Along the same lines, I took a picture of an unusual purple wildflower. GPT-4 was able to accurately classify it as a “cranesbill geranium” just from the visual. Its flower recognition abilities could assist gardeners and botanists as a quick reference.

Photo of cranesbill geranium

9. Geo-guessing the Location of a Mountain

I uploaded a scenic photo taken atop a mountain I hiked in Norway. When prompted, GPT-4 visually assessed the landscape and correctly geo-guessed the general region based on the terrain being consistent with Scandinavia, particularly western Norway.

This demonstrates how machine vision can be applied to geographical location identification, similar to the viral online game GeoGuessr.

Mountain view photo

10. Fantasy Premier League Defender Recommendations

As a test of a more specialized domain, I provided GPT-4 football league standings, schedules, and player stats as images. I asked for fantasy football advice on which defenders to target in upcoming weeks.

Impressively, it analyzed the images, identified strong defensive picks, and gave sound recommendations based on the data — proving knowledge applications through computer vision.

Table of player stats and texts

11. TV Show Recommendations

Finally, for a more casual test, I simply showed GPT-4 a screenshot from The Office TV series and asked for recommendations of similar shows I might enjoy. It provided a list of popular sitcoms like it, based solely on recognizing the context of that one image.

Its vision capabilities enable relevant recommendations across many domains, from entertainment to shopping to travel and more.

Screenshot from The Office

After GPT-4 Vision analyzed the image, it presented me with the following TV series options:

DALL-E 3 prompt generator machine by Indish Marketer

Click Here to Get Access to Ultimate DALL-E 3 Prompt Generator for FREE

Conclusion and Future Explorations

In summary, these experiments demonstrated remarkable competence by GPT-4 in understanding and reasoning about diverse images. While not perfect, its multimodal intelligence points to a highly useful AI assistant as vision capabilities continue improving.

The future possibilities are exciting — nearly any task that involves comprehending visual information or scenarios could benefit from this technology. I look forward to exploring more applications of GPT-4’s computer vision and sharing what I discover! Please let me know if you have any ideas for putting this AI to the test.

--

--

Indish Marketer

Founder @ IndishMarketer.Com | Organic Marketing Expert | 41K Traffic Per Month - Website | YouTube 6.5K | Business Growth Consultation: info@indishmarketer.com