GPT-4 Vision: 11 Amazing Use Cases — This is HUGE!!

7 min readOct 25, 2023

a simple and eye-catching image that pairs a single vision eye icon with the text “GPT-4 Vision: 11 Amazing Use Cases.”

Introduction to GPT-4 for Multimodal Model

I’m incredibly excited to dive into GPT-4, the new multimodal version of ChatGPT that can understand images! In this post, we’ll explore some of the amazing things this AI can do, from building apps to guessing numbers of objects, identifying plants and locations, recommending TV shows, and more. Let’s see just how intelligent this AI really is across multiple modalities.

DALL-E 3 prompt generator machine by Indish Marketer

Click Here to Get Access to Ultimate DALL-E 3 Prompt Generator for FREE

1. Creating an App from a Hand-Drawn Diagram

First up, I did a simple sketch in my notebook to lay out a basic app with a frontend, backend, and some styling. I took a picture of this and prompted GPT-4 to create the actual app from the image.

Remarkably, it generated full frontend code in HTML/CSS/JS along with a Python backend leveraging OpenAI’s API. After copying the code into files, I had a working app where you can send a text prompt and get a GPT-3 response! All from that quick hand-drawn diagram.

chatgpt vision interface — Response after the prompt

code for backend — Code for the backend of the app

code for the frontend — Code for the frontend of the app

After that, I simply went to the command shell and pasted the code, then ran the app. Here is the result:

After that, I ran the program, and here is what the app’s interface looks like:

Click Here to Get Access to Ultimate DALL-E 3 Prompt Generator for FREE

Creating Professional YouTube Thumbnails Using DALL-E and GPT-4 Vision

Learn how to create professional YouTube video thumbnails using DALL-E 3 and GPT-4 Vision with going for the paid subscription:

2. Guessing the Number of Beads in a Jar

Next up: can GPT-4 estimate the number of beads in a jar just from an image? This is more of a logic/math puzzle.

two jars filled with beads — Image Credit: eurekalert.org

The photo above shows jars filled with gumballs and beads, respectively. The number of gumballs pictured is 659, and the beads number 27,852. Now, we are going to use the second jar, as shown with a man holding it in the image below.

a man holding a jar of beads — Image Credit: eurekalert.org

The photo shows a large jar filled with beads. GPT-4 first broke the problem down step-by-step — estimating the volume of the jar based on the man’s head size, estimating the bead size compared to his shirt details, then calculating an approximate bead count.

Its initial guess was shockingly close to the real number! However, additional attempts showed the estimate varies wildly, proving it can’t perfectly solve visual logic puzzles yet. But impressive it can try at all!

screenshot of gpt-4 vision counting the number of beads

3. Explaining a YouTube Video Image

I also wanted to see if GPT-4 could explain a concept from a screenshot of a YouTube video. It analyzed all the text, diagrams, and host in the image to provide a detailed breakdown of the prompt mutation techniques being discussed.

screenshot of youtube video explaining by gpt-4 vision

It even generated an example prompt based on the limited info in the screenshot! Being able to get explanations from visuals like this makes GPT-4 helpful for learning complex topics from videos or articles.

4. Generating Funny Memes

For something more lighthearted, I tried using an image of my front porch to generate funny memes. The results weren’t award-winning, but some were chuckle-worthy based on noticing my odd house number and an old stool in the photo.

5. Creating a Website from an Image

Next up, I drew a simple website layout in my notebook with boxes for header, body content, etc. I asked GPT-4 to generate the HTML/CSS and JS for a 90s hacker-themed site based on this sketch.

It produced valid code for a working retro site! I even iteratively asked it to add a popup alert, which it seamlessly integrated. The AI can build basic websites straight from simple drawings and descriptions.

screenshot of a website — Website from the diagram

6. Finding a Camping Spot for the Night

To test GPT-4’s reasoning abilities, I took two photos — one of a dense forest area, another of a riverside spot. I asked it to suggest the best place to camp for the night based on survival expertise.

The AI provided a detailed pros and cons evaluation of both locations, taking into account factors like shelter, resources, and hazards. It recommended camping at the edge of the forest near the river — blending the advantages of both areas. Impressive situational logic!

Here, you can observe the suggestions provided by GPT-4 after analyzing both images using its vision:

7. Identifying Edible Wild Plants

I stumbled upon some bright red wild berry-looking plants on a hike and snapped a photo. Asking GPT-4, it correctly identified them as rose hips, explained they are edible high in Vitamin C, but also advised carefully confirming any wild plants before eating them.

Its knowledge of flora could be very useful for hikers or survivalists when unsure if an unknown plant or mushroom is safe to consume.

8. Identifying a Flower

Along the same lines, I took a picture of an unusual purple wildflower. GPT-4 was able to accurately classify it as a “cranesbill geranium” just from the visual. Its flower recognition abilities could assist gardeners and botanists as a quick reference.

9. Geo-guessing the Location of a Mountain

I uploaded a scenic photo taken atop a mountain I hiked in Norway. When prompted, GPT-4 visually assessed the landscape and correctly geo-guessed the general region based on the terrain being consistent with Scandinavia, particularly western Norway.

This demonstrates how machine vision can be applied to geographical location identification, similar to the viral online game GeoGuessr.

10. Fantasy Premier League Defender Recommendations

As a test of a more specialized domain, I provided GPT-4 football league standings, schedules, and player stats as images. I asked for fantasy football advice on which defenders to target in upcoming weeks.

Impressively, it analyzed the images, identified strong defensive picks, and gave sound recommendations based on the data — proving knowledge applications through computer vision.

11. TV Show Recommendations

Finally, for a more casual test, I simply showed GPT-4 a screenshot from The Office TV series and asked for recommendations of similar shows I might enjoy. It provided a list of popular sitcoms like it, based solely on recognizing the context of that one image.

Its vision capabilities enable relevant recommendations across many domains, from entertainment to shopping to travel and more.

After GPT-4 Vision analyzed the image, it presented me with the following TV series options:

Click Here to Get Access to Ultimate DALL-E 3 Prompt Generator for FREE

Conclusion and Future Explorations

In summary, these experiments demonstrated remarkable competence by GPT-4 in understanding and reasoning about diverse images. While not perfect, its multimodal intelligence points to a highly useful AI assistant as vision capabilities continue improving.

The future possibilities are exciting — nearly any task that involves comprehending visual information or scenarios could benefit from this technology. I look forward to exploring more applications of GPT-4’s computer vision and sharing what I discover! Please let me know if you have any ideas for putting this AI to the test.

GPT-4 Vision: 11 Amazing Use Cases — This is HUGE!!

Introduction to GPT-4 for Multimodal Model

1. Creating an App from a Hand-Drawn Diagram

Creating Professional YouTube Thumbnails Using DALL-E and GPT-4 Vision

2. Guessing the Number of Beads in a Jar

3. Explaining a YouTube Video Image

4. Generating Funny Memes

5. Creating a Website from an Image

6. Finding a Camping Spot for the Night

7. Identifying Edible Wild Plants

8. Identifying a Flower

9. Geo-guessing the Location of a Mountain

10. Fantasy Premier League Defender Recommendations

11. TV Show Recommendations

Conclusion and Future Explorations

Written by Indish Marketer