Exploring image and multimedia generation


What to learn

Most Large Language Models do not generate images or multimedia directly. They call another model to do it for them.

All these other models respond to prompts, but they cannot be controlled as precisely as text models. In general, they have many limitations.
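
For readers who like to see an idea in code, the hand-off from a text model to an image model can be sketched roughly as below. This is only an illustration under stated assumptions: `chat_assistant`, `image_model`, and `ImageResult` are hypothetical names invented for this sketch, and the keyword check stands in for a decision that a real assistant would make with the language model itself.

```python
from dataclasses import dataclass


@dataclass
class ImageResult:
    """Hypothetical container for what an image model would return."""
    prompt: str
    model_name: str


def image_model(prompt: str) -> ImageResult:
    """Hypothetical stand-in for a separate image generation model."""
    # A real image model would return image data; this stub only records the prompt.
    return ImageResult(prompt=prompt, model_name="hypothetical-image-model")


def chat_assistant(user_message: str):
    """Hypothetical text model that passes image requests to another model."""
    # Assumption: a simple keyword check decides whether an image is wanted.
    keywords = ("draw", "image", "picture")
    wants_image = any(word in user_message.lower() for word in keywords)
    if wants_image:
        # The text model only forwards a prompt; the image model does the generating.
        return image_model(prompt=user_message)
    return f"(text reply to: {user_message})"
```

Calling `chat_assistant("Draw a cat")` returns an `ImageResult` produced by the stand-in image model, while `chat_assistant("Hello")` returns a plain text reply. The point is only that the image comes from a different model than the one you are chatting with.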

In this activity, you should start exploring the power and limitations of these models.

The best place to start is IMGSys, the image generation counterpart of LMSys Arena, which compares open-source image generation models.

imgsys.org | an image model arena by fal.ai

<aside> <img src="/icons/warning_red.svg" alt="/icons/warning_red.svg" width="40px" /> Note: IMGSys does not include the best commercial image generation models such as Midjourney or DALL-E. To explore those, you need to use ChatGPT or Midjourney directly.

</aside>

What to do

<aside> <img src="/icons/cursor-click_gray.svg" alt="/icons/cursor-click_gray.svg" width="40px" /> Click on the triangle next to each step to see more details and/or get resources necessary for the task.

Read through all steps first.

</aside>

Step 1: Go to imgsys.org


Step 2: Try some of the suggested prompts to generate images


Step 3: Try some more challenging prompts

<aside> <img src="/icons/paste_gray.svg" alt="/icons/paste_gray.svg" width="40px" /> Prompts for image generation differ slightly from prompts for text generation. You can use the same natural language, but you cannot always control the exact positioning of elements in the image.

</aside>

A man is holding a traditional handset to his left ear while pointing outside the window with his right hand.

A teacher writing on a green board the text "Class dismissed".

A drawn layout plan of a flat with two bedrooms and a kitchen.

Two people running towards each other with smiles on their faces and with open hands and arms outstretched.