vineethac.blogspot.com: LLaVA

Friday, March 29, 2024

Generative AI and LLMs Blog Series

In this blog series we will explore the fascinating world of Generative AI and Large Language Models (LLMs). We delve into the latest advancements in AI technology, focusing particularly on LLMs, which have revolutionized various fields, including natural language processing and text generation.

Throughout this series, we will discuss LLM serving platforms such as Ollama and Hugging Face, providing insights into their capabilities, features, and applications. I will also guide you through the process of getting started with LLMs, from setting up your development/ test environment to deploying these powerful models on Kubernetes clusters. Additionally, we'll demonstrate how to effectively prompt and interact with LLMs using frameworks like LangChain, empowering you to harness the full potential of these cutting-edge technologies.

Stay tuned for insightful articles, and hands-on guides that will equip you with the knowledge and skills to unlock the transformative capabilities of LLMs. Let's explore the future of AI together!

Image credits: designer.microsoft.com/image-creator

Ollama

Part1 - Deploy Ollama on Kubernetes

Part2 - Prompt LLMs using Ollama, LangChain, and Python

Part3 - Web UI for Ollama to interact with LLMs

Part4 - Vision assistant using LLaVA

Hugging Face

Part1 - Getting started with Hugging Face

Part2 - Code generation with Code Llama Instruct

Part3 - Inference with Code Llama using LangChain

Part4 - Containerize your LLM app using Python, FastAPI, and Docker

Part5 - Deploy your LLM app on Kubernetes

Part6 - LLM app observability <coming soon>

Thursday, February 1, 2024

Ollama - Part4 - Vision assistant using LLaVA

In this exercise we will interact with LLaVA which is an end-to-end trained large multimodal model and vision assistant. We will use the Ollama REST API to prompt the model using Python.

Full project in my GitHub

https://github.com/vineethac/Ollama/tree/main/ollama_vision_assistant

LLaVA, being a large multimodal model and vision assistant, can be utilized for various tasks. Here are a couple of use cases:

Image Description Generation

Input: Provide LLaVA with an image.

Use Case: LLaVA can generate descriptive text or captions for the content of the image. This is particularly useful for automating image cataloging or enhancing accessibility for visually impaired users.

Question-Answering on Text and Image

Input: Ask LLaVA a question related to a given text or show it an image.

Use Case: LLaVA can comprehend the context and provide relevant answers. For instance, you could ask about details in a picture or seek information from a paragraph, and LLaVA will attempt to answer accordingly.

These are just a few examples, and the versatility of LLaVA allows for exploration across a wide range of multimodal tasks and applications.

Sample interaction with LLaVA model

Image

Image credits: shutterstock

Prompt

python3 query_image.py --path=images/img1.jpg --prompt="describe the picture 
and what are the essentials that one need to carry generally while going these 
kind of places?"

Response

{
    "model": "llava",
    "created_at": "2024-01-23T17:41:27.771729767Z",
    "response": " The image shows a man riding his bicycle on a country road, surrounded by 
    beautiful scenery and mountains. He appears to be enjoying the ride as he navigates 
    through the countryside. \n\nWhile cycling in such environments, an essential item one 
    would need to carry is a water bottle or hydration pack, to ensure they stay well-hydrated 
    during the journey. In addition, it's important to have a map or GPS device to navigate 
    through potentially less familiar routes and avoid getting lost. Other useful items for 
    cyclists may include a multi-tool, first aid kit, bike lock, snacks, spare clothes, 
    and a small portable camping stove if planning an overnight stay in the wilderness.",

Hope it was useful. Cheers!