Friday, January 26, 2024

Ollama - Part 3 - Web UI for Ollama to interact with LLMs

In the previous blog posts, we covered deploying Ollama on a Kubernetes cluster and demonstrated how to prompt large language models (LLMs) using LangChain and Python. Now we will delve into deploying a web user interface (UI) for Ollama on a Kubernetes cluster. This provides a ChatGPT-like experience when engaging with the LLMs.

Full project is available on my GitHub:

https://github.com/vineethac/Ollama/tree/main/ollama_webui


The above-referenced GitHub repository details all the steps required to deploy the Ollama web UI. The following diagram outlines the various components and services that interact with each other as part of this system:


For detailed information on deploying Prometheus, Grafana, and Loki on a Kubernetes cluster, please refer to this blog post.
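Once everything is deployed, you can quickly verify that the Ollama backend the web UI talks to is up and see which models it has pulled. The following is a minimal sketch, assuming the Ollama service is exposed in-cluster at http://ollama:11434 (the same endpoint used in Part 2) and that you run it from a pod in the cluster or after a kubectl port-forward:

import requests

# List the models the Ollama backend has pulled (GET /api/tags).
# Assumes the service endpoint http://ollama:11434 from Part 2;
# adjust the base URL if you are port-forwarding to localhost.
resp = requests.get("http://ollama:11434/api/tags", timeout=10)
resp.raise_for_status()

for model in resp.json().get("models", []):
    print(model["name"])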

A sample interaction with the mistral model using the web UI is given below.


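The same prompt can also be sent from code instead of the browser. Here is a minimal sketch using the LangChain Ollama wrapper from Part 2, assuming the mistral model has already been pulled and the service is reachable at http://ollama:11434:

from langchain.llms import Ollama

# Prompt the mistral model directly, bypassing the web UI
llm = Ollama(model="mistral", base_url="http://ollama:11434")
print(llm("Why is the sky blue?"))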
Hope it was useful. Cheers!

Thursday, January 25, 2024

Ollama - Part 2 - Prompt Large Language Models (LLMs) using Ollama, LangChain and Python


In this exercise we will learn to interact with the LLMs using Ollama, LangChain, and Python.

Full project is available on my GitHub:

https://github.com/vineethac/Ollama/tree/main/ollama_langchain


Import the necessary modules from the LangChain library and Python's argparse module

from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.llms import Ollama
import argparse

Argument parsing

parser = argparse.ArgumentParser()
# The model name can be passed on the command line; llama2 is the default
parser.add_argument('--model', type=str, default="llama2")

args = parser.parse_args()
model = args.model

Initialize Ollama

llm = Ollama(
    model=model,
    callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]),
    base_url="http://ollama:11434",
)

Interactive loop

while True:
    print(f"Model: {model}")
    prompt = input("Ask me anything: ")

    # Enter /bye to exit the loop
    if prompt == "/bye":
        break

    # The streaming callback handler prints the response to stdout token by token
    llm(prompt)
    print("\n \n")


In summary, this script sets up a simple command-line interface for interacting with a language model served by Ollama. It takes user prompts, sends them to the model for processing, and streams the responses to the terminal. The loop continues until the user enters "/bye" to exit.
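For example, assuming the script above is saved as interact.py (a filename chosen here just for illustration), you can pick the model at launch:

python3 interact.py --model mistral

If --model is omitted, the script falls back to the llama2 default defined in the argument parser.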

Hope it was useful. Cheers!

Monday, January 15, 2024

Ollama - Part 1 - Deploy Ollama on Kubernetes

Docker published the GenAI Stack around October 2023. It consists of large language models (LLMs) from Ollama, vector and graph databases from Neo4j, and the LangChain framework, giving developers the resources they need to kick-start building new applications with generative AI. Ollama can be used to deploy and run LLMs locally. In this exercise we will deploy Ollama to a Kubernetes cluster and prompt it.

In my case I am using a Tanzu Kubernetes Cluster (TKC) running on the vSphere with Tanzu 7u3 platform, powered by Dell PowerEdge R640 servers. The TKC nodes use the best-effort-2xlarge VM class with 8 CPUs and 64 Gi of memory. Note that I am running this on a regular Kubernetes cluster without GPUs; if you have GPUs, additional configuration steps might be required.
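Once the Ollama pod and service are up, you can send a quick test prompt over Ollama's REST API. Below is a minimal sketch using Python's requests library, assuming the llama2 model has already been pulled and the service is reachable at http://ollama:11434 (for example from another pod in the cluster, or via kubectl port-forward):

import requests

# Send a one-shot prompt to Ollama's /api/generate endpoint.
# With stream set to False, the whole answer arrives in a single JSON object.
payload = {
    "model": "llama2",
    "prompt": "What is Kubernetes?",
    "stream": False,
}

resp = requests.post("http://ollama:11434/api/generate", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["response"])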



Hope it was useful. Cheers!