In October 2023, Docker announced the GenAI Stack, which combines large language models (LLMs) served by Ollama, the Neo4j vector and graph database, and the LangChain framework. Together, these give developers the building blocks they need to kick-start new generative AI applications. Ollama makes it easy to deploy and run LLMs locally. In this exercise we will deploy Ollama to a Kubernetes cluster and prompt it.
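As a minimal sketch of what such a deployment can look like, the manifest below runs the ollama/ollama image as a Deployment and exposes it through a ClusterIP Service. The namespace, replica count, and storage choices here are illustrative assumptions, not taken from the project repository.

```yaml
# ollama.yaml -- a minimal sketch; names and sizes are illustrative assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
  namespace: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
      - name: ollama
        image: ollama/ollama:latest
        ports:
        - containerPort: 11434    # default Ollama API port
        volumeMounts:
        - name: models
          mountPath: /root/.ollama    # where Ollama stores pulled models
      volumes:
      - name: models
        emptyDir: {}    # swap for a PersistentVolumeClaim to keep models across restarts
---
apiVersion: v1
kind: Service
metadata:
  name: ollama
  namespace: ollama
spec:
  selector:
    app: ollama
  ports:
  - port: 11434
    targetPort: 11434
```

After applying the manifest, you can port-forward to the service, pull a model, and send a prompt through Ollama's REST API (the model name here is just an example):

```shell
kubectl apply -f ollama.yaml
kubectl -n ollama port-forward svc/ollama 11434:11434

# pull a model, then prompt it via the Ollama API
curl http://localhost:11434/api/pull -d '{"name": "llama2"}'
curl http://localhost:11434/api/generate -d '{"model": "llama2", "prompt": "Why is the sky blue?", "stream": false}'
```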
In my case I am using a Tanzu Kubernetes Cluster (TKC) running on the vSphere with Tanzu 7u3 platform, powered by Dell PowerEdge R640 servers. The TKC nodes use the best-effort-2xlarge VM class with 8 CPUs and 64Gi of memory each. Note that I am running this on a regular Kubernetes cluster without GPUs; if you have GPUs, additional configuration steps may be required.
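For reference, if your cluster does have GPU worker nodes (with the NVIDIA device plugin installed), one common way to let Ollama use them is to add a GPU resource limit to the container spec. This is the generic Kubernetes pattern, not a step from this exercise:

```yaml
# Hypothetical addition to the ollama container in the Deployment above;
# assumes GPU nodes with the NVIDIA device plugin deployed.
resources:
  limits:
    nvidia.com/gpu: 1
```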
The full project is available in my GitHub repository:
https://github.com/vineethac/Ollama/tree/main/ollama_on_kubernetes