
Saturday, April 26, 2025

Azure AI Foundry - Part4 - Deploy and use a generative AI model

Azure AI Foundry supports deploying large language models (LLMs). In this article, we will see how to deploy a model and use it.

Azure AI Foundry Portal

  • Select your project - My assets - Models + endpoints - Deploy model
  • Click Deploy base model
  • Select the model you want to deploy (here I am selecting gpt-4.1) and click Confirm

  • You can see deployment details such as capacity (tokens per minute), resource location, etc.
  • Click on Create resource and deploy

  • Now it will start creating the resource; this step may take a minute or so.
  • Once it is done, you will be taken to the following page, where you can see the details of the model you just deployed.

  • Click on Open in playground to test the model.
  • Once the chat playground is open, you will see your deployment, and under it a section where you can give the model instructions and context. An example is given in the following screenshot. Once the instructions and context are provided, make sure to click the Apply changes button.
  • Now you can click Generate prompt, provide your query, and click Send.
  • You can also set values such as the maximum output tokens for the model response, temperature, frequency penalty, etc. under the Parameters section.

  • A sample response is provided in the following screenshot.

  • To see the sample code, you can click on View code.
  • You can also see code samples and how to authenticate using an API key, as shown below.

  • Metrics (total requests, token count, etc.) related to your model deployment can be found on the following page.

 

Python


Sample code to interact with the model can be found in my GitHub repo.
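The "View code" sample boils down to something like the sketch below, which assumes the OpenAI Python SDK (`pip install openai`) and key-based authentication. The environment variable names, api-version, and deployment name are placeholders; substitute the values shown under View code for your own deployment.

```python
# Minimal sketch of calling the deployed model with the OpenAI Python SDK.
# The endpoint, key, api-version, and deployment name are placeholders --
# use the values shown under "View code" in the Azure AI Foundry portal.
import os

def build_messages(instructions: str, query: str) -> list:
    """Pair the playground-style model instructions with a user query."""
    return [
        {"role": "system", "content": instructions},
        {"role": "user", "content": query},
    ]

def ask_model(query: str) -> str:
    from openai import AzureOpenAI
    client = AzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],  # e.g. https://<resource>.openai.azure.com/
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2024-10-21",  # assumption: match the api-version shown in the portal
    )
    response = client.chat.completions.create(
        model="gpt-4.1",   # your deployment name
        messages=build_messages(
            "You are an AI assistant that helps people find information.", query
        ),
        max_tokens=256,    # limits output tokens, as in the Parameters section
        temperature=0.7,
    )
    return response.choices[0].message.content

if __name__ == "__main__" and "AZURE_OPENAI_API_KEY" in os.environ:
    print(ask_model("What is Azure AI Foundry?"))
```

Note that `max_tokens` and `temperature` map directly to the Parameters section of the playground, so you can reproduce a playground configuration programmatically.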

Hope it was useful. Cheers!

Monday, January 15, 2024

Ollama - Part1 - Deploy Ollama on Kubernetes

Docker published its GenAI Stack around October 2023, which consists of large language models (LLMs) served by Ollama, vector and graph databases from Neo4j, and the LangChain framework. These utilities give developers the resources they need to kick-start new generative AI applications. Ollama can be used to deploy and run LLMs locally. In this exercise we will deploy Ollama to a Kubernetes cluster and prompt it.

In my case I am using a Tanzu Kubernetes Cluster (TKC) running on the vSphere with Tanzu 7u3 platform, powered by Dell PowerEdge R640 servers. The TKC nodes use the best-effort-2xlarge VM class with 8 CPUs and 64 Gi of memory. Note that I am running this on a regular Kubernetes cluster without GPUs; if you have GPUs, additional configuration steps might be required.
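A CPU-only deployment like the one described above can be sketched with a manifest along these lines; the image tag, resource requests, and object names are my assumptions, while port 11434 is Ollama's default API port.

```yaml
# Hypothetical minimal manifest for running Ollama on a CPU-only cluster.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
      - name: ollama
        image: ollama/ollama:latest
        ports:
        - containerPort: 11434   # Ollama's default API port
        resources:
          requests:
            cpu: "4"
            memory: "8Gi"
---
apiVersion: v1
kind: Service
metadata:
  name: ollama
spec:
  selector:
    app: ollama
  ports:
  - port: 11434
    targetPort: 11434
```

After applying the manifest with kubectl, you can exec into the pod and pull a model with the ollama CLI, or send prompts to the service's REST API on port 11434.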



Hope it was useful. Cheers!