vineethac.blogspot.com: Python

Showing posts with label Python. Show all posts

Tuesday, April 22, 2025

Azure AI Foundry - Part3 - Abstractive text summarization

In this article, I will show you how to use Azure Cognitive Services for text summarization.

Azure AI Foundry portal

AI Services - Language + Translator

Summarize Information - Summarize text

Select a connected AI service resource or create a new one.

Playgrounds - Summarize Information - Summarize text

Python

Sample code to summarize a PDF can be found in my GitHub repo. Following is an example of a resume summary:

Hope this was useful. Cheers!

Thursday, July 4, 2024

vSphere with Tanzu using NSX-T - Part35 - Monitoring supervisor cluster health with Python and vCenter APIs

vSphere with Tanzu Supervisor cluster is a Kubernetes platform that simplifies the deployment, management, and scaling of Kubernetes clusters. Monitoring the health of your WCP/ Supervisor clusters is crucial to ensure the smooth running of your Tanzu Kubernetes Clusters (TKCs) and applications. In this blog post, we'll explore how to use Python and vCenter APIs to verify the health of your Supervisor clusters.

You can access the Python script from my GitHub repository: https://github.com/vineethac/VMware/tree/main/vSphere_with_Tanzu/wcp_cluster_health

This script connects to the vCenter server, retrieves the cluster summary, and checks the Tanzu Supervisor cluster configuration info and prints the status of the cluster. By using this Python script, you can easily monitor the health of your Tanzu Supervisor clusters through vCenter APIs.

Hope it was useful. Cheers!

Monday, July 1, 2024

vSphere with Tanzu using NSX-T - Part34 - CPU and Memory utilization of a supervisor cluster

vSphere with Tanzu is a Kubernetes-based platform for deploying and managing containerized applications. As with any cloud-native platform, it's essential to monitor the performance and utilization of the underlying infrastructure to ensure optimal resource allocation and avoid any potential issues. In this blog post, we'll explore a Python script that can be used to check the CPU and memory allocation/ usage of a WCP Supervisor cluster.

You can access the Python script from my GitHub repository: https://github.com/vineethac/VMware/tree/main/vSphere_with_Tanzu/wcp_cluster_util

Sample screenshot of the output

The script uses the Kubernetes Python client library (kubernetes) to connect to the Supervisor cluster using the admin kubeconfig and retrieve information about the nodes and their resource utilization. The script then calculates the average CPU and memory utilization across all nodes and prints the results to the console.

Note: In my case instead of running it as a script every time, I made it an executable plugin and copied it to the system executable path. I placed it in $HOME/.krew/bin in my laptop.

Hope it was useful. Cheers!

Monday, April 22, 2024

Hugging Face - Part6 - Repo model xyz is gated and you must be authenticated to access it

Today, while working locally on my machine with mistralai/Mistral-7B-Instruct-v0.2 from Hugging Face, I encountered the following issue:

Cannot access gated repo for url https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2/resolve/main/config.json.
Repo model mistralai/Mistral-7B-Instruct-v0.2 is gated. You must be authenticated to access it.

OSError: You are trying to access a gated repo.
Make sure to have access to it at https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2.
403 Client Error. (Request ID: Root=1-66266e88-14951c696b21d7515a1dd516;df373d0d-261c-41ec-9142-bca579e082fc)

Cannot access gated repo for url https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2/resolve/main/config.json.
Access to model mistralai/Mistral-7B-Instruct-v0.2 is restricted and you are not in the authorized list. Visit https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2 to ask for access.

Upon conducting a Google search, I observed that certain Hugging Face repositories are restricted, requiring an access token for downloading models locally from these gated repositories.

Following are the discussion threads:

https://huggingface.co/google/gemma-7b/discussions/31

https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2/discussions/93

Therefore, if you intend to utilize this code for downloading and engaging with a Mistral model on your local system, you'll require a Hugging Face access token and must implement minor adjustments as outlined below:

import os

MODEL_ID = "mistralai/Mistral-7B-Instruct-v0.2"

access_token = os.environ["HFREADACCESS"]

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, token=access_token)

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, token=access_token)

Note: You can pass the access token to the script as an environment variable. If this is running on Kubernetes as a pod, then you can consider creating a secret with the access token, inject the secret to the container as env using secretKeyRef.

Next, you'll need to log in to Hugging Face, navigate to the model card you wish to download, and select "Agree and access repository". Once completed, executing the Python script should enable you to download the model locally and interact with it seamlessly.

Hope it was useful. Cheers!

Saturday, April 20, 2024

Hugging Face - Part5 - Deploy your LLM app on Kubernetes

In our previous blog post, we explored the process of containerizing the Large Language Model (LLM) from Hugging Face using FastAPI and Docker. The next step is deploying this containerized application on a Kubernetes cluster. Additionally, I'll share my observations and insights gathered during this exercise.

You can access the deployment yaml spec and detailed instructions in my GitHub repo:

https://github.com/vineethac/huggingface/tree/main/6-deploy-on-k8s

Requirements

I am using a Tanzu Kubernetes Cluster (TKC).

Each node is of size best-effort-2xlarge which has 8 vCPU and 64Gi of memory.

❯ KUBECONFIG=gckubeconfig k get node
NAME                                             STATUS   ROLES                  AGE    VERSION
tkc01-control-plane-49jx4                        Ready    control-plane,master   97d    v1.23.8+vmware.3
tkc01-control-plane-m8wmt                        Ready    control-plane,master   105d   v1.23.8+vmware.3
tkc01-control-plane-z6gxx                        Ready    control-plane,master   97d    v1.23.8+vmware.3
tkc01-worker-nodepool-a1-pqq7j-dc6957d97-8gjn8   Ready    <none>                 21d    v1.23.8+vmware.3
tkc01-worker-nodepool-a1-pqq7j-dc6957d97-c9nfq   Ready    <none>                 21d    v1.23.8+vmware.3
tkc01-worker-nodepool-a1-pqq7j-dc6957d97-cngff   Ready    <none>                 21d    v1.23.8+vmware.3
❯

I've attached 256Gi storage volumes to the worker nodes that is mounted at /var/lib/containerd. The worker nodes on which these llm pods are running should have enough storage space. Otherwise you may notice these pods getting stuck/ restarting/ unknownstatus. If the worker nodes run out of the storage disk space, you will see pods getting evicted with warnings The node was low on resource: ephemeral-storage. TKC spec is available in the above mentioned Git repo.

Deployment

The deployment and service yaml spec are given in fastapi-llm-app-deploy-cpu.yaml.

This works on a CPU powered Kubernetes cluster. Additional configurations might be required if you want to run this on a GPU powered cluster.

We have already instrumented the Readiness and Liveness functionality in the LLM app itself.

The readiness probe invokes the /healthz endpoint exposed by the FastAPI app. This will make sure the FastAPI itself is healthy/ responding to the API calls.

The liveness probe invokes liveness.py script within the app. The script invokes the /ask endpoint which interacts with the LLM and returns the response. This will make sure the LLM is responding to the user queries. For some reason if the llm is not responding/ hangs, the liveness probe will fail and eventually it will restart the container.

You can apply the deployment yaml spec as follows:

❯ KUBECONFIG=gckubeconfig k apply -f fastapi-llm-app-deploy-cpu.yaml

Validation

❯ KUBECONFIG=gckubeconfig k get deploy fastapi-llm-app
NAME              READY   UP-TO-DATE   AVAILABLE   AGE
fastapi-llm-app   2/2     2            2           21d
❯
❯ KUBECONFIG=gckubeconfig k get pods | grep fastapi-llm-app
fastapi-llm-app-758c7c58f7-79gmq                               1/1     Running   1 (71m ago)    13d
fastapi-llm-app-758c7c58f7-gqdc6                               1/1     Running   1 (99m ago)    13d
❯
❯ KUBECONFIG=gckubeconfig k get svc fastapi-llm-app
NAME              TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)          AGE
fastapi-llm-app   LoadBalancer   10.110.228.33   10.216.24.104   5000:30590/TCP   5h24m
❯

Now you can just do a curl against the EXTERNAL-IP of the above mentioned fastapi-llm-app service.

❯ curl http://10.216.24.104:5000/ask -X POST -H "Content-Type: application/json" -d '{"text":"list comprehension examples in python"}'

In our next blog post, we'll try enhancing our FastAPI application with robust instrumentation. Specifically, we'll explore the process of integrating FastAPI metrics into our application, allowing us to gain valuable insights into its performance and usage metrics. Furthermore, we'll take a look at incorporating traces using OpenTelemetry, a powerful tool for distributed tracing and observability in modern applications. By leveraging OpenTelemetry, we'll be able to gain comprehensive visibility into the behavior of our application across distributed systems, enabling us to identify performance bottlenecks and optimize resource utilization.

Stay tuned for an insightful exploration of FastAPI metrics instrumentation and OpenTelemetry integration in our upcoming blog post!

Hope it was useful. Cheers!

Saturday, March 30, 2024

Hugging Face - Part4 - Containerize your LLM app using Python, FastAPI, and Docker

In this exercise, our objective is to integrate an API endpoint for the Large Language Model (LLM) provided by Hugging Face using FastAPI. Additionally, we aim to encapsulate this whole application within a Docker container for portability and ease of deployment.

To achieve this, our project consists of several key components:

Large Language Model: Our application logic resides in model.py, where the model_pipeline function serves as the core engine behind our LLM interaction using LangChain. We've chosen the Mistral Instruct model from Hugging Face for this exercise.
API Endpoint Integration: We'll be incorporating an API endpoint using FastAPI to seamlessly interact with the LLM downloaded from Hugging Face. The main.py file implements the FastAPI framework, defining routes and endpoints. Specifically, the /ask endpoint invokes the model_pipeline function to interact with the Mistral Instruct model and generate a response.
Containerization: Utilizing the Dockerfile, we containerize our FastAPI LLM application. This ensures that our application, along with its dependencies, can be easily packaged and deployed across various environments.

You can access the Dockerfile, Python code, and other observations on my GitHub repository:

https://github.com/vineethac/huggingface/tree/main/5-containerize-llm-app

Deploy on Kubernetes as a pod

Deploying directly as a pod is not a preferred way. This is just for quick testing purpose! In the next blog post we will see how to deploy this as a Kubernetes deployment resource.

❯ KUBECONFIG=gckubeconfig k run hf-11 --image=vineethac/fastapi-llm-app:latest --image-pull-policy=Always
pod/hf-11 created
❯ KUBECONFIG=gckubeconfig kg po hf-11
NAME    READY   STATUS              RESTARTS   AGE
hf-11   0/1     ContainerCreating   0          2m23s
❯
❯ KUBECONFIG=gckubeconfig kg po hf-11
NAME    READY   STATUS    RESTARTS   AGE
hf-11   1/1     Running   0          26m
❯
❯ KUBECONFIG=gckubeconfig k logs hf-11 -f
Downloading shards: 100%|██████████| 3/3 [02:29<00:00, 49.67s/it]
Loading checkpoint shards: 100%|██████████| 3/3 [00:03<00:00,  1.05s/it]
INFO:     Will watch for changes in these directories: ['/fastapi-llm-app']
INFO:     Uvicorn running on http://0.0.0.0:5000 (Press CTRL+C to quit)
INFO:     Started reloader process [7] using WatchFiles
Loading checkpoint shards: 100%|██████████| 3/3 [00:11<00:00,  3.88s/it]
INFO:     Started server process [25]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
2024-03-28 08:19:12 hf-11 watchfiles.main[7] INFO 3 changes detected
2024-03-28 08:19:48 hf-11 root[25] INFO User prompt: select head or tail randomly. strictly respond only in one word. no explanations needed.
2024-03-28 08:19:48 hf-11 root[25] INFO Model: mistralai/Mistral-7B-Instruct-v0.2
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
2024-03-28 08:19:54 hf-11 root[25] INFO LLM response:  Head.
2024-03-28 08:19:54 hf-11 root[25] INFO FastAPI response:  Head.
INFO:     127.0.0.1:53904 - "POST /ask HTTP/1.1" 200 OK
INFO:     127.0.0.1:55264 - "GET / HTTP/1.1" 200 OK
INFO:     127.0.0.1:43342 - "GET /healthz HTTP/1.1" 200 OK

For a quick validation, I did exec into the pod and curl against the exposed APIs.

❯ KUBECONFIG=gckubeconfig k exec -it hf-11 -- bash
root@hf-11:/fastapi-llm-app#
root@hf-11:/fastapi-llm-app# curl -d '{"text":"select head or tail randomly. strictly respond only in one word. no explanations needed."}' -H "Content-Type: application/json" -X POST http://localhost:5000/ask
{"response":" Head."}root@hf-11:/fastapi-llm-app#
root@hf-11:/fastapi-llm-app# curl localhost:5000
"Welcome to FastAPI for your local LLM!"root@hf-11:/fastapi-llm-app#
root@hf-11:/fastapi-llm-app#
root@hf-11:/fastapi-llm-app# curl localhost:5000/healthz
{"Status":"OK"}root@hf-11:/fastapi-llm-app#
root@hf-11:/fastapi-llm-app#

You can also use kubectl expose command to create a service for this pod and then port forward to it and then curl to it.

Hope it was useful. Cheers!

Friday, February 23, 2024

Hugging Face - Part3 - Inference with Code Llama using LangChain

In the field of understanding and working with human language (NLP), Hugging Face is a key platform that provides many pre-trained models for different tasks. With Transformers, LangChain, and Python developers can easily use Hugging Face's models on their own computers for quick processing. Using LangChain offers a streamlined and user-friendly approach to tapping into the capabilities of pre-trained language models. In this blog post we focus on how to inference with Code Llama - Instruct model from Hugging Face locally using LangChain.

You can access the Python script in my GitHub repository:
https://github.com/vineethac/huggingface/tree/main/4-codellama_with_langchain

To initiate inference with Code Llama, developers can start by specifying the desired model using its identifier, such as MODEL_ID = "codellama/CodeLlama-7b-Instruct-hf". Transformers simplifies the process by providing a unified interface with the familiar Python programming language, allowing users to effortlessly initialize the model and tokenizer.

Once the model and tokenizer are set up, developers can leverage LangChain's HuggingFacePipeline class to create a text generation pipeline. This pipeline, defined with parameters like max_new_tokens and repetition_penalty, becomes a powerful tool for local inferencing. By combining this pipeline with LangChain's PromptTemplate, developers can easily construct prompts and invoke the entire chain to generate responses. This streamlined process facilitates local inferencing with Code Llama, empowering developers to leverage Hugging Face's models for a wide range of natural language processing tasks in their Python applications.

Example

root@hf-3:/codellama# python3 codellama_langchain.py
tokenizer_config.json: 100%|█████████████████████████████████████████████████████████| 749/749 [00:00<00:00, 3.57MB/s]
tokenizer.model: 100%|█████████████████████████████████████████████████████████████| 500k/500k [00:00<00:00, 4.48MB/s]
tokenizer.json: 100%|████████████████████████████████████████████████████████████| 1.84M/1.84M [00:00<00:00, 6.13MB/s]
special_tokens_map.json: 100%|███████████████████████████████████████████████████████| 411/411 [00:00<00:00, 1.86MB/s]
config.json: 100%|███████████████████████████████████████████████████████████████████| 646/646 [00:00<00:00, 3.40MB/s]
model.safetensors.index.json: 100%|██████████████████████████████████████████████| 25.1k/25.1k [00:00<00:00, 68.2MB/s]
model-00001-of-00002.safetensors: 100%|██████████████████████████████████████████| 9.98G/9.98G [01:50<00:00, 90.0MB/s]
model-00002-of-00002.safetensors: 100%|██████████████████████████████████████████| 3.50G/3.50G [00:39<00:00, 89.5MB/s]
Downloading shards: 100%|███████████████████████████████████████████████████████████████| 2/2 [02:30<00:00, 75.16s/it]
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████| 2/2 [00:05<00:00,  2.86s/it]
generation_config.json: 100%|█████████████████████████████████████████████████████████| 116/116 [00:00<00:00, 110kB/s]

Ask codellama: given two unsorted integer lists. merge the two lists, sort the merged list, and find median using python. consider the length of the merged list while finding the median value.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
 Here is a possible solution to the problem:

def merge_and_find_median(list1, list2):
   # Merge the two lists
   merged_list = list1 + list2

   # Sort the merged list
   merged_list.sort()

   # Find the median value
   if len(merged_list) % 2 == 0:
       # Even number of elements in the merged list
       median = (merged_list[len(merged_list) // 2 - 1] + merged_list[len(merged_list) // 2]) / 2
   else:
       # Odd number of elements in the merged list
       median = merged_list[len(merged_list) // 2]

   return median

Explanation:

* First, we merge the two lists by concatenating them.
* Then, we sort the merged list using the `sort()` method.
* Next, we check whether the length of the merged list is even or odd. If it's even, we take the average of the middle two elements of the list. If it's odd, we simply take the middle element as the median.
* Finally, we return the median value.

Note that this solution assumes that both input lists are sorted in ascending order. If they are not sorted, you may need to add additional code to sort them before merging and finding the median.</s>

Ask codellama: /bye
root@hf-3:/codellama#

Hope it was useful. Cheers!

Tuesday, February 20, 2024

Hugging Face - Part2 - Code generation with Code Llama - Instruct

Code Llama, an impressive publicly available machine learning model, is a specialised version of Llama 2 that was created by further training Llama 2 on code-specific datasets. It is specifically designed to tackle coding challenges. It can generate both code and descriptive natural language about code, making it a versatile asset for developers. Some common use cases include writing new functions or even debugging existing code. It supports a wide range of popular programming languages, including Python, C++, Java, PHP, Typescript (Javascript), C#, and Bash.

Code Llama – Instruct, an advanced variation of Code Llama which is designed to accept natural language instructions as input and returns the expected output. This unique feature makes the model more adept at understanding and fulfilling user requirements. The Meta AI team recommend using Code Llama - Instruct variants whenever you intend to use Code Llama for your code generation tasks.

In this blog post, I will guide you through the process of employing the Code Llama - Instruct model from Hugging Face locally for code generation tasks. We will be utilizing the Python Transformers library for this. You can access the Python script in my GitHub repository:

https://github.com/vineethac/huggingface/tree/main/3-codellama-instruct

Example

root@hf-5:/# python3 codellama_prompt.py
tokenizer_config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 749/749 [00:00<00:00, 3.44MB/s]
tokenizer.model: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 500k/500k [00:00<00:00, 4.12MB/s]
tokenizer.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.84M/1.84M [00:00<00:00, 9.76MB/s]
special_tokens_map.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 411/411 [00:00<00:00, 2.08MB/s]
config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 646/646 [00:00<00:00, 3.51MB/s]
model.safetensors.index.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 25.1k/25.1k [00:00<00:00, 47.9MB/s]
model-00001-of-00002.safetensors: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.98G/9.98G [02:02<00:00, 81.2MB/s]
model-00002-of-00002.safetensors: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3.50G/3.50G [00:45<00:00, 76.7MB/s]
Downloading shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [02:48<00:00, 84.38s/it]
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:20<00:00, 10.10s/it]
generation_config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 116/116 [00:00<00:00, 444kB/s]


Ask codellama/CodeLlama-7b-Instruct-hf: reverse a list in python.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Result: <s>[INST] reverse a list in python. [/INST]  There are several ways to reverse a list in Python. Here are a few methods:

1. Using the `reversed()` function:

my_list = [1, 2, 3, 4, 5]
reversed_list = list(reversed(my_list))
print(reversed_list)  # [5, 4, 3, 2, 1]

2. Using slicing:

my_list = [1, 2, 3, 4, 5]
reversed_list = my_list[::-1]
print(reversed_list)  # [5, 4, 3, 2, 1]

3. Using the `reverse()` method:

my_list = [1, 2, 3, 4, 5]
my_list.reverse()
print(my_list)  # [5, 4, 3, 2, 1]

Note that the `reverse()` method reverses the list in place, meaning that it modifies the original list. The other two methods create a new list with the elements in reverse order.

Ask codellama/CodeLlama-7b-Instruct-hf: /bye
root@hf-5:/#

The first time you execute the Python script, the model will be automatically downloaded to your local machine. Subsequently, upon subsequent runs, the previously saved model will be utilized in processing user inputs.

root@hf-5:~# cd ~/.cache/huggingface/hub/
root@hf-5:~/.cache/huggingface/hub#
root@hf-5:~/.cache/huggingface/hub# ls | grep Instruct
models--codellama--CodeLlama-7b-Instruct-hf
root@hf-5:~/.cache/huggingface/hub#
root@hf-5:~/.cache/huggingface/hub# cd models--codellama--CodeLlama-7b-Instruct-hf
root@hf-5:~/.cache/huggingface/hub/models--codellama--CodeLlama-7b-Instruct-hf# ls
blobs  refs  snapshots
root@hf-5:~/.cache/huggingface/hub/models--codellama--CodeLlama-7b-Instruct-hf# cd blobs/
root@hf-5:~/.cache/huggingface/hub/models--codellama--CodeLlama-7b-Instruct-hf/blobs#
root@hf-5:~/.cache/huggingface/hub/models--codellama--CodeLlama-7b-Instruct-hf/blobs# ls -altrh
total 13G
-rw-r--r-- 1 root root  749 Feb 19 12:03 526f464cf83353c59f7c07b9e587498b47d67a1b
-rw-r--r-- 1 root root 489K Feb 19 12:03 45ccb9c8b6b561889acea59191d66986d314e7cbd6a78abc6e49b139ca91c1e6
-rw-r--r-- 1 root root 1.8M Feb 19 12:03 6b25321d89e21832a89e6273834eab0e4378a53b
-rw-r--r-- 1 root root  411 Feb 19 12:03 d85ba6cb6820b01226ef8bd40b46bb489041c6a8
-rw-r--r-- 1 root root  646 Feb 19 12:03 8fb4018bc8ceaddbaf7d3d238911a30fd5e9081a
-rw-r--r-- 1 root root  25K Feb 19 12:03 cd3b8fb46c4d5616e91520a7a7d9a5a75af759a8
-rw-r--r-- 1 root root 9.3G Feb 19 12:05 0f52c0eab2dafa0a13e8103a426b17137f7b053e9211334158d7bd7cc1148ceb
-rw-r--r-- 1 root root 3.3G Feb 19 12:06 9ddab1824225fbe405cea67c5d8d87666f1ab5c59ec89abdf2cacae9b555da75
-rw-r--r-- 1 root root  116 Feb 19 12:06 aa9aac2cbaa80cf25094e7d9a527bd1cab9f5321
drwxr-xr-x 6 root root 4.0K Feb 19 12:06 ..
drwxr-xr-x 2 root root 4.0K Feb 19 12:06 .
root@hf-5:~/.cache/huggingface/hub/models--codellama--CodeLlama-7b-Instruct-hf/blobs#

I hope it was useful. Cheers!

Monday, February 19, 2024

Hugging Face - Part1 - Getting started

This blog series will help you get started with Hugging Face, including:

Downloading and using Hugging Face models locally via the Python Transformers library.
Constructing an API for your LLM application using FastAPI.
Containerizing your project with Docker.
Deploying and running your containerized LLM application on a Kubernetes cluster.

An overview about Hugging Face, types of Language Models, and the Transformers library are given in my GitHub repo: https://github.com/vineethac/huggingface/tree/main

Here are some examples of running the language models locally from Hugging Face using Pipeline function from the Transformers library:

question-answering

Model used: distilbert-base-cased-distilled-squad

6-question-answering.py

'''
Question answering from a given context.
'''

from transformers import pipeline

question_answerer = pipeline(task="question-answering", model="distilbert-base-cased-distilled-squad")
output = question_answerer(
    question="What work I do?",
    context="My name is Vineeth and I work as a Site Reliability Engineer at VMware in Bangalore, India",
)

print(output)


root@hf-2:/transformers-course# python3 6-question-answering.py
{'score': 0.9214025139808655, 'start': 35, 'end': 60, 'answer': 'Site Reliability Engineer'}
root@hf-2:/transformers-course#

translation

Model used: Helsinki-NLP/opus-mt-fr-en

8-translation.py

'''
Translate from fr to en.
'''

from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")
output = translator("Ce cours est produit par Hugging Face.")

print(output)


root@hf-2:/transformers-course# python3 8-translation.py
[{'translation_text': 'This course is produced by Hugging Face.'}]
root@hf-2:/transformers-course#

More details and examples are given in my GitHub repo:

https://github.com/vineethac/huggingface/tree/main/1-examples

Hope it was useful. Cheers!

Thursday, January 25, 2024

Ollama - Part2 - Prompt Large Language Models (LLMs) using Ollama, LangChain and Python

In this exercise we will learn to interact with the LLMs using Ollama, LangChain, and Python.

Full project in my GitHub

https://github.com/vineethac/Ollama/tree/main/ollama_langchain

Import necessary modules from LangChain library and Python's argparse module

from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.llms import Ollama
import argparse

Argument parsing

parser = argparse.ArgumentParser()
parser.add_argument('--model', type=str, default="llama2")

args = parser.parse_args()
model = args.model

Initialize Ollama

llm = Ollama(
        model=model, callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]), base_url="http://ollama:11434"
)

Interactive loop

while True:
    print(f"Model: {model}")
    prompt = input("Ask me anything: ")

    if prompt=="/bye":
        break

    llm(prompt)
    print("\n \n")

In summary, this script sets up a simple command-line interface for interacting with the Ollama language model. It takes user prompts, sends them to the Ollama model for processing, and prints the model's responses. The loop continues until the user enters "/bye" to exit.

Hope it was useful. Cheers!

Friday, September 22, 2023

Configure syslog forwarding in vCenter servers using Python

As a system administrator, it's essential to ensure that your vCenter servers are properly configured to collect and forward system logs to a central location for monitoring and analysis. In this blog, we'll explore how to configure syslog forwarding in vCenter servers using Python.

You can access the Python script from my GitHub repository:
https://github.com/vineethac/VMware/tree/main/vCenter/syslog_forwarding

In this blog, we've demonstrated how to get, test, and set syslog forwarding configuration in vCenter servers using Python. By following these steps, you can ensure that your vCenter servers are properly configured to collect and forward system logs to a central location for monitoring and analysis. Remember to replace the placeholders in the config file with your actual vCenter server names, syslog server IP address or hostname, port, and protocol.

Hope it was useful. Cheers!

Sunday, October 9, 2022

Working with Kubernetes using Python - Part 06 - Create namespace

Following code snipet uses Python client for the kubernetes API to create namespace. You will need to specify the kubeconfig file and the context to use for creating the namespace. This is an example case if you are working with multiple kubeconfig files where multiple K8s clusters could be present in each kubeconfig file.

from kubernetes import client, config
import argparse


def load_kubeconfig(kubeconfig_file, context_name):
    try:
        config.load_kube_config(
            config_file=f"{kubeconfig_file}", context=f"{context_name}"
        )
    except config.ConfigException as err:
        print(err)
        raise Exception("Could not configure kubernetes python client!")
    v1 = client.CoreV1Api()
    return v1


def create_ns(v1, ns_name):
    print("Creating namespace")
    namespace = client.V1Namespace(metadata={"name": ns_name})
    ret = v1.create_namespace(namespace)
    print(ret)


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("-c", "--context", required=True, help="K8s context")
    parser.add_argument("-f", "--file", required=True, help="Kubeconfig file")
    args = parser.parse_args()

    context = args.context

    v1 = load_kubeconfig(args.file, context)

    ns_name = input("Enter namespace name: ")
    create_ns(v1, ns_name)


if __name__ == "__main__":
    main()

Following is sample output:

❯ python3 create_namespace.py -c tkc-admin@tkc -f /Users/vineethac/testing/ccs/tkc.kubeconfig
Enter namespace name: vineethac-test11
Creating namespace
{'api_version': 'v1',
 'kind': 'Namespace',
 'metadata': {'annotations': None,
              'cluster_name': None,
              'creation_timestamp': datetime.datetime(2022, 12, 7, 11, 57, 17, tzinfo=tzutc()),
              'deletion_grace_period_seconds': None,
              'deletion_timestamp': None,
              'finalizers': None,
              'generate_name': None,
              'generation': None,
              'labels': {'kubernetes.io/metadata.name': 'vineetha-test11'},
              'managed_fields': [{'api_version': 'v1',
                                  'fields_type': 'FieldsV1',
                                  'fields_v1': {'f:metadata': {'f:labels': {'.': {},
                                                                            'f:kubernetes.io/metadata.name': {}}}},
                                  'manager': 'OpenAPI-Generator',
                                  'operation': 'Update',
                                  'time': datetime.datetime(2022, 12, 7, 11, 57, 17, tzinfo=tzutc())}],
              'name': 'vineethac-test11',
              'namespace': None,
              'owner_references': None,
              'resource_version': '5518430',
              'self_link': None,
              'uid': '0e9f1211-e09f-4d2d-b475-8995bb0c0907'},
 'spec': {'finalizers': ['kubernetes']},
 'status': {'conditions': None, 'phase': 'Active'}}

Sunday, September 25, 2022

Working with Kubernetes using Python - Part 05 - Get pods

Following code snipet uses Python client for the kubernetes API to get all pods and pods under a specific namespace for a given context:

from kubernetes import client, config
import argparse


def load_kubeconfig(context_name):
    config.load_kube_config(context=f"{context_name}")
    v1 = client.CoreV1Api()
    return v1


def get_all_pods(v1):
    print("Listing all pods:")
    ret = v1.list_pod_for_all_namespaces(watch=False)
    for i in ret.items:
        print(i.metadata.namespace, i.metadata.name, i.status.phase)


def get_namespaced_pods(v1, ns):
    print(f"Listing all pods under namespace {ns}:")
    ret = v1.list_namespaced_pod(f"{ns}")
    for i in ret.items:
        print(i.metadata.namespace, i.metadata.name, i.status.phase)


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("-c", "--context", required=True, help="K8s context")
    parser.add_argument("-n", "--namespace", required=False, help="K8s namespace")
    args = parser.parse_args()

    context = args.context
    v1 = load_kubeconfig(context)

    if not args.namespace:
        get_all_pods(v1)
    else:
        get_namespaced_pods(v1, args.namespace)


if __name__ == "__main__":
    main()

Saturday, June 11, 2022

Working with Kubernetes using Python - Part 04 - Get namespaces

Following code snipet uses Python client for the kubernetes API to get namespace details from a given context:

from kubernetes import client, config
import argparse


def load_kubeconfig(context_name):
    config.load_kube_config(context=f"{context_name}")
    v1 = client.CoreV1Api()
    return v1


def get_all_namespace(v1):
    print("Listing namespaces with their creation timestamp, and status:")
    ret = v1.list_namespace()
    for i in ret.items:
        print(i.metadata.name, i.metadata.creation_timestamp, i.status.phase)


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("-c", "--context", required=True, help="K8s context")
    args = parser.parse_args()

    context = args.context
    v1 = load_kubeconfig(context)
    get_all_namespace(v1)


if __name__ == "__main__":
    main()

Friday, April 8, 2022

Working with Kubernetes using Python - Part 03 - Get nodes

Following code snipet uses kubeconfig python module to switch context and Python client for the kubernetes API to get cluster node details. It takes the default kubeconfig file, and switch to the required context, and get node info of the respective cluster.

kubectl commands:

kubectl config get-contexts
kubectl config current-context
kubectl config use-context <context_name>
kubectl get nodes -o json

Code:

Reference:

https://kubeconfig-python.readthedocs.io/en/latest/
https://github.com/kubernetes-client/python

Hope it was useful. Cheers!

Saturday, March 19, 2022

Working with Kubernetes using Python - Part 02 - Switch context

Following code snipet uses kubeconfig python module and it takes the default kubeconfig file, and switch to a new context.

kubectl commands:

kubectl config get-contexts
kubectl config current-context
kubectl config use-context <context_name>

Code:

Note: If you want to use a specific kubeconfig file, instead of conf = KubeConfig()

you can use conf = KubeConfig('path-to-your-kubeconfig')

Reference:

https://kubeconfig-python.readthedocs.io/en/latest/

Hope it was useful. Cheers!

Friday, March 4, 2022

Working with Kubernetes using Python - Part 01 - List contexts

Following code snipet uses Python client for the kubernetes API and it takes the default kubeconfig file, list the contexts, and active context.

kubectl commands:

kubectl config get-contexts
kubectl config current-context

Code:

Note: If you want to use a specific kubeconfig file, instead of

contexts, active_context = config.list_kube_config_contexts()

you can use

contexts, active_context = config.list_kube_config_contexts(config_file="path-to-kubeconfig")

Reference:

https://github.com/kubernetes-client/python

Hope it was useful. Cheers!

Pages

Tuesday, April 22, 2025

Azure AI Foundry portal

Python

Thursday, July 4, 2024

Monday, July 1, 2024

Monday, April 22, 2024

Saturday, April 20, 2024

Requirements

Deployment

Validation

Saturday, March 30, 2024

Deploy on Kubernetes as a pod

Friday, February 23, 2024

Example

Tuesday, February 20, 2024

Example

Monday, February 19, 2024

question-answering

translation

Thursday, January 25, 2024

Full project in my GitHub

Friday, September 22, 2023

Sunday, October 9, 2022

Sunday, September 25, 2022

Saturday, June 11, 2022

Friday, April 8, 2022

Saturday, March 19, 2022

Friday, March 4, 2022