LLM-powered applications in production, with a focus on augmenting models and deploying them⚓︎
Table of Contents⚓︎
LlamaIndex -⚓︎
What are the uses of LlamaIndex?
- Prompting
- RAG Systems
- AI-ChatBots
- Structured Data Extraction from Unstructured Data
- Fine-Tuning Models for Specific Tasks
- Multi-Modal Applications
- AI-Agents
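LlamaIndex ships dedicated abstractions for each of these use cases (query engines, agents, schema-based extractors). As a library-free illustration of the structured-extraction idea, the sketch below pulls typed fields out of free text; the invoice format and field names are invented for the example, and in a real pipeline an LLM would emit JSON matching a schema instead of the regexes used here:

```python
import re

def extract_invoice(text: str) -> dict:
    """Pull structured fields out of unstructured invoice text.

    A real extraction pipeline would ask an LLM to emit JSON matching
    a schema; plain regexes stand in for the model here.
    """
    patterns = {
        "invoice_id": r"Invoice\s*#\s*(\w+)",
        "total": r"Total:\s*\$([\d.]+)",
        "date": r"Date:\s*([\d-]+)",
    }
    record = {}
    for field, pattern in patterns.items():
        match = re.search(pattern, text)
        record[field] = match.group(1) if match else None
    return record

raw = "Invoice # A17 issued. Date: 2024-05-01. Total: $99.50 due on receipt."
print(extract_invoice(raw))
```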
AutoGen - AI Agentic Framework⚓︎
AutoGen provides abstractions for building LLM applications with conversable multi-agents designed to solve complex tasks through inter-agent conversations.
- Conversable - Agents can converse with one another by sending and receiving messages.
- Customizable - Agents can be customized to integrate LLMs, tools, humans, or a combination of them.
What are the uses of AutoGen?
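The conversable pattern described above can be sketched without the framework: a minimal toy in plain Python where canned reply functions stand in for LLM-backed agents (this is not AutoGen's actual API, only the message-passing idea behind it):

```python
class Agent:
    """Minimal stand-in for a conversable agent: it receives a message,
    produces a reply with its `respond` function, and sends it back."""

    def __init__(self, name, respond):
        self.name = name
        self.respond = respond  # callable: message -> reply (an LLM in practice)
        self.inbox = []

    def receive(self, message, sender):
        self.inbox.append((sender.name, message))
        return self.respond(message)

def chat(a, b, opening, turns=2):
    """Alternate messages between two agents, two-agent-chat style."""
    transcript, message, sender, receiver = [], opening, a, b
    for _ in range(turns):
        reply = receiver.receive(message, sender)
        transcript.append((sender.name, message))
        message, sender, receiver = reply, receiver, sender
    return transcript

# Toy "LLMs": one opens the task, the other echoes a canned reply.
user = Agent("user", lambda m: "Thanks!")
assistant = Agent("assistant", lambda m: f"Echo: {m}")
log = chat(user, assistant, "Plan a task", turns=2)
print(log)
```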
Ch-1 : Running LLMs⚓︎
How can you run LLMs?
- Locally
- Cloud based API
What is an LLM API?
A service that allows developers to integrate and interact with large language models such as GPT, Claude, and Llama in their applications without having to manage the underlying model infrastructure themselves.
Common LLM API providers include OpenAI, Google (Gemini/Vertex AI), Hugging Face, Anthropic (Claude), Groq, AWS Bedrock, and many more.
These APIs power use cases such as content generation, summarization, code completion, Q&A, translation, chatbots, and many more.
How does an LLM API call work?
- The user's application sends a prompt (text) to an API endpoint.
- The API gateway checks authentication/authorization via the API key and forwards the prompt to the appropriate model service.
- The model generates a response, and the service post-processes the output.
- The API formats the result and returns it to the application, usually as JSON.
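Assuming an OpenAI-compatible `/v1/chat/completions` endpoint with a Bearer-token header (the endpoint URL and model name below are illustrative), the steps above map onto a small amount of standard-library Python:

```python
import json
import urllib.request

API_URL = "https://api.openai.com/v1/chat/completions"  # illustrative endpoint

def build_request(prompt: str, api_key: str, model: str = "gpt-4o-mini"):
    """Step 1-2: package the prompt and the API key into an HTTP request."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",  # checked by the API gateway
    }
    return urllib.request.Request(API_URL, data=body, headers=headers)

def parse_response(raw: str) -> str:
    """Step 4: the provider returns JSON; pull out the generated text."""
    return json.loads(raw)["choices"][0]["message"]["content"]

# To actually send the call (network access and a real key required):
#   req = build_request("Say hello", api_key=os.environ["OPENAI_API_KEY"])
#   with urllib.request.urlopen(req) as resp:    # steps 2-3 happen server-side
#       print(parse_response(resp.read().decode()))
```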
What are the key features to know before using an LLM API?
- Token-based pricing - Most LLM APIs charge by the number of tokens processed (input + output), where tokens are small chunks of text.
- Authentication - Using an LLM API requires an API key generated by registering with the provider.
- Rate limiting, monitoring, billing - API providers enforce quotas, track consumption, and provide dashboards/analytics for usage.
- Security and privacy - Sending data to remote models raises privacy concerns; API providers typically publish compliance and data-usage policies.
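Token-based pricing is simple arithmetic once you know the rates. A sketch with placeholder prices (the per-million-token rates below are invented for the example; real rates vary by provider and model, with output tokens usually billed higher than input tokens):

```python
def estimate_cost(input_tokens, output_tokens, in_price_per_m, out_price_per_m):
    """Token-based pricing: input and output tokens are billed at separate
    per-million-token rates (the rates passed in here are placeholders)."""
    return ((input_tokens / 1_000_000) * in_price_per_m
            + (output_tokens / 1_000_000) * out_price_per_m)

# e.g. a 2,000-token prompt with a 500-token answer at $0.50 / $1.50 per 1M tokens:
cost = estimate_cost(2_000, 500, in_price_per_m=0.50, out_price_per_m=1.50)
print(f"${cost:.6f}")
```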
What are the advantages of LLM APIs over running models locally?
- Zero infrastructure - No need for local hardware, GPUs, or managing model weights and parameters.
- Scalability - Many concurrent requests are handled efficiently because scaling is managed by the provider.
- Speed to deployment - Quick access to the latest models and features.
References⚓︎
- Running models locally and accessing them through a local server: https://github.com/bentoml/OpenLLM
Ch-2 : Building a Vector Storage⚓︎
2.1 Ingesting documents 2.2 Splitting documents 2.3 Embedding models 2.4 Vector databases
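The four stages can be sketched end to end in plain Python, with a toy bag-of-characters vector standing in for a real embedding model and an in-memory list standing in for a vector database (only the interfaces resemble the real thing):

```python
import math

def split_document(text: str, chunk_size: int = 40) -> list[str]:
    """2.2 Splitting: cut the document into fixed-size character chunks.
    (Real splitters respect sentence/token boundaries and add overlap.)"""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def embed(text: str) -> list[float]:
    """2.3 Embedding: toy letter-frequency vector standing in for an
    embedding model; only the text -> vector interface matters here."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

class VectorStore:
    """2.4 Vector database: keep (vector, chunk) pairs, search by cosine."""
    def __init__(self):
        self.items = []

    def add(self, chunk: str):
        self.items.append((embed(chunk), chunk))

    def search(self, query: str, k: int = 1) -> list[str]:
        q = embed(query)
        scored = sorted(self.items,
                        key=lambda it: -sum(a * b for a, b in zip(q, it[0])))
        return [chunk for _, chunk in scored[:k]]

# 2.1 Ingesting: load the text, then split, embed, and store each chunk.
store = VectorStore()
document = "Paris is the capital of France. Rust guarantees memory safety."
for chunk in split_document(document):
    store.add(chunk)
print(store.search("capital of France"))
```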
Ch-3 : Retrieval Augmented Generation⚓︎
3.1 Orchestrators 3.2 Retrievers 3.3 Memory 3.4 Evaluation
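A minimal sketch of the retrieve-then-augment loop at the heart of RAG, with keyword overlap standing in for vector search and a prompt template invented for the example:

```python
def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Retriever: rank documents by word overlap with the query.
    (A production retriever would use vector similarity instead.)"""
    q_words = set(query.lower().split())
    return sorted(documents,
                  key=lambda d: -len(q_words & set(d.lower().split())))[:k]

def build_rag_prompt(query: str, documents: list[str]) -> str:
    """Augmentation: prepend retrieved context so the LLM answers from it."""
    context = "\n".join(retrieve(query, documents))
    return (f"Answer using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}")

docs = [
    "The Eiffel Tower is in Paris.",
    "Llamas are South American camelids.",
    "Paris is the capital of France.",
]
prompt = build_rag_prompt("What is the capital of France?", docs)
print(prompt)  # this augmented prompt is what gets sent to the LLM
```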
Ch-4 : Advanced RAG⚓︎
4.1 Query Construction 4.2 Agents and Tools 4.3 Post processing 4.4 Program LLMs
Ch-5 : Agents⚓︎
5.1 Agent fundamentals 5.2 Agent frameworks 5.3 Multi-agents
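Most agent frameworks implement some version of the same loop: the model either requests a tool call or returns a final answer, and tool results are fed back as observations. A toy version with a scripted stand-in for the LLM and a single calculator tool (the `TOOL:name|arg` action format is invented for this sketch):

```python
import ast
import operator

def calculator(expression: str) -> str:
    """Tool: safely evaluate an arithmetic expression via the AST."""
    ops = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}
    def ev(node):
        if isinstance(node, ast.BinOp):
            return ops[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.Constant):
            return node.value
        raise ValueError("unsupported expression")
    return str(ev(ast.parse(expression, mode="eval").body))

def agent_loop(question, model, tools, max_steps=3):
    """Core agent loop: the model either requests a tool or answers."""
    observation = question
    for _ in range(max_steps):
        action = model(observation)          # an LLM call in practice
        if action.startswith("TOOL:"):
            name, arg = action[5:].split("|", 1)
            observation = tools[name](arg)   # feed the tool result back
        else:
            return action                    # final answer, stop looping
    return observation

def scripted_model(obs):
    """Scripted stand-in for the LLM: request the tool once, then answer."""
    if obs.replace(".", "").isdigit():
        return f"The answer is {obs}"
    return "TOOL:calculator|6*7"

print(agent_loop("What is 6 * 7?", scripted_model, {"calculator": calculator}))
```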
Ch-6 : Deployment of LLMs⚓︎
6.1 Local 6.2 Demo 6.3 Server 6.4 Edge
Inference Optimization, Security
https://github.com/mlabonne/llm-course
https://huggingface.co/learn/cookbook/en/enterprise_cookbook_gradio