Image Credit: https://www.pexels.com/@markus-winkler-1430818/
Introduction to Large Language Models (LLM)
Definition and Role of LLMs in Generative AI
LLMs, or large language models, are a type of natural language processing model within the broader field of artificial intelligence. Here are some of the most popular large language models:
Generative Pre-trained Transformer (GPT)
Bidirectional Encoder Representations from Transformers (BERT)
Text-to-Text Transfer Transformer (T5)
We will go deeper into the above models later in the post.
Importance and capabilities of LLMs, including flexibility in tasks
LLMs, or large language models, are important in making generative AI accessible to everyone. LLMs can also help with many different types of tasks, such as:
Copywriting - LLMs can generate an article about anything you prompt them with, and can produce both short-form and long-form content as required.
Boilerplate code generation - LLMs such as ChatGPT can generate boilerplate code for your application so that you can build upon it.
Agentic frameworks - LLMs can be linked to an agentic layer or framework so that they can perform tasks such as transcribing audio or searching the web. Frameworks such as LangGraph and Google's Agent Development Kit (ADK) allow an LLM to perform these tasks by connecting it to external tools, such as a web search API.
Image generation - With the prominence of neural-network-based image diffusion methods, LLMs can generate images and, in certain large models, even short videos. This can be particularly useful, as creating a representative image or diagram from scratch can be a lot of work; AI can simplify this by generating an image from a prompt.
Complex calculations - LLMs can now perform deep calculations and even surface previously unknown results - the Ramanujan Machine, a new AI system, has uncovered new complex patterns in numbers (LiveScience). Similarly, AI can help demystify research by performing different analyses on your findings in near real time.
LLMs have numerous different applications apart from the ones mentioned above.
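The agentic pattern mentioned above can be sketched in a few lines. This is a minimal illustration, not a real framework: the routing function stands in for the LLM's tool-selection step, and both tools (`web_search`, `transcribe_audio`) are hypothetical stubs.

```python
# Minimal sketch of the agentic pattern: the "LLM" below is a stub that
# decides which tool to call; real frameworks (LangGraph, Google's ADK)
# replace it with an actual model and a richer control loop.

def web_search(query: str) -> str:
    """Hypothetical tool: a real agent would call a search API here."""
    return f"Top result for '{query}' (stubbed)"

def transcribe_audio(path: str) -> str:
    """Hypothetical tool: a real agent would run speech-to-text here."""
    return f"Transcript of {path} (stubbed)"

TOOLS = {"web_search": web_search, "transcribe_audio": transcribe_audio}

def fake_llm_route(user_request: str) -> tuple[str, str]:
    """Stand-in for the LLM's tool-selection step."""
    if "audio" in user_request:
        return "transcribe_audio", "meeting.wav"
    return "web_search", user_request

def run_agent(user_request: str) -> str:
    tool_name, tool_arg = fake_llm_route(user_request)
    result = TOOLS[tool_name](tool_arg)   # execute the chosen tool
    return f"[{tool_name}] {result}"      # the LLM would summarize this

print(run_agent("latest LLM benchmarks"))
```

Real frameworks add the pieces elided here: the model proposes tool calls as structured output, and the loop feeds each tool's result back to the model until it produces a final answer.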
Examples of prominent LLMs:
GPT-3
GPT-3, or Generative Pre-trained Transformer 3, is a decoder-only Transformer LLM developed by OpenAI. It was the LLM behind ChatGPT until it was replaced by GPT-4, OpenAI's newer model.
ChatGPT
ChatGPT is a commercial generative AI application that uses GPT-based LLMs to generate responses. It is considered one of the most successful AI applications ever launched, with over 180 million users and counting.
Claude 2
Claude 2 is an LLM developed by Anthropic. It is popular among developers, although despite its success, Anthropic has not released the model's weights or architecture details as open source.
Training and Adaptability of LLMs
BERT - Bidirectional Encoder Representation from Transformer
BERT, or Bidirectional Encoder Representations from Transformers, is a language model developed by Google that has been incorporated into its proprietary search algorithm. It is known for its ability to understand nuance and is popular for its sentiment analysis capabilities.
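The sentiment analysis use case is easy to try via the Hugging Face pipeline API. Note that the default checkpoint the library picks (a DistilBERT model fine-tuned on SST-2, at the time of writing) is an assumption; pass `model=...` to pin a specific one.

```python
from transformers import pipeline

# Sentiment analysis with a BERT-family model. Without an explicit
# model argument, the library downloads its default sentiment checkpoint.
classifier = pipeline("sentiment-analysis")
result = classifier("This library makes NLP remarkably easy.")[0]
print(result["label"], round(result["score"], 3))
```

The pipeline returns a label (`POSITIVE` or `NEGATIVE` for this default model) together with a confidence score between 0 and 1.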
XLNet
XLNet is an autoregressive transformer model available through the Python Hugging Face library. It is known for its permutation-based language modeling, which lets it learn from different orderings of the words in a text, and for predicting patterns in a given text to generate the best response possible.
T5 or Text-to-Text transfer transformer
T5, or Text-to-Text Transfer Transformer, is an LLM architecture also packaged with the Hugging Face library. T5 is known especially for its adaptability across many different language tasks and prompts. T5 was originally developed by Google AI.
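T5's adaptability comes from casting every task as text-to-text, selected by a task prefix in the input (e.g. "summarize:", "translate English to German:"). A small sketch using the Hugging Face pipeline; `t5-small` is chosen only to keep the download light:

```python
from transformers import pipeline

# T5 treats every task as text in, text out; the prefix tells the
# model which task to perform.
t5 = pipeline("text2text-generation", model="t5-small")
out = t5("translate English to German: The house is wonderful.")[0]
print(out["generated_text"])
```

Swapping the prefix to "summarize:" (with a longer passage) reuses the exact same model and call for a completely different task.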
RoBERTa or Robustly Optimized BERT Pretraining Approach
RoBERTa is an LLM built upon Google's BERT model. It modifies key hyperparameters and trains with much larger mini-batches and more data, resulting in better performance than BERT thanks to improved pre-training.
Llama-2 or Large Language Model Meta AI-2
Llama-2 is a robust open-source LLM developed by Meta. It uses a GPT-like architecture as a base and was trained on a huge dataset of 2 trillion tokens. It is known for its efficiency and its benchmark performance. Llama 3, the succeeding model family, also features multimodal models, which can accept both image and text prompts. Meta is committed to keeping the Llama model family open source.
Special Non-LLM mention
Sentence Transformer
Although not an LLM by itself, a sentence transformer can perform semantic operations on pairs of documents or texts, or compute vector embeddings for a given selection of text; those embeddings can later be used for retrieval in RAG, or retrieval-augmented generation.
Criteria for Model Selection
Task relevance & functionality
Classification
Verify that the LLM can correctly process user input and generate the most appropriate response; frequent hallucinations, or nonsensical and non-factual text, can negatively affect the user's experience.
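Checking classification quality can be as simple as scoring the model's predictions against a small labeled set. A minimal sketch, where `classify()` is a keyword-based stub standing in for a real LLM call:

```python
# Tiny evaluation harness: compare predicted labels against a labeled set.

def classify(text: str) -> str:
    """Stub classifier; a real setup would prompt an LLM here."""
    return "billing" if "invoice" in text.lower() else "technical"

labeled = [
    ("Where is my invoice for March?", "billing"),
    ("The app crashes on startup.", "technical"),
    ("Invoice total looks wrong.", "billing"),
]

correct = sum(classify(text) == label for text, label in labeled)
accuracy = correct / len(labeled)
print(f"accuracy = {accuracy:.2f}")  # → accuracy = 1.00
```

The same harness works for any candidate model: swap in the real LLM call and compare accuracy across models on the same labeled set.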
Text summarization
See whether the model can summarize a complex topic in an understandable and appropriate form. Summarizing a text is a complex task, and doing it well is a good indicator of how the model handles complex tasks in general.
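One rough, automatable check is how much of a reference summary's vocabulary the model's summary recovers. The sketch below computes unigram recall, a simplified ROUGE-1-style score, not the full metric:

```python
# Simplified summarization check: unigram recall of the reference
# summary against the candidate summary.

def unigram_recall(reference: str, candidate: str) -> float:
    ref_tokens = reference.lower().split()
    cand_tokens = set(candidate.lower().split())
    if not ref_tokens:
        return 0.0
    hits = sum(tok in cand_tokens for tok in ref_tokens)
    return hits / len(ref_tokens)

reference = "the model summarizes long documents"
candidate = "this model summarizes very long documents well"
print(round(unigram_recall(reference, candidate), 2))  # → 0.8
```

Scores like this are only a proxy; for a real evaluation you would pair them with human review or an LLM-as-a-judge setup.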
Data privacy considerations for sensitive information
The LLM shouldn't, under any circumstances, reveal personal information from a user's prompt, whether to a third party or through a hallucination. Doing so compromises the integrity of the LLM provider and shows the model is not compatible with safe AI practices.
Resource and infrastructure limitations: compute resources, memory, storage
The LLM should retain the highest possible performance and efficiency with the lowest load on the infrastructure, so that it can run in resource-limited environments. A smaller, efficient model like BERT is more lightweight yet still produces decent responses; in general, the smaller the model, the less memory it takes.
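The memory-versus-size relationship can be estimated with simple arithmetic: parameter count times bytes per parameter. The parameter counts below are approximate, illustrative figures:

```python
# Back-of-the-envelope memory estimate for model weights:
# parameters × bytes per parameter, converted to GB.

def model_memory_gb(params: float, bytes_per_param: int) -> float:
    return params * bytes_per_param / 1024**3

bert_base = 110e6   # ~110M parameters (approximate)
llama2_7b = 7e9     # ~7B parameters (approximate)

print(f"BERT-base fp32:  {model_memory_gb(bert_base, 4):.2f} GB")
print(f"Llama-2-7B fp16: {model_memory_gb(llama2_7b, 2):.2f} GB")
```

This covers weights only; real deployments also need memory for activations and the KV cache, so actual requirements are higher.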
Performance evaluation: real-time performance, latency, throughput
You can conduct a performance evaluation by recording and comparing metrics across different LLMs, such as real-time performance, latency, and throughput, in a controlled setting with the same computing resources. The quality of an LLM's responses can also be tested using either semantic-similarity-based testing or LLM-as-a-judge evaluation.
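A minimal timing harness for the latency and throughput side of such an evaluation; `generate()` is a stub standing in for a real model call, with a fixed sleep simulating inference time:

```python
import time

# Timing harness sketch: per-request latency and overall throughput
# for any callable, under identical conditions for each model compared.

def generate(prompt: str) -> str:
    time.sleep(0.01)   # simulate model inference
    return prompt.upper()

def benchmark(fn, prompts):
    latencies = []
    start = time.perf_counter()
    for p in prompts:
        t0 = time.perf_counter()
        fn(p)
        latencies.append(time.perf_counter() - t0)
    total = time.perf_counter() - start
    return {
        "avg_latency_s": sum(latencies) / len(latencies),
        "throughput_rps": len(prompts) / total,
    }

stats = benchmark(generate, ["hello"] * 20)
print(stats)
```

Running `benchmark` with each candidate model's call in place of `generate`, on the same hardware and prompt set, gives directly comparable numbers.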
