Cracking a generative AI interview

If I were asked in an interview to talk about generative AI in 616 words, or approximately 819 tokens (at an average of 1.33 tokens per word), I would say the following:

“Large language models (LLMs) are a subset of generative AI. They generate text by repeatedly predicting the next token (roughly, the next word). A base model is the first, raw pretrained form of a model, while a foundation model is a broadly pretrained model that can be adapted, through instruction, code, or math tuning, to downstream tasks. You can use a foundation LLM as it is, or contextualise it either by fine-tuning or through RAG (Retrieval-Augmented Generation). Fine-tuning on labelled prompt-response pairs is called supervised fine-tuning (SFT). A RAG knowledge source usually stores embeddings, which are numerical vector representations of chunks of text. The context window length is the total number of tokens sent into and received from the LLM. A primary disadvantage of a long context window is that the model’s recall degrades, and hallucinations become more likely, towards the later sections of the context and generation. In some sense, RAG and context window length are pitted against each other. Open-source LLMs can be downloaded and run for inference on your local system.
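
As an aside for readers who want to see the mechanics, here is a minimal sketch of the retrieval step in RAG. It uses a toy bag-of-words embedding in place of a real embedding model and vector database, and every document in it is invented for illustration:

```python
import numpy as np

documents = [
    "Quantisation reduces the precision of model weights.",
    "RAG retrieves relevant text chunks before generation.",
    "Transformers process all tokens in parallel.",
]

# Build a tiny vocabulary from the documents themselves.
vocab = sorted({word.lower() for doc in documents for word in doc.split()})

def embed(text: str) -> np.ndarray:
    """Toy embedding: L2-normalised bag-of-words counts."""
    counts = np.array([text.lower().split().count(w) for w in vocab], dtype=float)
    norm = np.linalg.norm(counts)
    return counts / norm if norm else counts

doc_vectors = np.stack([embed(doc) for doc in documents])  # the "vector store"

query = "What does RAG do before generation?"
scores = doc_vectors @ embed(query)        # dot product = cosine similarity here
best_chunk = documents[int(np.argmax(scores))]
print("Retrieved context:", best_chunk)
# In a real pipeline, the retrieved chunk is prepended to the prompt
# sent to the LLM, grounding its answer in external knowledge.
```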

Quantisation, pruning, and distillation are the preferred LLM compression techniques for running downloaded LLMs locally. Closed-source LLMs cannot be downloaded and used locally. The LLM world is not all rosy: ethical lapses, ownership of generated content, and the need for regulation against misuse are real concerns. The energy consumed by the computation required to train an LLM from scratch, and to serve inference, is very high and is therefore a considerable challenge.
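
To make the quantisation point concrete, here is a minimal sketch of symmetric INT8 weight quantisation using NumPy. Real tools such as llama.cpp or bitsandbytes are far more sophisticated; this only shows the core arithmetic on an invented toy weight matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.standard_normal((4, 4)).astype(np.float32)  # toy FP32 weights

# Symmetric INT8 quantisation: map the largest |weight| to 127.
scale = np.abs(weights).max() / 127.0
q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)

# Dequantise to see how much precision was lost.
restored = q.astype(np.float32) * scale
print("max abs error:", np.abs(weights - restored).max())
print("memory: 4 bytes/weight (FP32) -> 1 byte/weight (INT8)")
```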

Data privacy and hallucination concerns are the primary blockers to adopting generative AI for critical business processes. On a GPU-enabled system, LLM fine-tuning or inference work is shared between the CPU and the GPU. Operations such as matrix multiplication run much faster on a GPU than on a CPU. RNNs process text sequentially; transformers process all tokens in parallel and are therefore faster. LLMs are built on the transformer architecture. You can argue that LLMs are built using supervised or unsupervised (more precisely, self-supervised) learning, depending on how you look at the next-token training process. The top closed-source LLMs are GPT-4o (from OpenAI), Claude (from Anthropic), and Gemini (from Google).
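
A small NumPy sketch makes the sequential-versus-parallel contrast visible. The shapes are toy-sized and the projections a transformer would normally apply are omitted for brevity:

```python
import numpy as np

T, d = 6, 8                              # sequence length, hidden size
rng = np.random.default_rng(0)
x = rng.standard_normal((T, d))

# RNN: each step needs the previous hidden state, so the time steps
# cannot be parallelised.
W, U = rng.standard_normal((d, d)), rng.standard_normal((d, d))
h = np.zeros(d)
for t in range(T):
    h = np.tanh(x[t] @ W + h @ U)

# Transformer self-attention: every token attends to every other
# token in one batch of matrix multiplications.
scores = (x @ x.T) / np.sqrt(d)          # (T, T) interactions, all at once
attn = np.exp(scores)
attn /= attn.sum(axis=1, keepdims=True)  # softmax over each row
out = attn @ x                           # (T, d), computed in parallel
print(out.shape)
```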

The popular open-source LLMs are Llama (from Meta), Mistral models (from Mistral AI), Falcon (from the Technology Innovation Institute), Grok (from xAI), and Gemma (from Google). LangChain, LlamaIndex, and Haystack are frameworks that interact with LLMs and are used to build LLM-based applications. MetaGPT, CrewAI, AutoGen, and ChatDev are multi-agent frameworks for creating LLM-powered applications. There are multiple ways to evaluate LLMs: TruLens, TruEra, and Ragas are evaluation frameworks, while MMLU, GPQA, HumanEval, TruthfulQA, and many more are benchmarks.

An LLM can handle a single modality (text) or multiple modalities (text, audio, images, and video). Highly popular LLM use cases include document processing, code generation, AI chatbots, speech synthesis, language translation, data analysis, synthetic data generation, creative writing, and much more. You can gain the user’s trust by making the LLM and the LLM-based application more explainable and by introducing a human-in-the-loop (HITL). An LLM can understand and respond in a single language or in multiple languages (multilingual LLMs). LLMs by themselves are good at text generation but weak at precise computation. To overcome this, LLMs can use external tools through agents.
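
The agent-and-tools idea can be sketched in a few lines. In the snippet below, fake_llm is a stub standing in for a real model call, and the calculator tool is invented for illustration; real agent frameworks follow the same request, tool call, and response pattern:

```python
import json

def calculator(expression: str) -> str:
    # Toy tool: evaluate basic arithmetic with builtins disabled.
    return str(eval(expression, {"__builtins__": {}}, {}))

TOOLS = {"calculator": calculator}

def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call. A real LLM would itself decide
    # that a tool is needed and emit a structured call like this.
    return json.dumps({"tool": "calculator", "input": "405 / 8"})

def agent(user_query: str) -> str:
    reply = json.loads(fake_llm(user_query))
    if reply.get("tool") in TOOLS:
        result = TOOLS[reply["tool"]](reply["input"])
        return f"Tool result: {result}"   # normally fed back to the LLM
    return str(reply)

print(agent("How many times larger is the 405B variant than the 8B one?"))
```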

Two popular examples of commercial code generation tools are GitHub Copilot and Cursor AI. Navarasa, Dhenu, Odia Llama, Kannada Llama, OpenHathi, Tamil Llama, Krutrim, Bhashini, BharatGPT, and project Indus are Indic (born in India) LLMs. DALL-E, Stable Diffusion, and Midjourney are powerful text-to-image generation models. Ollama and LM Studio are tools for downloading LLMs and running inference with them locally. Azure OpenAI Service from Microsoft, Amazon Bedrock from Amazon, and Vertex AI from Google offer cloud LLM services. Llama 3.1 comes in three variants, 8B, 70B, and 405B, with 8 billion, 70 billion, and 405 billion parameters respectively. Parameters are the weights and biases of the neural network, learned during training.”
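
To ground that closing point about parameters, here is a back-of-the-envelope sketch that counts the weights and biases of a tiny fully connected network; the layer sizes are arbitrary:

```python
# Toy fully connected network; sizes are arbitrary and nothing like
# an actual LLM's.
layer_sizes = [512, 1024, 512]

params = 0
for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
    params += n_in * n_out   # weight matrix
    params += n_out          # bias vector

print(f"{params:,} learned parameters")  # 1,050,112 for this toy network
# Llama 3.1's largest variant has about 405,000,000,000 such values.
```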

All the best for your generative AI interview. 

Disclaimer

Views expressed above are the author’s own.
