AI Week in Review 26.01.24

Figure 1. Abominable Snowman on a London street, still from a Runway Gen-4.5 video. Quality is getting close to passing the visual Turing Test.

Capable local AI coding models are back! Z.ai (formerly Zhipu) released GLM-4.7-Flash, a 30B parameter mixture-of-experts (MoE) LLM with only 3B active parameters that delivers impressive performance on local AI coding and agentic workloads. With 59.2% on SWE-bench Verified and 79.5% on tau-bench, GLM-4.7-Flash achieves SOTA performance for a model of its size on coding and agentic task benchmarks, well above the prior best models in its class, Qwen3 30B and gpt-oss-20B.

According to Bijan Bowen’s tests, GLM-4.7-Flash running under OpenCode is an agentic coding beast. GLM-4.7-Flash is a great option for local AI coding and browser agent tools, supporting automation, coding, and agentic browser tasks without sending data to cloud providers. After updating Ollama to the latest version, you can pull GLM-4.7-Flash from the Ollama registry, or download the weights from Hugging Face.
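
For a quick local test, here is a minimal sketch using the official ollama Python client (pip install ollama); note that the exact Ollama model tag is an assumption here, so verify the name in the Ollama registry before pulling:

```python
# Minimal local-inference sketch using the official `ollama` Python client.
# Requires a recent Ollama install running locally; the model tag
# "glm-4.7-flash" is an assumption, so verify it in the Ollama registry.
import ollama

ollama.pull("glm-4.7-flash")  # one-time download of the model weights

response = ollama.chat(
    model="glm-4.7-flash",
    messages=[{
        "role": "user",
        "content": "Write a Python function that merges two sorted lists.",
    }],
)
print(response["message"]["content"])
```

From there, agent frontends like OpenCode can typically be pointed at the local Ollama endpoint instead of a cloud provider, keeping the whole coding loop on-device.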

Runway introduced Image to Video for Runway Gen-4.5. The model takes static images as input and animates them into video sequences that maintain the original visual style and character details, and it can also combine multiple shots into a single video. This update focuses on giving filmmakers more control over the temporal evolution of their scenes, with precise camera control and consistent characters. The Gen-4.5 update is available on all paid plans.

Gino Carranza on X says “Runway Gen 4.5 Image-to-Video is on another level:

Camera moves that actually obey. Action that reads. You’re not “generating” anymore — you’re directing. This changes everything.

Runway also claims its new Gen-4.5 AI video model successfully fooled 90% of viewers in a “Real vs. Fake” test, closing the gap between synthetic video and actual camera recordings. It’s passing the video generation equivalent of the Turing test and getting good enough for wider Hollywood use.

Alibaba’s Qwen team announced Qwen3-TTS, an open-source family of text-to-speech models in 0.6B and 1.7B parameter sizes, featuring multilingual support across 10 major languages and many dialects. The release announcement touts “high-speed, high-fidelity speech reconstruction” with low latency (97 milliseconds), as well as voice cloning from just three seconds of audio. The models are available on Hugging Face and support fine-tuning.
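
As a rough illustration of how an open-weights TTS release like this is typically consumed, here is a hedged sketch via the Hugging Face transformers text-to-speech pipeline; both the model id Qwen/Qwen3-TTS-1.7B and pipeline support are assumptions, so consult the model card for the actual loading code:

```python
# Hypothetical usage sketch for an open-weights TTS model on Hugging Face.
# The model id "Qwen/Qwen3-TTS-1.7B" and text-to-speech pipeline support
# are assumptions; check the model card for the real loading code.
import soundfile as sf
from transformers import pipeline

tts = pipeline("text-to-speech", model="Qwen/Qwen3-TTS-1.7B")
result = tts("High-fidelity speech from a small open-weights model.")

# The pipeline returns raw audio samples plus the sampling rate.
sf.write("speech.wav", result["audio"].squeeze(), result["sampling_rate"])
```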

Inworld AI launched TTS-1.5, new text-to-speech models that come in mini and max versions and offer top-ranking quality with sub-250 millisecond latency. The company claims the model ranks first on TTS Arena benchmarks relative to major providers like ElevenLabs and MiniMax, and it supports speech applications across 15 languages. With pricing around half a cent per minute, TTS-1.5 is both faster and cheaper than many existing commercial TTS systems.

FlashLabs introduced Chroma 1.0, which it claims is the world’s first open-source real-time speech-to-speech model with voice cloning. The model enables near-instantaneous translation or voice modification (150 millisecond end-to-end latency) while preserving the speaker’s original tone and accent. Unlike conventional pipelines that chain ASR, LLM, and TTS, the model operates directly between speech inputs and outputs. FlashLabs published a technical paper on Chroma 1.0, open-sourced the inference code on GitHub, and released model weights on Hugging Face.

Liquid AI introduced LFM2.5-1.2B Thinking, a 1.2B parameter AI reasoning model for edge devices such as smartphones that uses under 900 MB of memory while achieving fast decoding speeds (239 tokens/sec on AMD CPUs and 82 tokens/sec on mobile NPUs). The model can be used for lightweight assistants, local retrieval tasks, and on-device AI workflows.

Overworld announced Waypoint-1, an open-source real-time AI world model capable of running at 60 fps on consumer GPUs. The diffusion-based model is designed for interactive environments and simulations that require continuous world representation. Waypoint-1 is an initial release intended for experimentation and exploration; world models are not yet ready for prime time.

Google announced that Personal Intelligence is being extended into AI Mode in Search, rolling out a feature that tailors results based on context from a user’s Gmail and Photos data. Personal Intelligence leverages personal information from Google apps (such as Gmail) to provide specific AI-driven answers, and Google aims to bring this personalized AI experience to its full ecosystem of apps.

In October, I said Claude Skills could be bigger than MCPs, and it’s proving true: Skills are becoming a new AI industry standard for extending agent capabilities. Gemini CLI recently added Agent Skills. This week, Vercel launched skills.sh, a package repository and distribution tool described as “npm for AI agent skills” that aims to standardize skill deployment across agent systems. The platform quickly reached tens of thousands of installs, enabling developers to share and reuse agent capabilities.

Anthropic made its Claude Code VS Code extension generally available, bringing full agentic coding capabilities directly into the Visual Studio Code IDE. Anthropic also upgraded the Todo feature to Tasks, enabling Claude Code users to plan, track, and execute more complex projects across multiple sessions. Tasks can define dependencies on one another and persist them in metadata, making Claude Code more powerful for complex workflows and long-running projects.

YouTube announced a feature that allows creators to produce Shorts featuring AI-generated versions of themselves. The tool creates a digital avatar that can perform and speak based on the creator’s input, enabling content production without filming. YouTube creators have already been using other avatar video-generation tools such as HeyGen for this, but the native feature lowers the barrier further.

Adobe has introduced AI-powered PDF editing into Adobe Acrobat Studio, including AI chat for PDFs and a feature that uses AI to convert PDF documents into podcast-style audio tracks. The Generate podcast tool analyzes the text within a document and generates a spoken-word summary or reading, mimicking the format of an audio program. The AI chat feature provides a way to edit PDFs via natural language prompts.

Anthropic published a new “Constitution for Claude,” an update to its original 2023 Claude Constitution that defines the behavior and ethical values guiding its Claude AI models. Rather than rigid rules, the 80-page updated constitution lays out explanatory values intended to guide behavior and alignment, so that Claude models are broadly safe, broadly ethical, genuinely helpful, and compliant with guidelines.

OpenAI confirmed a new AI hardware device is set for 2026 release. Details remain scarce, but it may involve wearables such as earbuds or a pin-like gadget.

Meanwhile, The Information reports that Apple is developing an AI-powered wearable pin equipped with cameras, targeting a release around 2027. The device is reported to be a screenless accessory the size of an AirTag that uses voice and visual recognition to interact with the user’s environment.

Apple is revamping Siri into a native AI chatbot built on more advanced foundation models, reportedly using Google’s Gemini. Codenamed “Campos,” the upgraded assistant will handle complex, multi-turn conversations similar to OpenAI’s ChatGPT and Google’s Gemini, and is planned for release later this year on iPhone, iPad, and Mac. Better late than never, Apple.

Stanford researchers have developed the AI model SleepFM, capable of detecting risks for over 100 diseases using data from a single night of sleep. The system analyzes physiological signals gathered during sleep to identify early biomarkers for conditions ranging from cardiovascular issues to neurological disorders.

Researchers have developed an AI choker that helps stroke patients speak, offering individuals who have lost their voice a way to communicate. The Nature paper “Wearable intelligent throat enables natural speech in stroke patients with dysarthria” explains how it works: a wearable sensor detects the minute muscle and tissue vibrations produced when a user attempts to speak, and an LLM then interprets and converts the signals into synthesized audio. The result is “fluent, emotionally expressive speech” from patients otherwise unable to speak.

Microsoft has revealed Rho-alpha AI, a new model for robot training that enables robots to learn complex tasks via natural language commands. The system allows operators to instruct robots using plain English, which the AI then translates into physical motor actions.

As intelligence moves into scientific research, drug discovery, energy systems, and financial modeling, new economic models will emerge. Licensing, IP-based agreements, and outcome-based pricing will share in the value created. That is how the internet evolved. Intelligence will follow the same path. – Sarah Friar, OpenAI CFO

OpenAI’s CFO shared a company memo indicating that OpenAI’s 2026 strategy will prioritize practical adoption of AI in areas like health, science, and enterprise. OpenAI is also seeking to diversify its monetization pathways beyond subscriptions, extending into ads and “outcome-based pricing.”

OpenAI’s outcome-based pricing model for AI discoveries would take a share of revenue from customers who generate significant discoveries or innovations using its AI. The goal is partnership-based monetization, where OpenAI benefits from the commercial success of its users’ outputs.

TechCrunch reports that 55 U.S.-based AI startups raised at least $100 million in funding rounds in 2025, as venture capital firms make AI their main bet for startup funding. Meanwhile, AI stocks still attract investor interest in 2026, as all types of investors are looking for exposure to AI growth.

India’s Principal Scientific Adviser published a proposed “techno-legal” AI governance framework. The proposal is intended to balance innovation with risk management, reflecting emerging government policy approaches to AI regulation.

Google has acqui-hired Hume AI, bringing the Hume AI CEO and his team into DeepMind as part of a significant licensing deal to bolster its DeepMind and Gemini projects. The acquisition of talent and technology aims to enhance Google’s capabilities in empathetic AI and emotional intelligence processing.

Analysis from Originality.ai suggests over half of LinkedIn posts were likely AI-generated in 2025. Authentic human voice and quality content are said to outperform purely generative content in engagement metrics, but AI is faking that authenticity better than ever.

ElevenLabs has released The Eleven Album, a musical project created entirely with ElevenLabs AI audio tools by 13 different artists, including Art Garfunkel. The album demonstrates the company’s capabilities in generating high-quality music and vocals for the creative arts.

It’s worth watching DeepMind’s Demis Hassabis and Anthropic’s Dario Amodei debate AGI at Davos. Hassabis also made several key points in other interviews at Davos.

Hassabis says the main features still missing on the path to AGI are memory, continuous learning, and long-term planning. He notes that the hardest high-level reasoning capability AI cannot yet perform is asking the right questions and generating hypotheses. His timeline for AGI is 2030. Since his perspectives on AI progress are balanced and have held up so far, his predictions are worth considering.


