As AI-generated content continues to gain traction, startups building the technology for it are raising the bar on their products. Just a couple of weeks ago, RunwayML opened access to its new, more realistic model for video generation. Now, London-based Haiper, the AI video startup founded by former Google DeepMind researchers Yishu Miao and Ziyu Wang, is launching a new visual foundation model: Haiper 1.5.
Available on the company’s web and mobile platform, Haiper 1.5 is an incremental update that lets users generate 8-second-long clips from text, image and video prompts — twice as long as the clips from Haiper’s initial model.
The company also announced a new upscaler capability that enables users to enhance the quality of their content as well as plans to venture into image generation.
The move comes just four months after Haiper emerged from stealth. The company is still at a nascent stage and not as heavily funded as other AI startups, but it claims to have onboarded over 1.5 million users on its platform — a strong signal for a company so young. It is now looking to grow this user base with an expanded suite of AI products and take on Runway and others in the category.
“The race in video generative AI is not just necessarily in the power of the models but also in what these models are scaled to recreate. Our distributed data processing and scaled model training will allow us to continue to train and iterate our powerful foundation model with this aim in mind. As this update highlights, we’re making continued advancements as we build a model that doesn’t just produce more and more beautiful videos and longer and longer ones, but also can replicate imagery we can all truly recognize as the world around us,” Miao, who is also the CEO of the company, told VentureBeat.
What does the Haiper AI video platform bring to the table?
Launched in March, Haiper has followed the likes of Runway and Pika in providing users with a comprehensive platform for video generation, powered by a perceptual foundation model trained in-house. At its core, it’s straightforward to use: the user enters a text prompt describing whatever they can imagine, and the model produces matching content, with further prompts available to adjust elements such as characters, objects, backgrounds and artistic styles.
Initially, Haiper processed text prompts or animated existing images into 2-4-second-long clips. The capability did the job, but the length of the content was not enough for broader use cases — a common concern the company heard from creators. Now, with the launch of the latest model, it is solving this problem by doubling the length of generations to eight seconds.
Haiper’s 8-second generated video
It can even extend users’ prior 2- and 4-second generations to eight seconds, similar to what we have seen on other AI video tools such as Luma’s new Dream Machine model.
“Since launching less than four months ago, the response to our video generation models has been inspiring. Our goal of continually pushing the boundaries of this technology has led to our latest eight-second model, doubling the length of video generation on the platform,” Miao said in a statement.
But that’s not all.
Originally, Haiper produced high-definition videos of just two seconds, while longer clips came out in standard definition. The latest update changes that, giving users the ability to generate a clip of any length in SD or HD quality.
There’s also an integrated upscaler that enables users to enhance all of their video generations to 1080p in a single click, without disturbing existing workflows. The tool will even work with images and videos users already have. They’ll just have to upload them to the upscaler to improve the quality.
In addition to the upscaler, Haiper is also adding a new image model to its platform. This will let users generate images from text prompts and then animate them through the text-to-video offering for more polished video results. Haiper says integrating image generation into the video pipeline will allow users to test out, review and rework their content before moving to the animation stage.
“At Haiper we don’t iterate for the mere sake of it, we want to listen and bring our users’ ideas to life. Debuting our new upscaler and Text2Image tools is a testament to the fact we are the video generative AI platform for the community, engaging with our users and actively improving for them,” Miao added.
Building AGI with the perception of the world
While Haiper’s new model and updates look promising, especially given the samples shared by the company, they have yet to be tested by the wider community. When VentureBeat tried accessing the tools on the company’s website, the image model was unavailable, while eight-second-long generations and the upscaler were restricted to those paying for the company’s Pro plan, priced at $24/month billed yearly.
Miao told us the company plans to make 8-second videos more widely available through a few methods, including a credit system, and that the image model will debut later this month for free, with an option to upgrade for faster and more concurrent generations.
In terms of quality, the two-second videos from the platform appear more consistent than the longer ones, which are still hit or miss. The four-second videos we generated blurred at times, with a lack (or overuse) of subject and object detail, especially in motion-heavy content.
However, with these updates and more planned for the future, the quality of Haiper’s generations is expected to improve. The company says it plans to deepen its perceptual foundation models’ understanding of the world, essentially building toward AGI that could replicate the emotional and physical elements of reality, down to the smallest visual details such as light, motion, texture and interactions between objects, to create true-to-life content.
“Each frame of a video carries an array of minute visual information…For AI to create visually stunning content that is true to life, it will require an inherent understanding of the world and the physics behind it. AI capable of understanding, interpreting, and generating such complexities in video content will possess a deeper knowledge and perceptual ability that takes us one step closer to AGI. A model with such capabilities could have the potential to transcend content creation and storytelling and have far-reaching applications, in sectors such as robotics or transportation,” Miao explained.
It will be interesting to see how the company builds in this direction and takes on rivals like Runway, Pika and OpenAI that still appear ahead in the AI video race.