Alibaba Cloud, the cloud services and storage arm of the Chinese e-commerce giant, has announced the release of Qwen2-VL, its latest advanced vision-language model, designed to improve visual understanding, video comprehension, and multilingual text-image processing.

It already shows impressive performance on third-party benchmarks compared with other leading state-of-the-art models such as Meta's Llama 3.1, OpenAI's GPT-4o, Anthropic's Claude 3 Haiku, and Google's Gemini-1.5 Flash.

Supported languages include English, Chinese, most European languages, Japanese, Korean, Arabic, and Vietnamese.
Strong ability to analyze images and videos, and even provide live technical support
With the new Qwen2-VL, Alibaba is looking to set new standards for how AI models interact with visual material, including the ability to analyze and recognize handwriting in multiple languages; to identify, describe, and distinguish multiple objects in still images; and even to analyze live video in near real time.

As the Qwen research team wrote in a GitHub blog post about the new Qwen2-VL series of models: "Beyond static images, Qwen2-VL extends its capabilities to video content analysis. It can summarize video content, answer questions related to it, and maintain a continuous flow of conversation in real time, offering live chat support. This functionality allows it to act as a personal assistant, helping users by providing insights and information drawn directly from video content."
In addition, Alibaba claims the model can analyze videos longer than 20 minutes and answer questions about their content.

Alibaba even showed an example of the new model correctly analyzing and describing the following video:

Here is Qwen2-VL's summary of the video:
The video begins with a man speaking to the camera, followed by a group of people sitting in a control room. The camera then cuts to two men floating inside a space station, where they can be seen speaking to the camera. The men appear to be astronauts, wearing space suits. The space station is filled with all kinds of equipment and machinery, and the camera moves around to show the different areas of the station. The men continue to speak to the camera, apparently discussing their mission and the various tasks they are performing. Overall, the video provides a fascinating look into the world of space exploration and the daily lives of astronauts.
Three sizes, two of them fully open source under the Apache 2.0 license
Alibaba's new model comes in three variants with different parameter counts: Qwen2-VL-72B (72 billion parameters), Qwen2-VL-7B, and Qwen2-VL-2B. (As a reminder, parameters describe a model's internal settings, and more parameters usually indicate a more powerful model.)

The 7B and 2B variants are available under the open-source Apache 2.0 license, allowing enterprises to use them freely for commercial purposes and making them an attractive option for potential decision makers. They are designed to deliver competitive performance at a more accessible scale, and are available on the platforms Hugging Face and ModelScope.

However, the largest 72B model has not yet been released to the public; it will be made available later through a separate license and application programming interface (API) from Alibaba.
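As a hedged sketch of what using the open-weight checkpoints looks like, the snippet below pulls the 7B instruct variant from Hugging Face with the `transformers` library. The model ID follows the public Qwen2-VL model card; the helper names (`build_vision_messages`, `answer_about_image`) and generation settings are illustrative assumptions, not an official API.

```python
def build_vision_messages(image_ref: str, question: str) -> list:
    """Chat payload pairing one image with a text question, in the
    role/content structure that Qwen2-VL's chat template expects.
    (Helper name and shape are illustrative assumptions.)"""
    return [{
        "role": "user",
        "content": [
            {"type": "image", "image": image_ref},
            {"type": "text", "text": question},
        ],
    }]


def answer_about_image(image, question: str) -> str:
    """Illustrative only: run one PIL image / question pair through the
    open 7B checkpoint. Requires a recent `transformers` and enough
    GPU memory for a 7B-parameter model."""
    from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

    model_id = "Qwen/Qwen2-VL-7B-Instruct"
    model = Qwen2VLForConditionalGeneration.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    processor = AutoProcessor.from_pretrained(model_id)

    prompt = processor.apply_chat_template(
        build_vision_messages("image", question),
        tokenize=False, add_generation_prompt=True,
    )
    inputs = processor(
        text=[prompt], images=[image], return_tensors="pt"
    ).to(model.device)
    out = model.generate(**inputs, max_new_tokens=256)
    # Decode only the newly generated tokens, not the prompt.
    return processor.batch_decode(
        out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )[0]
```

The Apache 2.0 license means this kind of self-hosted use is permitted commercially without a separate agreement with Alibaba.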
Function calling and human-like visual perception
The Qwen2-VL series builds on the Qwen model family and brings significant advances in several key areas:

These models can be integrated into devices such as mobile phones and robots, enabling automated operations based on visual environments and text instructions.

This capability highlights Qwen2-VL's potential as a powerful tool for tasks that require complex reasoning and decision-making.
In addition, Qwen2-VL supports function calling (integration with third-party software, applications, and tools) and can intuitively extract information from those third-party sources. In other words, the model can see and understand "flight statuses, weather forecasts, or package tracking," which Alibaba says enables it to "facilitate interactions similar to human perceptions of the world."
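Alibaba has not published its exact tool-calling wire format in this announcement, but the general pattern can be sketched: the application registers tools (weather, flight status, package tracking), the model emits a structured call, and the host executes it and feeds the result back into the conversation. Everything below, including the registry, the `get_weather` tool, and the JSON call shape, is a hypothetical illustration rather than Qwen2-VL's actual interface.

```python
import json

# Hypothetical tool registry: in a real integration these callables
# would query third-party APIs (weather, flights, package tracking).
TOOLS = {
    "get_weather": lambda city: {"city": city, "forecast": "sunny", "high_c": 27},
}


def dispatch_tool_call(raw_call: str) -> str:
    """Execute a model-emitted tool call of the (assumed) form
    {"name": ..., "arguments": {...}} and return the result as a JSON
    string to append back into the chat as a tool message."""
    call = json.loads(raw_call)
    fn = TOOLS[call["name"]]
    result = fn(**call["arguments"])
    return json.dumps(result)
```

For example, if the model emitted `{"name": "get_weather", "arguments": {"city": "Hangzhou"}}`, the dispatcher would return the forecast JSON for the model to summarize in natural language.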
Qwen2-VL introduces several architectural improvements designed to enhance the model's ability to process and understand visual material.

Its Naive Dynamic Resolution support allows the models to handle images of varying resolutions, ensuring consistency and accuracy in visual interpretation. In addition, its Multimodal Rotary Position Embedding (M-ROPE) system enables the models to simultaneously capture and integrate positional information from text, images, and videos.
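The core idea behind M-ROPE can be shown with a toy sketch: instead of one rotary position per token, the rotation channels are split into groups driven by separate temporal, height, and width coordinates. The even three-way split below is an assumption for illustration; Qwen2-VL's actual channel grouping differs.

```python
import math


def rope_angles(pos: int, dim: int, base: float = 10000.0) -> list:
    """Standard RoPE: one rotation angle per pair of channels."""
    return [pos / base ** (2 * i / dim) for i in range(dim // 2)]


def mrope_angles(t: int, h: int, w: int, dim: int) -> list:
    """Toy multimodal RoPE: split the channel pairs into three equal
    groups and drive each with a different positional axis (temporal,
    height, width). The real model's grouping is not split this way;
    this only illustrates the mechanism."""
    third = (dim // 2) // 3
    return (
        rope_angles(t, dim)[:third]
        + rope_angles(h, dim)[third:2 * third]
        + rope_angles(w, dim)[2 * third:3 * third]
    )
```

A useful property of this construction: for ordinary text tokens, where the temporal, height, and width coordinates all equal the sequence position, the scheme collapses back to standard one-dimensional RoPE.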
What's next for the Qwen team?
Building on the success of Qwen2-VL, Alibaba's Qwen team is committed to further improving the capabilities of its vision-language models, with plans to integrate additional modalities and extend the models' usefulness across a wider range of applications.

The Qwen2-VL models are available now, and the Qwen team encourages developers and researchers to explore the potential of these cutting-edge tools.