DeepSeek V3 Announced on the Hugging Face AI Platform

The Chinese company DeepSeek has released its latest large language model (LLM) called DeepSeek-V3-0324. This 641 gigabyte model was released on the Hugging Face AI platform with minimal pre-announcement, in line with the company’s practice of restrained product announcements.

The model is unique in that its license allows for free commercial use. Early benchmarks show that DeepSeek-V3-0324 is capable of running on commercially available hardware, such as Apple’s Mac Studio with the M3 Ultra processor.

AI scientist Awni Hannun from Apple reported that this model is capable to achieve a processing speed of over 20 tokens per second using this Apple hardware setup.

This ability to run a large language model on local ready-to-use hardware is the exact opposite of the conventional way of using other AI models that require massive data center infrastructures to support high-performance AI models.

According to DeepSeek, initial tests have shown significant improvement compared to previous versions.

The model has been rigorously tested by internal stakeholders and has excellent performance, potentially surpassing all other competitive models and even Anthropic’s Claude Sonnet 3.5 in non-logical tasks.

However, unlike subscription-based models like Sonnet and ChatGPT, DeepSeek-V3-0324 is free to download and use.

Technically, the model is a Mixture of Experts (MoE) architecture. It selectively uses around 37 billion of its 671 billion parameters per task, enhancing efficiency by reducing computational requirements while maintaining performance.

The model also uses Multi-Head Latent Attention (MLA) and Multi-Token Prediction (MTP) technologies, which contribute to improved context retention and faster output speed.

Access to the model can be obtained through Hugging Face, the OpenRouter API and chat interface, and the DeepSeek chat platform, if desired. The Hyperbolic Labs inference provider also offers access to the model.

Key Features and Performance

The model was pre-trained on 14.8T tokens using 2.788M GPU hours
DeepSeek-V3 outperforms Llama 3.1 405B and GPT-4o on key benchmarks
It demonstrates exceptional capabilities in coding and mathematical tasks
The model is designed for a wide range of natural language processing tasks

DeepSeek AI also offers API access and an online demo for those looking to test the model’s capabilities. The company originally released DeepSeek R1 on Jan. 20, 2025, which is an LLM that runs on open source license which is free to use, and has shaken the AI technology market with its affordability in terms of costs.

Spread the love

DeepSeek V3 Announced on the Hugging Face AI Platform

Key Features and Performance

Related Posts

About Harris Andrea

Leave a Reply