China’s AI model could not stop a single harmful prompt

by Maxim Makedonsky
February 18, 2025
in AI Tools, SEO Tools

The headline-grabbing DeepSeek R1, a new chatbot from a Chinese startup, has failed key safety and security tests conducted by a research team at Cisco in collaboration with researchers from the University of Pennsylvania.

“DeepSeek R1 exhibited a 100% attack success rate, meaning it failed to block a single harmful prompt,” said the research team.

This new chatbot has garnered massive attention for its impressive performance in reasoning tasks at a fraction of the cost of rival models. Reportedly, DeepSeek R1’s training cost around $6 million, compared to the billions invested by other major players like OpenAI, Meta, and Google.

“DeepSeek has combined chain-of-thought prompting and reward modeling with distillation to create models that significantly outperform traditional large language models (LLMs) in reasoning tasks while maintaining high operational efficiency,” explained the team.

However, the Cisco report has exposed flaws that render DeepSeek R1 highly susceptible to malicious use.

“Our findings suggest that DeepSeek’s claimed cost-efficient training methods, including reinforcement learning, chain-of-thought self-evaluation, and distillation may have compromised its safety mechanisms,” added the report.

Researchers used algorithmic jailbreaking

The team employed “algorithmic jailbreaking,” a technique used to identify vulnerabilities in AI models by constructing prompts designed to bypass safety protocols. They tested DeepSeek R1 against 50 prompts from the HarmBench dataset.

“The HarmBench benchmark has a total of 400 behaviors across 7 harm categories including cybercrime, misinformation, illegal activities, and general harm,” highlighted the team.

The results of this evaluation are concerning. DeepSeek R1 exhibited a 100% attack success rate: for every harmful prompt presented, the model failed to recognize the danger and provided a response, so its internal safeguards were bypassed every time.
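The metric itself is simple arithmetic: the share of test prompts for which the attack worked. A minimal sketch in Python, assuming each prompt's outcome is recorded as a boolean (True if the model produced the harmful response, False if it refused). The function name and the sample data are illustrative and do not come from the Cisco report:

```python
def attack_success_rate(outcomes):
    """Return the percentage of prompts for which the attack succeeded."""
    if not outcomes:
        raise ValueError("need at least one prompt outcome")
    # True counts as 1 and False as 0, so sum() gives the number of successes.
    return 100.0 * sum(outcomes) / len(outcomes)


# Hypothetical record of a 50-prompt run in which no prompt was blocked.
nothing_blocked = [True] * 50
print(attack_success_rate(nothing_blocked))
```

A model that refused even one of the 50 prompts would score 98% or lower on this measure.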

“This contrasts starkly with other leading models, which demonstrated at least partial resistance,” said the team.

To provide further context, the research team also tested other leading language models for their vulnerability to algorithmic jailbreaking. Llama 3.1 405B had a 96% attack success rate, GPT-4o had 86%, Gemini 1.5 Pro had 64%, Claude 3.5 Sonnet had 36%, and o1-preview had 26%.
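To make these percentages concrete, the sketch below converts each reported rate into the number of prompts it would imply. It assumes every model was tested on the same 50 prompts as DeepSeek R1, which the article does not state explicitly; the constant and the helper name are hypothetical.

```python
# Assumption: each model faced the same sample size as the DeepSeek R1 run.
PROMPTS_PER_MODEL = 50

# Attack success rates (percent) as reported in the article.
reported_asr = {
    "DeepSeek R1": 100,
    "Llama 3.1 405B": 96,
    "GPT-4o": 86,
    "Gemini 1.5 Pro": 64,
    "Claude 3.5 Sonnet": 36,
    "o1-preview": 26,
}


def implied_counts(asr_percent, total=PROMPTS_PER_MODEL):
    """Return (successful_attacks, blocked_prompts) implied by a rate."""
    successes = round(asr_percent * total / 100)
    return successes, total - successes


# Print the models from most to least vulnerable.
for model, asr in sorted(reported_asr.items(), key=lambda kv: kv[1], reverse=True):
    successes, blocked = implied_counts(asr)
    print(f"{model:<18} {asr:>3}%  {successes:>2} succeeded, {blocked:>2} blocked")
```

Under that assumption, the gap between DeepSeek R1 and o1-preview is the difference between blocking none of the prompts and blocking 37 of them.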

These other models, while not impervious, possess some level of internal safeguards designed to prevent the generation of harmful content. DeepSeek R1 appears to lack these safeguards.

Controversies surrounding DeepSeek R1

The research team’s analysis points to a potential trade-off between efficiency and safety in DeepSeek’s approach. While the company has succeeded in developing a high-performing model at a fraction of the usual cost, it appears to have done so at the expense of robust safety mechanisms.

“Our findings suggest that DeepSeek’s claimed cost-efficient training methods, including reinforcement learning, chain-of-thought self-evaluation, and distillation may have compromised its safety mechanisms,” concluded the researchers.

Since its launch, DeepSeek R1 has faced several controversies. Recently, independent research company SemiAnalysis suggested that the cost of developing this AI model could have been around $1.3 billion, far higher than the company’s claim of $6 million.

In addition, OpenAI has accused DeepSeek of data theft. Sam Altman’s company said that the Chinese AI startup used the outputs of its proprietary models to train a competing chatbot. However, OpenAI itself has been sued for alleged copyright infringement and data misuse on multiple occasions.

Meanwhile, a group of researchers in the United States has claimed to have reproduced the core technology behind DeepSeek’s headline-grabbing AI at a total cost of roughly $30.

While developing an AI chatbot cost-effectively is certainly tempting, the Cisco report underscores that safety and security should not be sacrificed for performance.



Tags: DeepSeek, Inventions and Machines, OpenAI