This article was automatically translated from the original Turkish version.
Artificial intelligence (AI) has become increasingly integrated into software development processes in recent years, driving revolutionary changes in software engineering. In particular, large language models (LLMs) used for code generation enable developers to work faster and more productively. However, the advancement of these technologies has also introduced various security challenges.
The security of AI-generated code has become one of the most pressing issues for both developers and security experts today. The use of AI in code production can introduce vulnerabilities that malicious actors may exploit. Additionally, the quality and diversity of the data used to train these systems directly impact the security of the resulting code.
Although AI models generate code through pattern recognition and learning, they do not fully understand the context, purpose, or security requirements of the code they produce. This makes AI-generated code potentially more susceptible to security flaws. For example, an analysis of code suggested by GitHub Copilot in security-relevant scenarios found that roughly 40% of the generated programs contained security vulnerabilities (Pearce et al., 2022).
The rapid adoption of AI technologies has improved efficiency in software development while simultaneously introducing new and complex security threats. Several recent incidents have clearly demonstrated how AI can create conditions conducive to security vulnerabilities.
At the beginning of 2025, a significant security vulnerability was discovered in the database of the Chinese AI chatbot DeepSeek. This flaw led to the exposure of user chat histories and personal information. In response, countries including Australia, South Korea, Canada, and the United States banned the use of DeepSeek in government institutions and expressed serious concerns about data security.
In January 2025, a municipality in the U.S. state of Maine fell victim to a sophisticated phishing attack using AI-generated fake emails and deepfake audio messages. The attackers deceived municipal officials into authorizing fraudulent payments, resulting in financial losses amounting to tens of thousands of dollars.
In 2024, the JarkaStealer malware was detected being distributed through fake packages on the Python Package Index (PyPI) that imitated popular AI tools such as ChatGPT and Claude. This malware was used to steal sensitive information from developers’ systems.
In 2024, the hacker group Scattered Spider infiltrated the Snowflake cloud data platform and gained access to customer data from major organizations including AT&T, Ticketmaster, and Santander Bank. This attack was part of a broader series of cyberattacks targeting AI experts and stealing sensitive data.
In 2024, the threat actor SweetSpecter, linked to China, targeted U.S.-based AI experts using the SugarGh0st RAT malware. These attacks, which affected both academic circles and private sector employees, were assessed as espionage operations aimed at stealing sensitive information.
AI code generation models frequently produce code snippets with security vulnerabilities because they learn from flawed or malicious examples in the vast code repositories on which they are trained. An empirical study found that 29.5% of Python snippets and 24.2% of JavaScript snippets generated by GitHub Copilot contained at least one security vulnerability. Furthermore, Pearce and colleagues demonstrated that Copilot proposed insecure code in over 40% of high-risk CWE scenarios. This highlights the serious risks of directly integrating AI-generated code snippets into production environments.
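To illustrate why unvetted snippets are risky, the sketch below contrasts a shell-string pattern commonly seen in generated code with the safer argument-list form. The function names and payload are invented for this example:

```python
def build_ping_command_unsafe(host: str) -> str:
    # Vulnerable pattern often produced by code assistants: the host is
    # spliced into a shell command string, so a value like
    # "8.8.8.8; rm -rf ~" smuggles in a second command under shell=True.
    return "ping -c 1 " + host

def build_ping_command_safe(host: str) -> list:
    # An argument list is handed to the OS without a shell, so the value
    # stays a single argv element no matter what characters it contains.
    # Run it via subprocess.call(...) with the list form, not shell=True.
    return ["ping", "-c", "1", host]

malicious = "8.8.8.8; echo pwned"
print(build_ping_command_unsafe(malicious))  # one string, two shell commands
print(build_ping_command_safe(malicious))    # the payload stays one argument
```

The safe variant is not clever filtering; it simply removes the shell from the path between user input and the operating system.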
AI models can be copied by attackers using "few-shot model extraction" techniques, enabling them to mimic the model, which violates intellectual property rights and bypasses security measures. Adversarial attacks feed an LLM crafted inputs that fall outside its training distribution in order to elicit unintended or harmful outputs. Prompt injection attacks, in turn, can override system instructions or safety guidelines; according to the OWASP GenAI report, these vulnerabilities lead to critical data leaks when user input is passed to the model unfiltered. In a real-world demonstration, Carnegie Mellon University researchers showed that simple text manipulations could defeat the safety safeguards of major models such as ChatGPT, Bard, and Claude.
Data poisoning attacks undermine the security of future code outputs by injecting malicious examples into the training data. Studies show that even small amounts of injected malicious code—less than 1%—can degrade code quality. Improta’s “un-repairing code” position paper emphasizes that minor modifications added to training data can result in security vulnerabilities that are difficult to detect in the generated code.
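One defensive measure these findings point toward is provenance tracking for the training corpus. The sketch below (names hypothetical) fingerprints each example with SHA-256 so that injected or modified samples stand out before retraining:

```python
import hashlib

def fingerprint(snippets):
    # Record a SHA-256 digest per training example so any later
    # tampering with the corpus is detectable before retraining.
    return {hashlib.sha256(s.encode()).hexdigest() for s in snippets}

corpus = ["def add(a, b):\n    return a + b"]
baseline = fingerprint(corpus)

# An attacker slips a subtly modified example into the corpus.
corpus.append("def add(a, b):\n    return a + b  # exec(payload)")
tampered = fingerprint(corpus) - baseline
print(len(tampered))  # one example has no recorded provenance
```

Hashing only detects modification of known data; vetting newly collected examples still requires review or automated security scanning.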
SQL Injection allows attackers to manipulate database queries by injecting malicious input that is interpreted as SQL commands. For example, if user input is not properly sanitized when generating dynamic SQL with Hibernate, unexpected commands may be executed.
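The same flaw can be shown without Hibernate. This self-contained sqlite3 sketch contrasts string concatenation with a parameterized query; the table and payload are invented for the example:

```python
import sqlite3

def find_user_unsafe(conn, username):
    # VULNERABLE (CWE-89): user input is concatenated into the SQL text,
    # so an input like "x' OR '1'='1" changes the query's logic.
    query = "SELECT id FROM users WHERE name = '" + username + "'"
    return conn.execute(query).fetchall()

def find_user_safe(conn, username):
    # SAFE: the ? placeholder sends the value separately from the SQL
    # text, so the driver never interprets it as SQL.
    return conn.execute("SELECT id FROM users WHERE name = ?", (username,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice'), (2, 'bob')")

payload = "x' OR '1'='1"
print(len(find_user_unsafe(conn, payload)))  # injection returns every row
print(len(find_user_safe(conn, payload)))    # payload treated as a literal name
```

Parameterized queries (or the equivalent bound parameters in an ORM such as Hibernate) are the standard fix; input "sanitization" alone is error-prone.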
Cross-Site Scripting (XSS) occurs when a web application fails to properly neutralize user input, allowing malicious scripts to execute in visitors’ browsers. For instance, if user input is embedded into server-generated HTML content and rendered without proper HTML encoding, attackers can launch stored or reflected XSS attacks.
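A minimal sketch of the encoding step, using Python's standard `html.escape` (the rendering functions are invented for this example):

```python
import html

def render_comment_unsafe(user_input):
    # VULNERABLE (CWE-79): raw input is placed straight into the HTML
    # response, so a <script> payload executes in the visitor's browser.
    return "<p>" + user_input + "</p>"

def render_comment_safe(user_input):
    # SAFE: html.escape neutralizes <, >, &, and quotes so the browser
    # renders the payload as inert text instead of executing it.
    return "<p>" + html.escape(user_input) + "</p>"

payload = "<script>alert('xss')</script>"
print(render_comment_unsafe(payload))  # script tag survives intact
print(render_comment_safe(payload))    # escaped entities render as text
```

Real applications should rely on an auto-escaping template engine and a Content Security Policy rather than hand-rolled escaping at each call site.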
Improper Authentication occurs when client-provided credentials are verified inadequately or not at all. This vulnerability can permit unauthorized access, for example when signature validation is skipped while processing session tokens.
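The token scenario can be sketched with a simple HMAC-signed session token (the secret, token format, and function names are invented for this example):

```python
import hashlib
import hmac

SECRET = b"server-side-secret"  # hypothetical key; must stay out of client reach

def issue_token(user_id: str) -> str:
    sig = hmac.new(SECRET, user_id.encode(), hashlib.sha256).hexdigest()
    return user_id + "." + sig

def verify_token_unsafe(token: str) -> str:
    # VULNERABLE (CWE-287): the signature part is ignored, so anyone can
    # forge "admin.anything" and be accepted as admin.
    user_id, _sig = token.split(".", 1)
    return user_id

def verify_token_safe(token: str) -> str:
    user_id, sig = token.split(".", 1)
    expected = hmac.new(SECRET, user_id.encode(), hashlib.sha256).hexdigest()
    # compare_digest resists timing attacks during the comparison
    if not hmac.compare_digest(sig, expected):
        raise ValueError("invalid signature")
    return user_id

print(verify_token_unsafe("admin.forged"))   # accepted without any check
print(verify_token_safe(issue_token("alice")))  # only valid signatures pass
```

The same principle applies to JWTs: a library's verification step must never be disabled, and the `alg=none` shortcut must be rejected.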
Insecure Deserialization allows malicious objects to execute during the deserialization of untrusted data streams when validation is inadequate. According to a Cloud Defense analysis, directly loading incoming serialized data in Python without signature or type checks (for example via pickle rather than plain JSON) can lead to remote code execution risks.
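The contrast can be demonstrated directly: pickle executes attacker-controlled callables at load time, while JSON parses pure data that can then be type-checked (the order schema below is invented for this example):

```python
import json
import pickle

class Evil:
    # pickle calls __reduce__ on load, so an attacker-crafted blob can run
    # any callable; a harmless print stands in here for real attack code.
    def __reduce__(self):
        return (print, ("attacker code ran during unpickling",))

pickle.loads(pickle.dumps(Evil()))  # demonstrates execution; never do this with untrusted bytes

def load_order(raw: bytes) -> dict:
    # Safer pattern: parse pure data with json, then validate types and
    # whitelist fields before the object reaches business logic.
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if not isinstance(data.get("item"), str) or not isinstance(data.get("quantity"), int):
        raise ValueError("bad field types")
    return {"item": data["item"], "quantity": data["quantity"]}

print(load_order(b'{"item": "book", "quantity": 2, "role": "admin"}'))
# unexpected fields such as "role" are dropped, not trusted
```

If binary serialization is unavoidable, the payload should carry an HMAC signature that is verified before deserialization begins.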
Exposure of Sensitive Information violates user privacy by leaking data to unauthorized actors. For example, API keys or passwords stored explicitly in configuration files can enable attackers to access system resources in the event of a breach.
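A common remediation is moving secrets out of source and config files into the runtime environment, populated by deployment tooling or a secret manager. The variable name below is hypothetical:

```python
import os

# VULNERABLE (CWE-200): a literal key in source or config ends up in
# version control and in every distributed copy of the artifact.
API_KEY_HARDCODED = "sk-hypothetical-example-key"  # never do this

def get_api_key() -> str:
    # Safer: read the secret from the environment and fail loudly if it
    # is absent, rather than silently falling back to a baked-in value.
    key = os.environ.get("MY_SERVICE_API_KEY")  # hypothetical variable name
    if key is None:
        raise RuntimeError("MY_SERVICE_API_KEY is not set")
    return key
```

Environment variables are a baseline; dedicated secret stores add rotation, audit logging, and access control on top.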
While AI-assisted code generation enhances speed and efficiency in software development, findings indicate that a substantial share of generated code (29.5% of Python snippets in one Copilot study) contains security vulnerabilities. This necessitates a security-focused design from the very beginning of the code lifecycle. Adopting "shift-left" security approaches in code generation, preserving training data integrity, and integrating automated testing tools significantly reduce security risks. Additionally, proactive measures must be taken to defend against model-based attack vectors such as adversarial attacks and prompt injection.
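As a toy illustration of shift-left scanning, the hypothetical checker below parses a generated snippet with Python's `ast` module and flags calls that commonly signal injection or code-execution risk. The rule set is invented for this sketch; real tools such as Semgrep or CodeQL are far more thorough:

```python
import ast

RISKY_CALLS = {"eval", "exec", "system"}

def flag_risky_calls(source: str):
    # Minimal "shift-left" check: walk the AST of a generated snippet
    # and report calls whose names often indicate injection or RCE risk.
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            # covers both bare names (eval) and attributes (os.system)
            name = getattr(func, "id", getattr(func, "attr", None))
            if name in RISKY_CALLS:
                findings.append((node.lineno, name))
    return findings

generated = "import os\nos.system('rm -rf ' + path)\nresult = eval(user_expr)\n"
print(flag_risky_calls(generated))  # flags the system and eval calls with line numbers
```

Checks like this run in milliseconds inside a CI pipeline or editor hook, which is exactly where a shift-left gate for AI-generated code belongs.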
Recent Security Vulnerabilities Attributable to Artificial Intelligence
1. DeepSeek Data Breach
2. AI-Assisted Phishing Attack on a Maine Municipality
3. Distribution of JarkaStealer Malware via PyPI
4. Snowflake Data Breach
5. Espionage Activities Targeting AI Experts Using SugarGh0st RAT
Security Risks of AI-Generated Code
Unsafe Code Generation
Model Vulnerabilities
Risks from Training Data
Common Vulnerabilities (Illustrated with CWE-Based Examples)
SQL Injection (CWE-89)
Cross-Site Scripting (CWE-79)
Improper Authentication (CWE-287)
Insecure Deserialization (CWE-502)
Exposure of Sensitive Information (CWE-200)
Methods for Preventing Vulnerabilities
Secure Coding Practices
Use of Static and Dynamic Analysis
Training Data Cleansing
Security Testing Tools for AI Code Generators
DeVAIC (Detecting Vulnerabilities in AI-Generated Code)
ACCA (AI Code Completeness Analyzer)
AICodeReview
Codexity
Semgrep and GitHub CodeQL
Recommendations