Large Language Models (LLMs) have burst onto the scene like the hottest new band, promising to revolutionise everything from customer service to content generation. Picture a shiny, futuristic metropolis where AI-powered assistants handle mundane tasks with a flick of their digital wrists, leaving human innovators free to dream up the next big thing. But looming on the horizon is the shadowy figure of cybersecurity risks, sneaking in like a mischievous gremlin intent on wreaking havoc in our high-tech utopia.
Starting with the disclaimer that WeakLink Security is not here to demonise the use of LLMs, we believe that LLMs are not only a competitive advantage but an inevitable reality for businesses in this brave new world. This is precisely why understanding the cybersecurity risks associated with LLMs is beyond essential. And, of course, there are many cybersecurity applications of LLMs, which might be (hint) a topic for discussion in a future blogpost.
For the moment, however, let’s turn our gaze towards the topic of cybersecurity risks related to LLMs and what we can do to safeguard against various threats that can compromise their confidentiality, integrity, availability and functionality. The significance of LLM security cannot be overstated. A single breach can lead to catastrophic consequences—valuable knowledge stolen, trust eroded, and the model’s reputation tarnished. Protecting the data that flows through these models with robust encryption, strict access controls, and regular audits helps prevent such breaches, ensuring the integrity and confidentiality of the information. Furthermore, maintaining strong security measures fosters trust and reliability in AI applications. After all, a trustworthy model is a usable model.
Potential Cybersecurity Risks
There are many ways in which we can approach the topic of potential cybersecurity risks. For our purposes here, we will use OWASP’s Top 10 LLM Security Risks [1] but structure them into the broader framework so we can create a more at-a-glance comprehensive overview for protecting LLMs. This way, we can visualise the most common cybersecurity risks through the lens of their impact area. For further information about the mitigation techniques, consider checking out the excellent LLM Applications Cybersecurity and Governance Checklist [2], again produced and distributed for free by OWASP.
Data Privacy and Confidentiality
Starting with data privacy and confidentiality and data security in general, within the context of LLMs, a significant threat is the risk of data exposure during training and usage. With regards to data exposure, the risks begin with the start of data collection and the preprocessing stages, where sensitive or personal data might be inadvertently included in the training dataset. LLMs may further unintentionally memorise and reproduce sensitive information included in the training data. If not properly managed, this can lead to the unintentional disclosure of sensitive information in responses generated by the model.
On the other hand, during model training, LLMs require substantial computational resources and involve transmitting data across networks. Without proper encryption and secure transmission protocols, this data can be intercepted by malicious actors, leading to unauthorised access.
Last but not least, data security comes with access control. During both training and deployment, inadequate access controls can result in unauthorised individuals gaining access to sensitive data. This includes internal threats from employees who may misuse their access privileges. Ensure that data remains confidential and inaccessible to unauthorised users.
Sensitive Information Disclosure occurs when LLMs inadvertently reveal confidential data, proprietary algorithms, or other critical details through their outputs. This risk can lead to significant privacy violations, unauthorised access to sensitive information, and compliance risks. For your organisation, this might mean facing legal repercussions, losing your customers’ trust, and suffering damage to your reputation.
What you can do about it revolves around implementing robust data sanitisation and encryption techniques. In terms of availability, consider encrypting data at rest and/or in transit, and do not forget your security updates and patches. Of course, access control plays a big role in preventing sensitive information disclosure. By restricting who can interact with the data and the model, you support the prevention of unauthorised modifications or access.
Insecure Output Handling is another one of OWASP’s Top 10 LLM Threats. This security risk refers to the inadequate validation, sanitisation, and management of outputs generated by LLMs before they are passed downstream to other systems. This vulnerability can lead to severe security issues such as cross-site scripting (XSS), cross-site request forgery (CSRF), server-side request forgery (SSRF), privilege escalation, and remote code execution [3].
Here, the best course of action is to implement strict input validation on responses from the model and encode outputs to prevent the execution of malicious code. Consider applying role-based access controls (RBAC) to limit access based on user roles and enforce the principle of least privilege (PoLP). We know this might be a hard one, especially for tech start-ups and small businesses, where each team member wears many hats. In this case, you might approach an outside team of experts to conduct a thorough security audit and give you helpful guidelines. We are very aware of the cost of conforming to a zero-trust approach where the model's outputs are treated as untrusted by default, which is not easily applicable to a wider context; however, if the resources and guidance are available for your use case, it could prove an option as well.
Supply Chain Vulnerabilities in LLMs can compromise the integrity of training data, models, and deployment platforms. These vulnerabilities often arise from third-party components such as pre-trained models and datasets that may be tampered with or poisoned. Attackers can exploit these weak points to introduce biased outcomes and security breaches or even cause complete system failures. Attack scenarios range from exploiting package registries to distribute malicious software to poisoning datasets that subtly favour certain entities [4].
What you can do here is to vet suppliers thoroughly, maintain an up-to-date inventory of components via a Software Bill of Materials (SBOM), and apply rigorous security checks on plugins and external models. Ensure training data is sourced from trusted and verified entities and employ advanced data preprocessing techniques to detect and mitigate potential biases. Anticipate data biases and ensure, as best as possible, the inclusion of diverse, representative data while protecting individual data points.
Model Security and Integrity
With the broader category of model security, we mean taking into account relevant risks to safeguard your model’s architectures and parameters from tampering (or, otherwise, integrity protection). This also includes defending against adversarial actions that seek to manipulate or compromise the model.
Model security is important as it also concerns the usability of your AI-driven solutions. A compromised model security might result in an overall degraded performance, where the accuracy of your model is greatly affected, thus reducing its effectiveness during use.
Integrity risks could also be considered a sort of “Pandora’s Box” as integrity-related vulnerabilities will, more often than not, open doors to further security breaches, as manipulated outputs might trigger vulnerabilities in connected systems.
Prompt Injection is one of OWASP’s Top 10 LLM Vulnerabilities we would like to include under the category of model security. Prompt injections involve manipulating the inputs to an LLM in a way that causes the model to execute unintended actions or reveal sensitive information. Attackers craft inputs that exploit vulnerabilities in the model's handling of prompts, potentially gaining access to backend systems or manipulating the model’s behaviour [5].
As with all injection attacks, consequences can be considered very severe, ranging from data exfiltration to unauthorised actions and hitting every spot in between. We know that data exfiltration can also fall into the category of data security, as it means that attackers may extract sensitive data from the model, including personal information, proprietary data, and other confidential content. We considered it as a model security issue, however, as it also opens the door to further threats, such as privilege escalation, or other cause the model to perform unauthorised actions, such as modifying or deleting data, executing harmful code, or generating inappropriate responses that could harm model’s integrity and the overall organisation’s reputation.
Implementing robust security measures can help prevent prompt injection attacks. Establishing strict access controls to limit who can interact with the model’s prompts, as well as other forms of privilege controls, is a first step. Of course, implementing comprehensive input validation and sanitisation processes to ensure that all inputs are checked for malicious content before being processed by the model is also paramount. If possible, apply whitelisting approaches to accept only known safe inputs and, otherwise said, segregate external content to restrict inputs from potentially untrusted sources.
We will not get tired of repeating it—conduct security audits to help your team catch and patch vulnerabilities. In this case, this might include reviewing input handling and prompt processing mechanisms.
Training Data Poisoning involves tampering with the data used during a model’s pre-training, fine-tuning, or embedding stages to introduce vulnerabilities. This manipulation can lead to compromised model security, performance degradation, and biased outputs [6]. Simply put, this means that a potential adversarial party might inject falsified or malicious information into the training datasets, which can then be reflected in the model’s responses.
A recommended mitigation action for this particular risk is to conduct rigorous verification of data sources and implement measures like sandboxing to control data ingestion. You might approach techniques such as differential privacy and adversarial robustness to detect and prevent poisoning attempts.
Model Theft is pretty straightforward concept. It involves the unauthorised access and extraction of machine learning models, including Large Language Models (LLMs). This can happen through various means, such as exploiting vulnerabilities in the system, insider threats, or sophisticated attacks that extract model parameters and architecture details [7]. The impact of model theft, however, could be devastating, especially in terms of economic losses.
Consider the significant investment in terms of time, computational resources, and specialised expertise required around an LLM. Unauthorised extraction of a model can lead to substantial financial losses as competitors or malicious actors can use the stolen models without incurring the same development costs. Furthermore, proprietary models often provide a competitive edge and often incorporate proprietary algorithms and optimisations that are considered intellectual property. Theft of these models constitutes a serious breach of intellectual property rights and a loss of competitive advantage and trust.
Last but not least, as LLMs can potentially be reverse-engineered to find and exploit weaknesses, model theft poses further security risks, especially when considering that they can generate malicious outputs or misinformation if used unethically.
Implementing robust access control mechanisms to ensure that only authorised personnel have access to the models is one of the most common recommendations to safeguard against model theft. This includes using multi-factor authentication (MFA), role-based access control (RBAC), and ensuring that access is granted on a need-to-know basis. Of course, using strong encryption protocols is another must to protect data integrity, but also techniques such as model watermarking, where unique identifiers are embedded into the model to trace ownership and detect unauthorised copies, could be applied (data masking and obfuscation).
Another common recommendation is continuous monitoring and maintaining detailed audit logs of all access and modifications to the models. These logs should also be reviewed regularly to detect any unauthorised activities from the inside. Consider implementing policies and technologies to detect and prevent insider threats.
Legal protections such as patents and trade secrets to legally safeguard models could be an appropriate step for those of you using proprietary LLMs, along with establishing clear terms of use and non-disclosure agreements (NDAs) with employees and partners.
Infrastructure Security and Availability
In this category, we broadly include the risks to the availability of your models by approaching them through the lens of the underlying infrastructure. Before we even start discussing the threats related to this category, we want to specify that, as with other assets that require availability security, some general rules apply.
Such is the case with monitoring and anomaly detection, where the network traffic and model usage patterns should be constantly scrutinised. Another common approach recommended for many emerging technologies is dynamic resource monitoring and management.
Beyond the pure availability of the model’s functionality, cybersecurity threats related to the infrastructure security of your models will expose you to the risk of broader system compromise; thus, addressing these risks is of importance to the confidentiality and integrity of the model as well.
Model Denial of Service (DoS) attacks occur when an attacker overwhelms the LLM with resource-intensive queries, consuming excessive computational resources and rendering the model unresponsive or significantly degraded in performance. The impact of this attack, as in the cases of DoS attacks in general, is that it results in service degradation, meaning that legitimate users experience slow responses or service unavailability, which can disrupt operations and erode user experience [8].
Critical applications relying on the LLM may face downtime, impacting business continuity and productivity during model DoS attacks. Furthermore, excessive computational load can strain the underlying hardware and network infrastructure, potentially causing failures or requiring costly upgrades.
The impact here is obvious and also very much measurable financially. Deploying comprehensive input validation mechanisms to filter out resource-intensive or malformed queries before they reach the LLM is a key approach to helping identify and block malicious requests.
Enforcing API rate limits to control the number of requests a user or IP address can make within a given timeframe might also be a viable strategy, depending on your operation, to prevent any single user from overwhelming the system with excessive requests.
Implementing an appropriate resource allocation strategy was already discussed at the beginning of this chapter. Still, it is of paramount importance for this scenario as well, as it might be the only approach to ensure that critical services remain operational even under attack.
Use anomaly detection systems to flag and respond to potential threats in real time or at least be able to analyse your traffic.
Insecure Plugin Design: Maintaining a secure environment for service availability supports data integrity and confidentiality throughout the LLM's lifecycle. One key issue related to providing a secure environment is plugins. Insecure Plugin Design refers to vulnerabilities in the interactions between the LLM and external plugins, which can be exploited to execute unauthorised code or exfiltrate sensitive data.
If left unaddressed, it could expose your model to potential malicious actors who can exploit vulnerabilities to execute arbitrary code on the host system, potentially taking control of the infrastructure [9]. This means that your sensitive data might be left vulnerable to access and extraction by unauthorised entities, leading to data breaches and privacy violations.
An approach to test here would be sandboxing. Run plugins in isolated environments (sandboxes) to contain any potential security breaches. This limits the impact of a compromised plugin on the overall system. Also, consider implementing continuous monitoring of plugin activities to detect and respond to suspicious behaviour. This includes logging all interactions and analysing logs for signs of potential exploitation.
Ethical Risk Considerations for Security
In recent years, we have witnessed the adverse effects of inadequately managed large language models (LLMs), ranging from biased outputs to security vulnerabilities. To mitigate these issues, it is crucial to take proactive measures to ensure transparency and accountability in all processes. For guidelines, especially in the European context, you might consider checking out the Assessment List for Trustworthy Artificial Intelligence by the EU’s High-Level Expert Group on Artificial Intelligence (AI HLEG). [10]
Excessive Agency: Overly autonomous LLM systems can act independently beyond their intended scope, making decisions or taking actions without appropriate human oversight. The consequences of this can range from unintended actions that could compromise security to propagating harmful content or violating ethical standards.
To prevent excessive autonomy, it is crucial to limit the functionalities of LLMs and incorporate human oversight through functionality constraints, human-in-the-loop systems and periodic reviews of models’ capabilities.
Transparency and Accountability Risks: Remember the importance of maintaining clear records of data sources and training processes. This documentation should detail where the data comes from, how it is processed, and any transformations applied during training.
Compliance Risks: Adherence to regulatory standards and facilitation of audits are key components of accountability. Compliance involves staying up-to-date with current regulations and industry standards, implementing necessary changes to meet these requirements, and being prepared for regular audits to verify adherence.
Overreliance: Overreliance on LLM outputs occurs when users trust the model’s responses without proper verification, potentially leading to the spread of misinformation or other legal issues. This unchecked trust can result in significant consequences, such as the dissemination of false information, making business decisions based on inaccurate data, and legal repercussions from relying on erroneous outputs.
To mitigate the risk of overreliance, it is essential to regularly monitor and cross-check outputs and communicate the associated risks to users.
In Conclusion
As LLMs continue to evolve, so too will the threats they face. It is inevitable, but the simplest thing you can do is maintain continuous vigilance. Adapting to emerging risks and ensuring the responsible and ethical use of LLMs is not an easy task. It begins by staying informed about the latest developments in AI security and regularly updating security measures so you can protect your assets and maintain the trust of your customers and stakeholders.
We structure common cybersecurity risks of LLMs into four broad categories:
- Data Privacy and Confidentiality. We stressed the importance of robust encryption, access controls, and regular audits to protect sensitive data from exposure and leaks.
- Model Security and Integrity. In this category, we highlighted the need for integrity protection and adversarial defences to safeguard models from tampering and poisoning.
- Infrastructure Security and Availability. For this category, we stressed the significance of protecting hardware and networks, implementing input validation, and managing resource allocation to prevent DoS attacks, for instance.
- Ethical Risk Considerations for Security. This category has many risks, many of which stem from cybersecurity, technical robustness, and transparency issues. Here, we stress the need to advocate for applying human-in-the-loop principles, transparency in data sourcing, and adherence to regulatory standards to prevent harmful outputs and ensure accountability.
Being proactive in securing your LLM applications is the key takeaway here. This involves implementing AI safety frameworks and establishing trust boundaries in the first place. Some milestones are:
- Use AI safety frameworks and implement structured guidelines.
- Establish trust boundaries and segregate user inputs and system prompts.
- Logging and monitoring to ensure transparency and accountability.
- Consider error handling and validation checks and clear error messages.
- Ensure secure interaction with plugins.
- Apply query and access controls, and more specifically, robust authentication and rate limiting.
- Secure APIs by protecting data exchange and implementing strong authentication.
In conclusion, while LLMs offer immense potential, their deployment must be accompanied by a commitment to robust cybersecurity practices. By understanding and mitigating the associated risks, we can harness the benefits of LLMs while safeguarding our data, systems, and society from the shadows of cyber threats.
References
- https://genai.owasp.org/llm-top-10/
- https://genai.owasp.org/resource/llm-applications-cybersecurity-and-governance-checklist-english/
- https://embracethered.com/blog/posts/2023/chatgpt-cross-plugin-request-forgery-and-prompt-injection./
- https://aivillage.org/large language models/threat-modeling-llm/
- [2306.05499] Prompt Injection attack against LLM-integrated Applications (arxiv.org) - https://arxiv.org/abs/2306.05499
- https://atlas.mitre.org/studies/AML.CS0009/
- https://www.computer.org/csdl/proceedings-article/sp/2023/933600a432/1OXGUZDR5QI
- https://sourcegraph.com/blog/security-update-august-2023
- https://embracethered.com/blog/posts/2023/chatgpt-plugin-vulns-chat-with-code/
- https://digital-strategy.ec.europa.eu/en/library/assessment-list-trustworthy-artificial-intelligence-altai-self-assessment