The U.K. and U.S., along with international partners from 16 other countries, have released new guidelines for the development of safe artificial intelligence (AI) systems.
The U.S. Cybersecurity and Infrastructure Security Agency (CISA) said, “This approach prioritizes ownership of security outcomes for customers, embraces fundamental transparency and accountability, and establishes organizational structures where secure design is the highest priority.”
The goal is to raise the level of cybersecurity in AI and help ensure that the technology is designed, developed, and deployed safely, the U.K.'s National Cyber Security Centre (NCSC) said.
The guidelines also build upon the U.S. government’s ongoing efforts to manage the risks posed by AI by ensuring that new tools are adequately tested before public release, that guardrails are in place to address privacy concerns and societal harms such as bias and discrimination, and that robust methods exist for consumers to identify AI-generated material.
The commitments also require companies to facilitate third-party discovery and reporting of vulnerabilities in their AI systems through bug bounty programs so that the flaws can be found and fixed faster.
The latest guidelines “help developers ensure that cyber security is both an essential prerequisite of AI system safety and integral to the development process from the outset and throughout, known as a ‘secure by design’ approach,” NCSC said.
This encompasses secure design, secure development, secure deployment, and secure operation and maintenance, covering all significant areas within the AI system development life cycle. It requires organizations to model the threats to their systems and to safeguard their supply chains and infrastructure.
The agencies noted that the aim is also to combat adversarial attacks on AI and machine learning (ML) systems. Such attacks seek to cause unintended behavior in various ways, including affecting a model’s classification, allowing users to perform unauthorized actions, and extracting sensitive information.
“There are many ways to achieve these effects, such as prompt injection attacks in the large language model (LLM) domain, or deliberately corrupting the training data or user feedback (known as ‘data poisoning’),” NCSC noted.