Building an AI-powered defense-in-depth security architecture for serverless microservices

16 February 2026 at 21:10

Enterprise customers face an unprecedented security landscape where sophisticated cyber threats use artificial intelligence to identify vulnerabilities, automate attacks, and evade detection at machine speed. Traditional perimeter-based security models are insufficient when adversaries can analyze millions of attack vectors in seconds and exploit zero-day vulnerabilities before patches are available.

The distributed nature of serverless architectures compounds this challenge—while microservices offer agility and scalability, they significantly expand the attack surface: each API endpoint, function invocation, and data store becomes a potential entry point, and a single misconfigured component can provide attackers the foothold needed for lateral movement. Organizations must simultaneously navigate complex regulatory environments where compliance frameworks like GDPR, HIPAA, PCI-DSS, and SOC 2 demand robust security controls and comprehensive audit trails, while the velocity of software development creates tension between security and innovation. Meeting both demands requires architectures that are comprehensive and automated, enabling secure deployment without sacrificing speed.

The challenge is multifaceted:

  • Expanded attack surface: Multiple entry points across distributed services requiring protection against distributed denial of service (DDoS) attacks, injection vulnerabilities, and unauthorized access
  • Identity and access complexity: Managing authentication and authorization across numerous microservices and service-to-service communications
  • Data protection requirements: Encrypting sensitive data in transit and at rest while securely storing and rotating credentials without compromising performance
  • Compliance and data protection: Meeting regulatory requirements through comprehensive audit trails and continuous monitoring in distributed environments
  • Network isolation challenges: Implementing controlled communication paths without exposing resources to the public internet
  • AI-powered threats: Defending against attackers who use AI to automate reconnaissance, adapt attacks in real-time, and identify vulnerabilities at machine speed

The solution lies in defense-in-depth—a layered security approach where multiple independent controls work together to protect your application.

This article demonstrates how to implement a comprehensive AI-powered defense-in-depth security architecture for serverless microservices on Amazon Web Services (AWS). By layering security controls at each tier of your application, this architecture creates a resilient system with no single point of failure: if one layer is compromised, additional controls help limit the impact and contain the incident. The architecture also incorporates AI and machine learning services throughout, helping organizations respond to AI-powered threats with AI-powered defenses.

Architecture overview: A journey through security layers

Let’s trace a user request from the public internet through our secured serverless architecture, examining each security layer and the AWS services that protect it. This implementation deploys security controls at seven distinct layers, with continuous monitoring and AI-powered threat detection throughout. Each layer provides specific capabilities that work together to create a comprehensive defense-in-depth strategy:

  • Layer 1 blocks malicious traffic before it reaches your application
  • Layer 2 verifies user identity and enforces access policies
  • Layer 3 encrypts communications and manages API access
  • Layer 4 isolates resources in private networks
  • Layer 5 secures compute execution environments
  • Layer 6 protects credentials and sensitive configuration
  • Layer 7 encrypts data at rest and controls data access
  • Continuous monitoring detects threats across layers using AI-powered analysis


Figure 1: Architecture diagram


Layer 1: Edge protection

Before requests reach your application, they traverse the public internet where attackers launch volumetric DDoS attacks, SQL injection, cross-site scripting (XSS), and other web exploits. AWS observed and mitigated thousands of distributed denial of service (DDoS) attacks in 2024, with one exceeding 2.3 terabits per second.

  • DDoS protection: AWS Shield provides managed DDoS protection for applications running on AWS and is enabled for customers at no cost. AWS Shield Advanced offers enhanced detection, continuous access to the AWS DDoS Response Team (DRT), cost protection during attacks, and advanced diagnostics for enterprise applications.
  • Layer 7 protection: AWS WAF protects against Layer 7 attacks through managed rule groups from AWS and AWS Marketplace sellers that cover OWASP Top 10 vulnerabilities including SQL injection, XSS, and remote file inclusion. Rate-based rules automatically block IPs that exceed request thresholds, protecting against application-layer DDoS and brute force attacks. Geo-blocking capabilities restrict access based on geographic location, while Bot Control uses machine learning to identify and block malicious bots while allowing legitimate traffic.
  • AI for security: Amazon GuardDuty uses generative AI to enhance native security services, implementing AI capabilities to improve threat detection, investigation, and response through automated analysis.
  • AI-powered enhancement: Organizations can build autonomous AI security agents using Amazon Bedrock to analyze AWS WAF logs, reason through attack data, and automate incident response. These agents detect novel attack patterns that signature-based systems miss, generate natural language summaries of security incidents, automatically recommend AWS WAF rule updates based on emerging threats, correlate attack indicators across distributed services to identify coordinated campaigns, and trigger appropriate remediation actions based on threat context. This helps enable more proactive threat detection and response capabilities, reducing mean time to detection and response.
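The rate-based blocking described above can be expressed concretely. The following sketch builds a WAFv2 rate-based rule definition in the shape boto3's `wafv2` client accepts; the rule name, request limit, and metric name are illustrative values, not from the article, and attaching the rule to a web ACL (commented out) requires AWS credentials and an existing ACL.

```python
# Sketch: a rate-based AWS WAF rule that blocks IPs exceeding a request
# threshold, expressed as the rule dict boto3's wafv2 client accepts.
# Name, limit, and metric name are illustrative placeholders.

def build_rate_limit_rule(name: str, limit: int, priority: int) -> dict:
    """Return a WAFv2 rate-based rule blocking IPs over `limit`
    requests per 5-minute window."""
    return {
        "Name": name,
        "Priority": priority,
        "Statement": {
            "RateBasedStatement": {
                "Limit": limit,          # requests per 5 minutes per source IP
                "AggregateKeyType": "IP",
            }
        },
        "Action": {"Block": {}},
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": f"{name}Metric",
        },
    }

# Attaching it would look roughly like (requires credentials and an ACL):
# import boto3
# wafv2 = boto3.client("wafv2")
# wafv2.update_web_acl(..., Rules=[build_rate_limit_rule("ApiRateLimit", 2000, 1)], ...)
```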

Layer 2: Verifying identity

After requests pass edge protection, you must verify user identity and determine resource access. Traditional username/password authentication is vulnerable to credential stuffing, phishing, and brute force attacks, requiring robust identity management that supports multiple authentication methods and adaptive security responding to risk signals in real time.

Amazon Cognito provides comprehensive identity and access management for web and mobile applications through two components:

  • User pools offer a fully managed user directory handling registration, sign-in, multi-factor authentication (MFA), password policies, social identity provider integration, SAML and OpenID Connect federation for enterprise identity providers, and advanced security features including adaptive authentication and compromised credential detection.
  • Identity pools grant temporary, limited-privilege AWS credentials to users for secure direct access to AWS services without exposing long-term credentials.

Amazon Cognito adaptive authentication uses machine learning to detect suspicious sign-in attempts by analyzing device fingerprinting, IP address reputation, geographic location anomalies, and sign-in velocity patterns, then allows sign-in, requires additional MFA verification, or blocks attempts based on risk assessment. Compromised credential detection automatically checks credentials against databases of compromised passwords and blocks sign-ins using known compromised credentials. MFA supports both SMS-based and time-based one-time password (TOTP) methods, significantly reducing account takeover risk.

For advanced behavioral analysis, organizations can use Amazon Bedrock to analyze patterns across extended timeframes, detecting account takeover attempts through geographic anomalies, device fingerprint changes, access pattern deviations, and time-of-day anomalies.
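Amazon Cognito's risk model is proprietary, but the kind of signal combination it performs can be illustrated. The heuristic below is purely an assumption-labeled sketch of how device, reputation, geography, and velocity signals might map to the allow/MFA/block outcomes described above; the weights and thresholds are invented for illustration.

```python
# Illustrative heuristic only -- Amazon Cognito's adaptive authentication
# model is proprietary. This shows how sign-in risk signals might combine
# into the allow / additional-MFA / block decision described in the text.
# All weights and thresholds are invented for this sketch.

def assess_signin_risk(known_device: bool, ip_reputation_bad: bool,
                       new_country: bool, signins_last_hour: int) -> str:
    """Combine sign-in signals into an action: 'allow', 'mfa', or 'block'."""
    score = 0
    if not known_device:
        score += 2          # unfamiliar device fingerprint
    if ip_reputation_bad:
        score += 3          # source IP has poor reputation
    if new_country:
        score += 2          # geographic anomaly
    if signins_last_hour > 10:
        score += 3          # velocity anomaly (possible credential stuffing)
    if score >= 5:
        return "block"
    if score >= 2:
        return "mfa"
    return "allow"
```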

Layer 3: The application front door

An API gateway serves as your application’s entry point. It must handle request routing, throttling, API key management, and encryption; it also needs to integrate seamlessly with your authentication layer and provide detailed logging for security auditing, all while maintaining high performance and low latency.

  • Amazon API Gateway is a fully managed service for creating, publishing, and securing APIs at scale, providing critical security capabilities including SSL/TLS encryption with AWS Certificate Manager (ACM) to automatically handle certificate provisioning, renewal, and deployment. Request throttling and quota management protects backend services through configurable burst and rate limits with usage quotas per API key or client to prevent abuse, while API key management controls access from partner systems and third-party integrations. Request/response validation uses JSON Schema to validate data before reaching AWS Lambda functions, preventing malformed requests from consuming compute resources while seamless integration with Amazon Cognito validates JSON Web Tokens (JWTs) and enforces authentication requirements before requests reach application logic.
  • GuardDuty provides AI-powered intelligent threat detection by analyzing API invocation patterns and identifying suspicious activity including credential exfiltration using machine learning. For advanced analysis, Amazon Bedrock analyzes API Gateway metrics and Amazon CloudWatch logs to identify unusual HTTP 4XX error spikes (for example, 403 Forbidden) that might indicate scanning or probing attempts, geographic distribution anomalies, endpoint access pattern deviations, time-series anomalies in request volume, or suspicious user agent patterns.
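The 4XX spike detection mentioned above can be sketched as a simple statistical check over metric data pulled from CloudWatch (for example, API Gateway's 4XX error count per period). The z-score threshold here is illustrative; a production setup would tune it or rely on CloudWatch anomaly detection instead.

```python
from statistics import mean, stdev

# Sketch: flag an unusual spike in 4XX error counts (e.g. API Gateway's
# 4XX metric fetched from CloudWatch). The 3-sigma threshold is an
# illustrative choice, not a recommendation from the article.

def is_4xx_spike(history: list[int], current: int, z_threshold: float = 3.0) -> bool:
    """Return True if `current` is more than `z_threshold` standard
    deviations above the historical mean of 4XX counts per period."""
    if len(history) < 2:
        return False
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return current > mu
    return (current - mu) / sigma > z_threshold
```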

Layer 4: Network isolation

Application logic and data must be isolated from direct internet access. Network segmentation is designed to limit lateral movement if a security incident occurs, helping to prevent compromised components from easily accessing sensitive resources.

  • Amazon Virtual Private Cloud (Amazon VPC) provides isolated network environments implementing a multi-tier architecture with public subnets for NAT gateways and application load balancers with internet gateway routes, private subnets for Lambda functions and application components accessing the internet through NAT Gateways for outbound connections, and data subnets with the most restrictive access controls. Lambda functions run in private subnets to prevent direct internet access, VPC flow logs capture network traffic for security analysis, security groups provide stateful firewalls following least privilege principles, Network ACLs add stateless subnet-level firewalls with explicit deny rules, and VPC endpoints enable private connectivity to Amazon DynamoDB, AWS Secrets Manager, and Amazon S3 without traffic leaving the AWS network.
  • GuardDuty provides AI-powered network threat detection by continuously monitoring VPC Flow Logs, CloudTrail logs, and DNS logs using machine learning to identify unusual network patterns, unauthorized access attempts, compromised instances, and reconnaissance activity, now including generative AI capabilities for automated analysis and natural language security queries.

Layer 5: Compute security

Lambda functions executing your application code and often requiring access to sensitive resources and credentials must be protected against code injection, unauthorized invocations, and privilege escalation. Additionally, functions must be monitored for unusual behavior that might indicate compromise.

Lambda provides built-in security features including:

  • AWS Identity and Access Management (IAM) execution roles that define precise resource and action access following least privilege principles
  • Resource-based policies that control which services and accounts can invoke functions to prevent unauthorized invocations
  • Environment variable encryption using AWS Key Management Service (AWS KMS) for variables at rest (sensitive data should instead use Secrets Manager)
  • Function isolation designed so that each execution runs in isolated environments, preventing cross-invocation data access
  • VPC integration enabling functions to benefit from network isolation and security group controls
  • Runtime security with automatically patched and updated managed runtimes
  • Code signing with AWS Signer digitally signing deployment packages for code integrity and cryptographic verification against unauthorized modifications
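The least-privilege execution role mentioned in the list above can be made concrete. This sketch assembles an IAM policy document scoped to a single DynamoDB table and a single log group; the ARN parameters are placeholders you would supply, and the action list assumes a function that only reads, writes, and queries one table.

```python
import json

# Sketch: a least-privilege Lambda execution role policy granting only
# specific DynamoDB actions on one table plus log writing to one log
# group. Table and log group ARNs are caller-supplied placeholders.

def least_privilege_policy(table_arn: str, log_group_arn: str) -> str:
    """Return an IAM policy document (JSON string) scoped to one table
    and one log group, instead of wildcard actions and resources."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [
            {   # only the DynamoDB actions the function actually needs
                "Effect": "Allow",
                "Action": ["dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:Query"],
                "Resource": table_arn,
            },
            {   # write logs to a single log group, not logs:* on *
                "Effect": "Allow",
                "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
                "Resource": f"{log_group_arn}:*",
            },
        ],
    })
```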

AI-powered code security: Amazon CodeGuru Security combines machine learning and automated reasoning to identify vulnerabilities including OWASP Top 10 and CWE Top 25 issues, log injection, secrets, and insecure AWS API usage. Using deep semantic analysis trained on millions of lines of Amazon code, it employs rule mining and supervised ML models combining logistic regression and neural networks for high true-positive rates.

Vulnerability management: Amazon Inspector provides automated vulnerability management, continuously scanning Lambda functions for software vulnerabilities and network exposure, using machine learning to prioritize findings and provide detailed remediation guidance.

Layer 6: Protecting credentials

Applications require access to sensitive credentials including database passwords, API keys, and encryption keys. Hardcoding secrets in code or storing them in environment variables creates security vulnerabilities, requiring secure storage, regular rotation, authorized-only access, and comprehensive auditing for compliance.

  • Secrets Manager protects access to applications, services, and IT resources without managing hardware security modules (HSMs). It provides centralized secret storage for database credentials, API keys, and OAuth tokens in an encrypted repository using AWS KMS encryption at rest.
  • Automatic secret rotation configures rotation for database credentials, automatically updating both the secret store and target database without application downtime.
  • Fine-grained access control uses IAM policies to control which users and services access specific secrets, implementing least-privilege access.
  • Audit trails log secret access in AWS CloudTrail for compliance and security investigations. VPC endpoint support is designed so that secret retrieval traffic doesn’t leave the AWS network.
  • Lambda integration enables functions to retrieve secrets programmatically at runtime, designed so that secrets aren’t stored in code or configuration files and can be rotated without redeployment.
  • GuardDuty provides AI-powered monitoring, detecting anomalous behavior patterns that could indicate credential compromise or unauthorized access.
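The runtime retrieval pattern from the Lambda integration bullet can be sketched as follows. The client is injected so the caching logic is testable; in a real function you would pass `boto3.client("secretsmanager")`, whose `get_secret_value` call returns a `SecretString` field. The secret name and TTL are illustrative.

```python
import json
import time

# Sketch: retrieve a secret at runtime with a short in-memory cache so
# warm Lambda invocations don't call Secrets Manager on every request.
# The client is injected for testability; pass
# boto3.client("secretsmanager") in a real function. TTL is illustrative.

_cache: dict[str, tuple[float, dict]] = {}

def get_secret(client, secret_id: str, ttl: float = 300.0) -> dict:
    """Return the secret as a dict, caching it for `ttl` seconds."""
    now = time.monotonic()
    hit = _cache.get(secret_id)
    if hit and now - hit[0] < ttl:
        return hit[1]                       # still fresh: skip the API call
    resp = client.get_secret_value(SecretId=secret_id)
    value = json.loads(resp["SecretString"])
    _cache[secret_id] = (now, value)
    return value
```

Because the secret is fetched at runtime rather than baked into code or environment variables, rotation in Secrets Manager takes effect without redeploying the function once the cache entry expires.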

Layer 7: Data protection

The data layer stores sensitive business information and customer data requiring protection both at rest and in transit. Data must be encrypted, access tightly controlled, and operations audited, while maintaining resilience against availability attacks and high performance.

Amazon DynamoDB is a fully managed NoSQL database providing built-in security features including:

  • Encryption at rest (using AWS-owned, AWS managed, or customer managed KMS keys)
  • Encryption in transit (TLS 1.2 or higher)
  • Fine-grained access control through IAM policies with item-level and attribute-level permissions
  • VPC endpoints for private connectivity
  • Point-in-Time Recovery for continuous backups
  • Streams for audit trails
  • Backup and disaster recovery capabilities
  • Global Tables for multi-AWS Region, multi-active replication designed to provide high availability and low-latency global access

GuardDuty and Amazon Bedrock provide AI-powered data protection:

  • GuardDuty monitors DynamoDB API activity through CloudTrail logs using machine learning to detect anomalous data access patterns including unusual query volumes, access from unexpected geographic locations, and data exfiltration attempts.
  • Amazon Bedrock analyzes DynamoDB Streams and CloudTrail logs to identify suspicious access patterns, correlate anomalies across multiple tables and time periods, generate natural language summaries of data access incidents for security teams, and recommend access control policy adjustments based on actual usage patterns versus configured permissions. This helps transform data protection from reactive monitoring to proactive threat hunting that can detect compromised credentials and insider threats.
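The "actual usage versus configured permissions" comparison described above reduces to a set difference once the granted actions (from IAM) and observed actions (from CloudTrail) are collected. This sketch shows only that comparison step; gathering the two sets from IAM and CloudTrail is out of scope, and the action names in the test are illustrative.

```python
# Sketch: compare the IAM actions a role is granted against the actions
# actually observed in CloudTrail, surfacing unused grants (candidates
# for removal) and attempted-but-denied actions (worth investigating).
# Collecting the two input sets from IAM/CloudTrail is not shown.

def audit_access(granted: set[str], observed: set[str]) -> dict:
    """Return unused grants and denied attempts for a role."""
    return {
        "unused": sorted(granted - observed),          # tighten the policy
        "denied_attempts": sorted(observed - granted), # investigate these
    }
```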

Continuous monitoring

Even with comprehensive security controls at every layer, continuous monitoring is essential to detect threats that bypass defenses. Security requires ongoing real-time visibility, intelligent threat detection, and rapid response capabilities rather than one-time implementation.

  • GuardDuty protects your AWS accounts, workloads, and data with intelligent threat detection.
  • CloudWatch provides comprehensive monitoring and observability, collecting metrics, monitoring log files, setting alarms, and automatically reacting to AWS resource changes.
  • CloudTrail provides governance, compliance, and operational auditing by logging all API calls in your AWS account, creating comprehensive audit trails for security analysis and compliance reporting.
  • AI-powered enhancement with Amazon Bedrock provides automated threat analysis; generating natural language summaries of GuardDuty findings and CloudWatch logs, pattern recognition identifying coordinated attacks across multiple security signals, incident response recommendations based on your architecture and compliance requirements, security posture assessment with improvement recommendations, and automated response through Lambda and Amazon EventBridge that isolates compromised resources, revokes suspicious credentials, or notifies security teams through Amazon SNS when threats are detected.
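The automated response flow above (EventBridge invoking Lambda on a GuardDuty finding) can be sketched as a severity-routed handler. GuardDuty findings delivered through EventBridge carry the finding under `detail`, including a numeric `severity`; the thresholds, and the stubbed `isolate`/`notify` callables, are illustrative assumptions rather than a prescribed runbook.

```python
# Sketch: a Lambda handler EventBridge invokes on GuardDuty findings,
# routing by severity. GuardDuty's EventBridge events carry the finding
# under "detail" with a numeric "severity". The thresholds and the
# stubbed isolate/notify callables are illustrative, not a runbook.

def handle_finding(event: dict, notify=print, isolate=lambda rid: None) -> str:
    """Return the action taken: 'isolate', 'notify', or 'log'."""
    detail = event.get("detail", {})
    severity = detail.get("severity", 0)
    finding_type = detail.get("type", "unknown")
    if severity >= 7.0:
        # high severity: contain first, then page the security team
        resource_id = (detail.get("resource", {})
                             .get("instanceDetails", {})
                             .get("instanceId", ""))
        isolate(resource_id)
        notify(f"ISOLATED {resource_id}: {finding_type}")
        return "isolate"
    if severity >= 4.0:
        notify(f"Investigate: {finding_type} (severity {severity})")
        return "notify"
    return "log"    # low severity: record only
```

In practice `isolate` might swap a quarantine security group onto the instance and `notify` might publish to an Amazon SNS topic.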

Conclusion

Securing serverless microservices presents significant challenges, but as demonstrated, using AWS services alongside AI-powered capabilities creates a resilient defense-in-depth architecture that protects against current and emerging threats. Security and agility, in other words, are not mutually exclusive.

Security is an ongoing process—continuously monitor your environment, regularly review security controls, stay informed about emerging threats and best practices, and treat security as a fundamental architectural principle rather than an afterthought.

Further reading

If you have feedback about this blog post, submit it in the Comments section below. If you have questions about using this solution, start a thread in the EventBridge, GuardDuty, or Security Hub forums, or contact AWS Support.

Roger Nem
Roger is an Enterprise Technical Account Manager (TAM) supporting Healthcare & Life Science customers at Amazon Web Services (AWS). As a Security Technical Field community specialist, he helps enterprise customers design secure cloud architectures aligned with industry best practices. Beyond his professional pursuits, Roger finds joy in quality time with family and friends, nurturing his passion for music, and exploring new destinations through travel.

Explore scaling options for AWS Directory Service for Microsoft Active Directory

30 January 2026 at 20:51

You can use AWS Directory Service for Microsoft Active Directory as your primary Active Directory Forest for hosting your users’ identities. Your IT teams can continue using existing skills and applications while your organization benefits from the enhanced security, reliability, and scalability of AWS managed services. You can also run AWS Managed Microsoft AD as a resource forest. In this configuration, AWS Managed Microsoft AD serves supported AWS services while users’ identities remain under exclusive control of your organization on a self-managed Active Directory. As your organization grows and scales, so will your AWS Managed Microsoft AD deployments.

In this post, you’ll learn how to use Amazon CloudWatch dashboards to monitor key performance metrics of your AWS Managed Microsoft AD deployment to track and analyze a directory’s performance over time. You can then use that information to determine when and how best to scale directory services for optimal performance.

Scaling your Active Directory

When you deploy AWS Managed Microsoft AD, the service initially creates two domain controller instances in two separate subnets of the same virtual private cloud (VPC). This architecture economically provides resiliency and high availability with a minimal set of resources. This initial configuration enables every feature that AWS Managed Microsoft AD offers. As your organization grows, its workflows will become larger and more complex, requiring that you scale your directories accordingly. AWS Managed Microsoft AD simplifies and secures the scaling process with minimal administrative effort. When it’s time to scale a directory, AWS Managed Microsoft AD offers two options: scale-up or scale-out.

Understanding scale-up and scale-out

Scale-up—also called upgrading your AWS Managed Microsoft AD—means changing the edition of an AWS Managed Microsoft AD from Standard to Enterprise. Enterprise Edition delivers larger domain controller instances, with higher compute capacity and larger storage for Active Directory objects. When a directory scales up, it retains the same number of domain controller instances that it previously had with larger quotas. Instances are replaced one at a time to minimize disruptions to production workflows.

A few features offered by the service are a better fit for the size and compute power of Enterprise Edition AWS Managed Microsoft AD and so are only available in Enterprise Edition. Consider scaling-up your directory if you encounter any of the following scenarios:

  • You plan to replicate your directory across multiple AWS Regions. Multi-Region replication is only available in Enterprise Edition.
  • The number of Active Directory objects in the directory will exceed the recommended threshold of 30,000 objects for Standard Edition. Enterprise Edition can accommodate up to 500,000 directory objects.
  • You plan to share your directory with more than 25 other AWS accounts. The default directory sharing quota is 25 accounts for Standard Edition and 500 for Enterprise Edition.

Important: Scaling up a directory from Standard to Enterprise is a one-way operation that cannot be reverted, and Enterprise Edition is billed at a higher hourly price.

Scale-out means deploying additional domain controllers for your AWS Managed Microsoft AD. You can scale out both Standard and Enterprise directories, and you can scale out different Regions independently; you don’t need to scale every Region to the same number of domain controller instances. When scale-out takes place, additional domain controller instances with the same compute resources and storage capacity as existing ones are launched in the same subnets.

Because some operations cannot be reverted, it’s important to understand the impact of each scaling operation. It’s preferable to scale out the number of domain controllers first, because you can revert that change if necessary. Consider scaling up first only if you need a feature that’s only available in Enterprise Edition.

Making an informed decision using CloudWatch

Since December 2021, AWS Managed Microsoft AD has provided directory metrics in Amazon CloudWatch to help optimize scaling decisions. CloudWatch metrics are time-ordered sets of data points about a system’s performance indicators that you can use to monitor and analyze performance over time. Metrics are stored as a time series, and each data point has an associated timestamp. By using CloudWatch, you can create alarms based on metrics and visualize and analyze metrics to derive new insights.

To understand the performance of a directory over time, define the key performance metrics based on your workload when you create the directory. Record the initial values of those metrics to create a performance baseline. Periodically revisit and compare data points for the same metrics to understand trends and use of resources over time. Based on the information provided by the performance baseline and periodic follow-ups, you can decide when to scale your directory and what scaling method to use. This process is depicted in Figure 1.

Figure 1: Decision-making process for scaling an Active Directory implementation

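The baseline-and-compare step of that process is simple enough to sketch directly: record a metric's baseline when the directory is created, then flag follow-up readings that have drifted beyond a tolerance worth a scaling review. The 25% tolerance is an illustrative default, not a recommendation from the article.

```python
# Sketch of the baseline-and-compare process: record a metric's baseline
# value, then flag when a follow-up reading has grown beyond a tolerance
# that suggests planning a scaling action. The 25% default is illustrative.

def needs_scaling_review(baseline: float, current: float,
                         tolerance: float = 0.25) -> bool:
    """Return True when `current` exceeds `baseline` by more than `tolerance`."""
    if baseline <= 0:
        return current > 0
    return (current - baseline) / baseline > tolerance
```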

Depending on the characteristics of your workload, you might face different resource constraints in your directory system. From an infrastructure perspective, the more commonly demanded resources are:

  • Network Interface: Current Bandwidth
  • Processor: % Processor Time
  • LogicalDisk: % Free Space

From an Active Directory perspective, consider metrics such as:

  • NTDS: LDAP Searches/sec
  • NTDS: ATQ Estimated Queue Delay

The following is an example decision matrix based on which resource is constrained:

  • % Processor Time: scale out
  • I/O Database Reads Average Latency: scale out
  • Committed Bytes in Use: scale out
  • % Free Space: scale up

For example, you can create a CloudWatch alarm that will trigger when Processor: % Processor Time is over 80% for more than 5 minutes. If this alarm triggers often, it could be a signal that domain controller instances are struggling to service the regular volume of user authentication requests. In such a scenario, you might consider scaling out with an additional domain controller to maintain the service’s SLA. Conversely, if LogicalDisk: % Free Space drops below 10% and trends downward, you might consider scaling up to Enterprise Edition, because it provides larger capacity for directory objects.
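That processor alarm might be created with boto3's `put_metric_alarm`. The sketch below only assembles the parameters; the directory ID and domain controller IP are placeholders, and the `AWS/DirectoryService` namespace, dimension names, and exact metric name are assumptions you should verify against the metrics visible in your own CloudWatch console.

```python
# Sketch: parameters for the % Processor Time alarm described above
# (over 80% for 5 minutes), shaped for boto3's put_metric_alarm.
# The namespace, dimension names, and metric name are assumptions --
# verify them against the directory metrics in your CloudWatch console.

def processor_alarm_params(directory_id: str, dc_ip: str) -> dict:
    return {
        "AlarmName": f"{directory_id}-processor-time",
        "Namespace": "AWS/DirectoryService",          # verify in your account
        "MetricName": "Processor % Processor Time",   # verify exact name
        "Dimensions": [
            {"Name": "Directory ID", "Value": directory_id},
            {"Name": "Domain Controller IP", "Value": dc_ip},
        ],
        "Statistic": "Average",
        "Period": 300,              # one 5-minute evaluation window
        "EvaluationPeriods": 1,
        "Threshold": 80.0,
        "ComparisonOperator": "GreaterThanThreshold",
    }

# cloudwatch = boto3.client("cloudwatch")
# cloudwatch.put_metric_alarm(**processor_alarm_params("d-1234567890", "10.0.1.5"))
```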

To facilitate tracking and analyzing performance of AWS Managed Microsoft AD over time, you can use Amazon CloudWatch to create a custom dashboard including relevant metrics.

Prerequisites

Before you get started, make sure that you have the following prerequisites in place:

Create a CloudWatch dashboard

With the prerequisites in place, you’re ready to create a CloudWatch dashboard to track directory service metrics. For more information, see Getting started with CloudWatch automatic dashboards.

To create a dashboard:

  1. Open the AWS Management Console for CloudWatch.
  2. In the navigation pane, choose Dashboards, and then choose Create dashboard.
  3. In the Create new dashboard dialog box, enter a name for the dashboard and then choose Create dashboard.
  4. When the Add widget window appears:
    1. Under Data sources types, select CloudWatch.
    2. Under Data type, select Metrics.
    3. Under Widget type, select Line.
    4. Choose Next.
  5. In the Add metric graph window, choose DirectoryService and then select Processor as the Metric category and % Processor Time under Metric name. Select each instance of the metric, represented as the Domain Controller IP, for one Directory ID.
  6. Choose Create widget.

    Note: If there are multiple directories in the same Region, all instances (domain controller IPs) will be available for selection. To help ensure effective monitoring and alarms, create a separate dashboard for each directory.

  7. Choose the plus sign (+) at the top of the window to add more widgets. Repeat steps 4–6 to add additional widgets for other relevant metrics. In this example the metric categories and names added are:
    • Processor: % Processor Time
    • LogicalDisk: % Free Space
    • Memory: Committed Bytes in Use
    • Database: I/O Database Reads Average Latency
    • Network Interface: Current Bandwidth
    • DNS: Recursive Queries/Sec
  8. After adding the desired metrics, choose Save.

Figure 2: CloudWatch dashboard showing directory services metrics

(Optional) Create an alarm in CloudWatch

Now that you have a dashboard where you can view metrics, consider setting up CloudWatch alarms to alert you when a metric reaches or goes beyond a specified threshold. For more information, see Create a CloudWatch alarm based on a static threshold and Adding an alarm to a CloudWatch dashboard.

The following are recommended thresholds to monitor when determining the need to scale an AWS Managed Microsoft AD. These are general recommendations based on standard use cases. You might have to adjust these thresholds to make the best scaling decisions for your organization.

  • Processor: % Processor Time: Monitor CPU utilization to understand computational demands on your domain controllers. Set CloudWatch alarms at 80% for a period of 5 minutes. Sustained high values indicate potential sizing issues that might require scaling out your directory.
  • LogicalDisk: % Free Space: Maintain at least 25% free space on volumes containing Active Directory data for optimal performance. Set CloudWatch alarms to trigger when free space drops below 20%. Low disk space can severely impact directory operations and require implementing cleanup procedures or scaling up the directory.
  • Network Interface: Current Bandwidth: Average network utilization should be kept below 50% of available bandwidth during peak operations for optimal directory responsiveness. Set CloudWatch alarms at 70% utilization to allow room for spikes in activity. Consistently high values suggest network constraints that might require scaling out your directory.
  • Memory: Committed Bytes in Use: Monitor memory commitment levels to help ensure that your domain controllers have sufficient memory resources for Active Directory operations. This metric tracks the amount of virtual memory that has been committed, indicating the total memory load on your domain controllers. Set CloudWatch alarms at 80% of the commit limit. Sustained high values can lead to excessive paging, significantly degrading directory performance and potentially causing authentication delays.
  • Database: I/O Database Reads Average Latency: Maintain average read latencies below 25 milliseconds. Set CloudWatch alarms at a threshold of 50 milliseconds. If read latencies are consistently elevated, consider scaling-out your directory.
  • DNS: Recursive Queries/sec: Given the tight integration of Active Directory with DNS, monitor this metric for stability and predictable patterns. Use CloudWatch anomaly detection rather than fixed thresholds to identify unexpected behaviors that could indicate DNS configuration issues or potential security concerns.
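The recommended static thresholds above can drive alarm creation from a single table, so all domain controllers get consistent alarms. This sketch mirrors the values in the list (the DNS metric is omitted because it uses anomaly detection rather than a fixed threshold); calling `put_metric_alarm` with each dict is left commented out, and the metric name strings are simplified labels rather than verified CloudWatch metric names.

```python
# Sketch: drive alarm creation from the recommended thresholds above.
# Values mirror the list in this section; the DNS metric is omitted
# because it uses anomaly detection, not a static threshold. Metric
# name strings are simplified labels -- verify exact names in CloudWatch.

RECOMMENDED_THRESHOLDS = {
    "Processor % Processor Time": (80.0, "GreaterThanThreshold"),
    "LogicalDisk % Free Space": (20.0, "LessThanThreshold"),
    "Network Interface Current Bandwidth utilization": (70.0, "GreaterThanThreshold"),
    "Memory % Committed Bytes in Use": (80.0, "GreaterThanThreshold"),
    "Database I/O Database Reads Average Latency": (50.0, "GreaterThanThreshold"),
}

def build_alarms(directory_id: str) -> list[dict]:
    """Assemble put_metric_alarm parameter dicts for one directory."""
    return [
        {
            "AlarmName": f"{directory_id}-{metric}",
            "MetricName": metric,
            "Threshold": threshold,
            "ComparisonOperator": operator,
            "Statistic": "Average",
            "Period": 300,
            "EvaluationPeriods": 1,
        }
        for metric, (threshold, operator) in RECOMMENDED_THRESHOLDS.items()
    ]

# cloudwatch = boto3.client("cloudwatch")
# for params in build_alarms("d-1234567890"):
#     cloudwatch.put_metric_alarm(**params)
```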

Post-scaling considerations

Different resources across your architecture might contain references to the IP addresses of your AWS Managed Microsoft AD domain controllers. After a scale-out operation deploys additional domain controller instances, update those references to maintain the full functionality of your workloads. References to the directory’s IP addresses can be found in (but might not be limited to) the following:

  • Firewall rules that allow traffic to and from the IP addresses of domain controller instances
  • Route 53 Resolver endpoint rules and DNS conditional forwarders that forward queries to the directory instances
  • CloudWatch dashboards that display metric data about the directory to include dimensions for the new IP addresses

Clean up resources

In this post, you created components that generate costs. Clean up these resources when no longer required to avoid additional charges.

  • Remove the added domain controllers’ IP addresses from firewall rules, Resolver endpoint rules, and DNS conditional forwarders.
  • Delete the custom CloudWatch dashboards you don’t plan to keep.
  • Scale back existing directories to the previous number of domain controller instances.

Conclusion

In this post, you learned how to monitor directory performance metrics using Amazon CloudWatch. By combining performance baselines, monitoring, and planning, you can make informed decisions about when and how to scale a directory safely and efficiently. By scaling directories in a timely manner, you can optimize efficiency and reduce the risk of outages by having a right-sized directory service to support your organization’s workloads.

Scale out your directory when your Active Directory-aware workflows have grown over time and the solution requires additional domain controller instances to maintain the service SLA. Scale up your directory when you require a feature that’s only available in Enterprise Edition AWS Managed Microsoft AD, such as multi-Region replication or additional storage to accommodate Active Directory objects. By using the flexible scaling capabilities and independent Regional expansion, you can optimize costs while maintaining appropriate service levels.

To learn more about AWS Managed Microsoft AD optimization and monitoring with Amazon CloudWatch, see:

Nahuel Benavidez
Nahuel is a Sr. CSE at AWS, specializing in AWS Directory Service, Microsoft Technologies, and SQL Server. He enjoys teaming with customers to discover exciting ways to explore AWS services. Nahuel loves to spoil his niece and goddaughters above all else. Also, Dungeons and Dragons (before it was popular), CrossFit, hiking, trekking, and sharing a pint with friends but "just one."

How to get started with security response automation on AWS

29 January 2026 at 20:44

December 2, 2019: Original publication date of this post.


At AWS, we encourage you to use automation. Not just to deploy your workloads and configure services, but to also help you quickly detect and respond to security events within your AWS environments. In addition to increasing the speed of detection and response, automation also helps you scale your security operations as your workloads in AWS increase and scale as well. For these reasons, security automation is a key principle outlined in the Well-Architected Framework, the AWS Cloud Adoption Framework, and the AWS Security Incident Response Guide.

Security response automation is a broad topic that spans many areas. The goal of this blog post is to introduce you to core concepts and help you get started with implementing automated security response mechanisms within your AWS environments. This post includes common patterns that customers often use, implementation considerations, and an example solution. Additionally, we will share resources AWS has produced in the form of the Automated Security Response GitHub repo, which includes ready-to-deploy scripts for common scenarios.

What is security response automation?

Security response automation is a planned and programmed action taken to achieve a desired state for an application or resource based on a condition or event. When you implement security response automation, you should adopt an approach that draws from existing security frameworks. Frameworks are published materials consisting of standards, guidelines, and best practices that help organizations manage cybersecurity-related risk. Using frameworks helps you achieve consistency and scalability and enables you to focus more on the strategic aspects of your security program. You should work with compliance professionals within your organization to understand any specific compliance or security frameworks that are also relevant for your AWS environment.

Our example solution is based on the NIST Cybersecurity Framework (CSF), which is designed to help organizations assess and improve their ability to help prevent, detect, and respond to security events. According to the CSF, “cybersecurity incident response” supports your ability to contain the impact of potential cybersecurity events.

Although automation is not a CSF requirement, automating responses to events enables you to create repeatable, predictable approaches to monitoring and responding to threats. When we build automation around events that we know should not occur, it gives us an advantage over a malicious actor because the automation is able to respond within minutes or even seconds compared to an on-call support engineer.

The five main steps in the CSF are identify, protect, detect, respond, and recover. We’ve expanded the detect and respond steps to include automation and investigation activities.

Figure 1: The five steps in the CSF


The following definitions for each step in the diagram above are based on the CSF but have been adapted for our example in this blog post. Although we will focus on the detect, automate and respond steps, it’s important to understand the entire process flow.

  • Identify: Identify and understand the resources, applications, and data within your AWS environment.
  • Protect: Develop and implement appropriate controls and safeguards to facilitate the delivery of services.
  • Detect: Develop and implement appropriate activities to identify the occurrence of a cybersecurity event. This step includes the implementation of monitoring capabilities which will be discussed further in the next section.
  • Automate: Develop and implement planned, programmed actions that will achieve a desired state for an application or resource based on a condition or event.
  • Investigate: Perform a systematic examination of the security event to establish the root cause.
  • Respond: Develop and implement appropriate activities to take automated or manual actions regarding a detected security event.
  • Recover: Develop and implement appropriate activities to maintain plans for resilience and to restore capabilities or services that were impaired due to a security event.

Security response automation on AWS

AWS CloudTrail and AWS Config continuously log details regarding users and other identity principals, the resources they interacted with, and configuration changes they might have made in your AWS account. We are able to combine these logs with Amazon EventBridge, which gives us a single service to trigger automations based on events. You can use this information to automatically detect resource changes and to react to deviations from your desired state.

Figure 2: Automated remediation flow


As shown in the diagram above, an automated remediation flow on AWS has three stages:

  1. Monitor: Your automated monitoring tools collect information about resources and applications running in your AWS environment. For example, they might collect AWS CloudTrail information about activities performed in your AWS account, usage metrics from your Amazon EC2 instances, or flow log information about the traffic going to and from network interfaces in your Amazon Virtual Private Cloud (VPC).
  2. Detect: When a monitoring tool detects a predefined condition—such as a breached threshold, anomalous activity, or configuration deviation—it raises a flag within the system. A triggering condition might be an anomalous activity detected by Amazon GuardDuty, a resource out of compliance with an AWS Config rule, or a high rate of blocked requests on an Amazon VPC security group or AWS Web Application Firewall (AWS WAF) web access control list (web ACL).
  3. Respond: When a condition is flagged, an automated response is triggered that performs an action you’ve predefined—something intended to remediate or mitigate the flagged condition.

Examples of automated response actions include modifying a VPC security group, patching an Amazon EC2 instance, rotating credentials, or adding an entry to an IP set in AWS WAF that is part of a web ACL rule to block suspicious clients that triggered a monitoring threshold.

You can use the event-driven flow described above to achieve a variety of automated response patterns with varying degrees of complexity. Your response pattern could be as simple as invoking a single AWS Lambda function, or it could be a complex series of AWS Step Functions tasks with advanced logic. In this blog post, we’ll use two simple Lambda functions in our example solution.
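A single-function response can be as small as the following sketch. The event keys ("resource", "dry_run") and the notify-only/remediate split are illustrative assumptions, not the event shape used by the sample solution later in this post:

```python
def lambda_handler(event, context):
    """Minimal skeleton for a single-function automated response.

    A real handler would parse the triggering service's event format and
    call the relevant AWS API via boto3 to restore the desired state."""
    resource = event.get("resource", "unknown")
    # Respect a notify-only mode so the automation can run in observation
    # mode before you trust it to remediate.
    action = "notify-only" if event.get("dry_run") else "remediate"
    return {"resource": resource, "action": action}
```

Starting with a skeleton like this keeps the event-matching logic in EventBridge and the remediation logic in the function, which is the split the rest of this post follows.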

How to define your response automation

Now that we’ve introduced the concept of security response automation, you can start thinking about the security requirements within your environment that you’d like to enforce through automation. These design requirements might come from general best practices you’d like to follow, or they might be specific controls from compliance frameworks relevant to your business.

Many customers start with the runbooks they already use as part of their incident response lifecycle. Simple runbooks, like responding to an exfiltrated credential, can be quickly mapped to automation, especially if the runbook calls for disabling the credential and notifying on-call personnel. Automation can be resource driven as well: an event such as the creation of a new Amazon VPC might trigger your automation to immediately deploy your company’s standard configuration for VPC flow log collection.
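The exfiltrated-credential runbook can be expressed as a small, reviewable plan before wiring it to an event. This is a sketch: the function and topic ARN parameter are hypothetical, and executing the plan would require boto3 clients with matching IAM permissions.

```python
def exfiltrated_key_plan(user_name, access_key_id, topic_arn):
    """Map the runbook (disable the credential, notify on-call) to the
    concrete API calls an automation would make."""
    return [
        # Deactivate rather than delete, so investigators can still inspect it.
        ("iam.update_access_key",
         {"UserName": user_name, "AccessKeyId": access_key_id,
          "Status": "Inactive"}),
        ("sns.publish",
         {"TopicArn": topic_arn,
          "Message": f"Deactivated access key {access_key_id} "
                     f"for user {user_name}"}),
    ]
```

Reviewing the planned calls as data, before attaching them to an EventBridge rule, is an easy way to get security and operations teams to agree on what the automation will do.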

Your objectives should be quantitative, not qualitative. Here are some examples of quantitative objectives:

  • Remote administrative network access to servers should be limited.
  • Server storage volumes should be encrypted.
  • AWS console logins should be protected by multi-factor authentication.
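Each of these quantitative objectives maps to a detective control. As one illustration, the three objectives above correspond to AWS-managed Config rules; the following is a sketch of a CloudFormation fragment that enables them (verify the rule identifiers against the AWS Config managed rules reference before deploying):

```yaml
Resources:
  RestrictedSshRule:
    Type: AWS::Config::ConfigRule
    Properties:
      ConfigRuleName: restricted-ssh
      Source:
        Owner: AWS
        SourceIdentifier: INCOMING_SSH_DISABLED
  EncryptedVolumesRule:
    Type: AWS::Config::ConfigRule
    Properties:
      ConfigRuleName: encrypted-volumes
      Source:
        Owner: AWS
        SourceIdentifier: ENCRYPTED_VOLUMES
  ConsoleMfaRule:
    Type: AWS::Config::ConfigRule
    Properties:
      ConfigRuleName: mfa-enabled-for-iam-console-access
      Source:
        Owner: AWS
        SourceIdentifier: MFA_ENABLED_FOR_IAM_CONSOLE_ACCESS
```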

As an optional step, you can expand these objectives into user stories that define the conditions and remediation actions when there is an event. User stories are informal descriptions that briefly document a feature within a software system. User stories may be global and span across multiple applications or they may be specific to a single application.

For example:

“Remote administrative network access to servers should have limited access from internal trusted networks only. Remote access ports include SSH TCP port 22 and RDP TCP port 3389. If remote access ports are detected within the environment and they are accessible to outside resources, they should be automatically closed and the owner will be notified.”

Once you’ve completed your user story, you can determine how to use automated remediation to help achieve these objectives in your AWS environment. User stories should be stored in a location that provides versioning support and can reference the associated automation code.
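The detection half of this user story can be sketched as a pure function over the IpPermissions structure that ec2.describe_security_groups returns. This is detection only; the revoke call and the owner notification from the user story are left to the caller.

```python
ADMIN_PORTS = {22, 3389}  # SSH and RDP, per the user story


def open_admin_rules(ip_permissions):
    """Return the security group rules that expose a remote admin port
    to the internet (0.0.0.0/0)."""
    flagged = []
    for perm in ip_permissions:
        from_port, to_port = perm.get("FromPort"), perm.get("ToPort")
        if from_port is None:
            continue  # e.g. rules for all protocols carry no port range
        if not any(from_port <= p <= to_port for p in ADMIN_PORTS):
            continue
        for rng in perm.get("IpRanges", []):
            if rng.get("CidrIp") == "0.0.0.0/0":
                flagged.append(perm)
                break
    return flagged
```

Because the function takes plain data, you can unit test the policy decision without an AWS account, then feed it real describe_security_groups output inside a Lambda function.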

You should carefully consider the effect of your remediation mechanisms in order to help prevent unintended impact on your resources and applications. Remediation actions such as instance termination, credential revocation, and security group modification can adversely affect application availability. Depending on the level of risk that’s acceptable to your organization, your automated mechanism might only send a notification, which is then manually investigated prior to remediation. Once you’ve identified an automated remediation mechanism, you can build out the required components and test them in a non-production environment.

Sample response automation walkthrough

In the following section, we’ll walk you through an automated remediation for a simulated event that indicates potential unauthorized activity—the unintended disabling of CloudTrail logging. Outside parties might want to disable logging to avoid detection and the recording of their unauthorized activity. Our response is to re-enable the CloudTrail logging and immediately notify the security contact. Here’s the user story for this scenario:

“CloudTrail logging should be enabled for all AWS accounts and regions. If CloudTrail logging is disabled, it will automatically be enabled and the security operations team will be notified.”

A note about the sample response automation below as it references Amazon EventBridge: EventBridge was formerly known as Amazon CloudWatch Events. If you see other documentation referring to Amazon CloudWatch Events, you can now find that configuration via the Amazon EventBridge console page.

Additionally, we will be looking at this scenario through the lens of an account that has a stand-alone CloudTrail configuration. While this is an acceptable configuration, AWS recommends using AWS Organizations, which allows you to configure an organizational CloudTrail. Organizational trails cannot be modified by member accounts, so logging data cannot be removed or tampered with.

In order to use our sample remediation, you will need to enable Amazon GuardDuty and AWS Security Hub in the AWS Region you have selected. Both of these services include a 30-day trial at no additional cost. See the AWS Security Hub pricing page and the Amazon GuardDuty pricing page for additional details.

Important: You’ll use AWS CloudTrail to test the sample remediation. Running more than one CloudTrail trail in your AWS account will result in charges based on the number of events processed while the trail is running. Charges for additional copies of management events recorded in a Region are applied based on the published pricing plan. To minimize the charges, follow the clean-up steps that we provide later in this post to remove the sample automation and delete the trail.

Deploy the sample response automation

In this section, we’ll show you how to deploy and test the CloudTrail logging remediation sample. Amazon GuardDuty generates the finding Stealth:IAMUser/CloudTrailLoggingDisabled when CloudTrail logging is disabled, and AWS Security Hub collects findings from GuardDuty using the standardized finding format mentioned earlier. We recommend that you deploy this sample into a non-production AWS account.

Select the Launch Stack button below to deploy a CloudFormation template with an automation sample in the us-east-1 Region. You can also download the template and implement it in another Region. The template consists of an Amazon EventBridge rule, an AWS Lambda function, and the IAM permissions necessary for both components to execute. It takes several minutes for the CloudFormation stack build to complete.

Select the Launch Stack button to launch the template

  1. In the CloudFormation console, choose the Select Template form, and then select Next.
  2. On the Specify Details page, provide the email address for a security contact. For the purpose of this walkthrough, it should be an email address that you have access to. Then select Next.
  3. On the Options page, accept the defaults, then select Next.
  4. On the Review page, confirm the details, then select Create.
  5. While the stack is being created, check the inbox of the email address that you provided in step 2. Look for an email message with the subject AWS Notification – Subscription Confirmation. Select the link in the body of the email to confirm your subscription to the Amazon Simple Notification Service (Amazon SNS) topic. You should see a success message like the one shown in Figure 3:

    Figure 3: SNS subscription confirmation


  6. Return to the CloudFormation console. After the Status field for the CloudFormation stack changes to CREATE_COMPLETE (as shown in Figure 4), the solution is implemented and is ready for testing.

    Figure 4: CREATE_COMPLETE status


Test the sample automation

You’re now ready to test the automated response by creating a test trail in CloudTrail, then trying to stop it.

  1. From the AWS Management Console, choose Services > CloudTrail.
  2. Select Trails, then select Create Trail.
  3. On the Create Trail form:
    1. Enter a value for Trail name and for AWS KMS alias, as shown in Figure 5.
    2. For Storage location, create a new S3 bucket or choose an existing one. For our testing, we create a new S3 bucket.

      Figure 5: Create a CloudTrail trail


    3. On the next page, under Management events, select Write-only (to minimize event volume).

      Figure 6: Create a CloudTrail trail


  4. On the Trails page of the CloudTrail console, verify that the new trail has started. You should see the status as logging, as shown in Figure 7.

    Figure 7: Verify new trail has started


  5. You’re now ready to act like an unauthorized user trying to cover their tracks. Stop the logging for the trail that you just created:
    1. Select the new trail name to display its configuration page.
    2. In the top-right corner, choose the Stop logging button.
    3. When prompted with a warning dialog box, select Stop logging.
    4. Verify that the logging has stopped by confirming that the Start logging button now appears in the top right, as shown in Figure 8.

      Figure 8: Verify logging switch is off


    You have now simulated a security event by disabling logging for one of the trails in the CloudTrail service. Within the next few seconds, the near real-time automated response will detect the stopped trail, restart it, and send an email notification. You can refresh the Trails page of the CloudTrail console and verify that the Stop logging button appears again in the top right corner, indicating that logging has resumed.

    Within the next several minutes, the investigatory automated response will also begin. GuardDuty will detect the action that stopped the trail and enrich the data about the source of unexpected behavior. Security Hub will then ingest that information and optionally correlate with other security events.

    By following the steps below, you can monitor Security Hub for the generation of a finding of the type TTPs/Defense Evasion/Stealth:IAMUser-CloudTrailLoggingDisabled:

  6. In the AWS Management Console, choose Services > Security Hub.
    1. In the left pane, select Findings.
    2. Select the Add filters field, then select Type.
    3. Select EQUALS, paste TTPs/Defense Evasion/Stealth:IAMUser-CloudTrailLoggingDisabled into the field, then select Apply.
    4. Refresh your browser periodically until the finding is generated.

    Figure 9: Monitor Security Hub for your finding


  7. Select the title of the finding to review details. When you’re ready, you can choose to archive the finding by selecting the Archive link. Alternatively, you can select a custom action to continue with the response. Custom actions are one of the ways that you can integrate Security Hub with custom partner solutions.

Now that you’ve completed your review of the finding, let’s dig into the components of automation.

How the sample automation works

This example incorporates two automated responses: a near real-time workflow and an investigatory workflow. The near real-time workflow provides a rapid response to an individual event, in this case the stopping of a trail. The goal is to restore the trail to a functioning state and alert security responders as quickly as possible. The investigatory workflow still includes a response to provide defense in depth and uses services that support a more in-depth investigation of the incident.

Figure 10: Sample automation workflow


In the near real-time workflow, Amazon EventBridge monitors for the undesired activity.

When a trail is stopped, AWS CloudTrail publishes an event on the EventBridge bus. An EventBridge rule detects the trail-stopping event and invokes a Lambda function to respond to the event by restarting the trail and notifying the security contact via an Amazon Simple Notification Service (SNS) topic.
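As a sketch of what that rule can match, a pattern that targets the trail-stopping API call directly (rather than the Security Hub finding used by the sample template later in this post) would follow the standard "AWS API Call via CloudTrail" event shape:

```json
{
  "source": ["aws.cloudtrail"],
  "detail-type": ["AWS API Call via CloudTrail"],
  "detail": {
    "eventSource": ["cloudtrail.amazonaws.com"],
    "eventName": ["StopLogging"]
  }
}
```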

In the investigative workflow, CloudTrail logs are monitored for undesired activities. For example, if a trail is stopped, there will be a corresponding log record. GuardDuty detects this activity and retrieves additional data points regarding the source IP that executed the API call. Two common examples of those additional data points in GuardDuty findings include whether the API call came from an IP address on a threat list, or whether it came from a network not commonly used in your AWS account. An AWS Lambda function responds by restarting the trail and notifying the security contact. The finding is imported into AWS Security Hub, where it’s aggregated with other findings for analyst viewing. Using EventBridge, you can configure Security Hub to export the finding to partner security orchestration tools, SIEM (security information and event management) systems, and ticketing systems for investigation.

AWS Security Hub imports findings from AWS security services such as GuardDuty, Amazon Macie and Amazon Inspector, plus from third-party product integrations you’ve enabled. Findings are provided to Security Hub in AWS Security Finding Format (ASFF), which minimizes the need for data conversion. Security Hub correlates these findings to help you identify related security events and determine a root cause. Security Hub also publishes its findings to Amazon EventBridge to enable further processing by other AWS services such as AWS Lambda. You can also create custom actions using Security Hub. Custom actions are useful for security analysts working with the Security Hub console who want to send a specific finding, or a small set of findings, to a response or a remediation workflow.
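For orientation, a heavily trimmed ASFF finding looks like the following sketch; the field names follow the ASFF specification, while the values here are illustrative placeholders:

```json
{
  "SchemaVersion": "2018-10-08",
  "Id": "example-finding-id",
  "ProductArn": "arn:aws:securityhub:us-east-1::product/aws/guardduty",
  "GeneratorId": "example-generator-id",
  "AwsAccountId": "111122223333",
  "Types": ["TTPs/Defense Evasion/Stealth:IAMUser-CloudTrailLoggingDisabled"],
  "CreatedAt": "2019-12-02T00:00:00Z",
  "UpdatedAt": "2019-12-02T00:00:00Z",
  "Severity": {"Label": "MEDIUM"},
  "Title": "CloudTrail logging was disabled",
  "Description": "The StopLogging API was invoked on a trail.",
  "Resources": [{"Type": "AwsCloudTrailTrail", "Id": "example-trail-arn"}]
}
```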

Deeper look into how the “Respond” phase works

Amazon EventBridge and AWS Lambda work together to respond to a security finding.

Amazon EventBridge is a service that provides real-time access to changes in data in AWS services, your own applications, and Software-as-a-Service (SaaS) applications without writing code. In this example, EventBridge identifies a Security Hub finding that requires action and invokes a Lambda function that performs remediation. As shown in Figure 11, the Lambda function both notifies the security operator via SNS and restarts the stopped CloudTrail.

Figure 11: Sample “respond” workflow


To set this response up, we looked for an event to indicate that a trail had stopped or was disabled. We knew that the GuardDuty finding Stealth:IAMUser/CloudTrailLoggingDisabled is raised when CloudTrail logging is disabled. Therefore, we configured the default event bus to look for this event.

You can learn more regarding the available GuardDuty findings in the user guide.

How the code works

When Security Hub publishes a finding to EventBridge, it includes full details of the finding as discovered by GuardDuty. The finding is published in JSON format. If you review the details of the sample finding, note that it has several fields helping you identify the specific events that you’re looking for. Here are some of the relevant details:

{
   …
   "source": "aws.securityhub",
   …
   "detail": {
      "findings": [{
         …
         "Types": [
            "TTPs/Defense Evasion/Stealth:IAMUser-CloudTrailLoggingDisabled"
         ],
         …
      }]
   }
}

You can build an event pattern using these fields, which an EventBridge filtering rule can then use to identify events and to invoke the remediation Lambda function. Below is a snippet from the CloudFormation template we provided earlier that defines that event pattern for the EventBridge filtering rule:

# pattern matches the nested JSON format of a specific Security Hub finding
      EventPattern:
        source:
          - aws.securityhub
        detail-type:
          - "Security Hub Findings - Imported"
        detail:
          findings:
            Types:
              - "TTPs/Defense Evasion/Stealth:IAMUser-CloudTrailLoggingDisabled"

Once the rule is in place, EventBridge continuously monitors the event bus for events with this pattern.

When EventBridge finds a match, it invokes the remediating Lambda function and passes the full details of the event to the function. The Lambda function then parses the JSON fields in the event so that it can act as shown in this Python code snippet:

# extract trail ARN by parsing the incoming Security Hub finding (in JSON format)
trailARN = event['detail']['findings'][0]['ProductFields']['action/awsApiCallAction/affectedResources/AWS::CloudTrail::Trail']   

# description contains useful details to be sent to security operations
description = event['detail']['findings'][0]['Description']

The code also issues a notification to security operators so they can review the findings and insights in Security Hub and other services to better understand the incident and to decide whether further manual actions are warranted. Here’s the code snippet that uses SNS to send out a note to security operators:

#Sending the notification that the AWS CloudTrail has been disabled.
snspublish = snsclient.publish(
	TargetArn = snsARN,
	Message="Automatically restarting CloudTrail logging.  Event description: \"%s\" " %description
	)

While notifications to human operators are important, the Lambda function will not wait to take action. It immediately remediates the condition by restarting the stopped trail in CloudTrail. Here’s a code snippet that restarts the trail to reenable logging:

import boto3
from botocore.exceptions import ClientError

try:
	client = boto3.client('cloudtrail')
	enablelogging = client.start_logging(Name=trailARN)
	logger.debug("Response on enable CloudTrail logging - %s" % enablelogging)
except ClientError as e:
	logger.error("An error occurred: %s" % e)

After the trail has been restarted, API activity is once again logged and can be audited.

This can help provide relevant data for the remaining steps in the incident response process. The data is especially important for the post-incident phase, when your team analyzes lessons learned to help prevent future incidents. You can also use this phase to identify additional steps to automate in your incident response.

How to enable a custom action and build your own automated response

Unlike the notification automation you set up earlier, you may not want to fully automate responses to findings. To set up automation that you can manually trigger for specific findings, you can use custom actions. A custom action is a Security Hub mechanism for sending selected findings to EventBridge, where they can be matched by an EventBridge rule. The rule defines a specific action to take when a finding associated with the custom action ID is received. Custom actions can be used, for example, to send a specific finding, or a small set of findings, to a response or remediation workflow. You can create up to 50 custom actions.

In this section, we will walk you through how to create a custom action in Security Hub that triggers an EventBridge rule to execute a Lambda function for the same security finding related to disabled CloudTrail logging.

Create a Custom Action in Security Hub

  1. Open Security Hub. In the left navigation pane, under Management, open the Custom actions page.
  2. Choose Create custom action.
  3. Enter an Action Name, Action Description, and Action ID that are representative of an action that you are implementing—for example Enable CloudTrail Logging.
  4. Choose Create custom action.
  5. Copy the custom action ARN that was generated. You will need it in the next steps.

Create Amazon EventBridge Rule to capture the Custom Action

In this section, you will define an EventBridge rule that will match events (findings) coming from Security Hub which were forwarded by the custom action you defined above.

  1. Navigate to the Amazon EventBridge console.
  2. On the right side, choose Create rule.
  3. On the Define rule detail page, give your rule a name and description that represents the rule’s purpose (for example, the same name and description that you used for the custom action). Then choose Next.
  4. Security Hub findings are sent as events to the AWS default event bus. In the Define pattern section, you can identify filters to take a specific action when matched events appear. For the Build event pattern step, leave the Event source set to AWS events or EventBridge partner events.
  5. Scroll down to Event pattern. Under Event source, leave it set to AWS Services, and under AWS Service, select Security Hub.
  6. For the Event Type, choose Security Hub Findings – Custom Action.
  7. Then select Specific custom action ARN(s) and enter the ARN for the custom action that you created earlier.
  8. Notice that as you selected these options, the event pattern on the right was updating. Choose Next.
  9. On the Select target(s) step, from the Select a target dropdown, select Lambda function. Then, from the Function dropdown, select SecurityAutoremediation-CloudTrailStartLoggingLamb-xxxx. This Lambda function was created as part of the CloudFormation template.
  10. Choose Next.
  11. For the Configure tags step, choose Next.
  12. For the Review and create step, choose Create rule.

Trigger the automation

Because GuardDuty and Security Hub have been enabled, when AWS CloudTrail logging is disabled you should see a security finding generated by Amazon GuardDuty and collected in AWS Security Hub.

  1. Navigate to the Security Hub Findings page.
  2. In the top corner, from the Actions dropdown menu, select the Enable CloudTrail Logging custom action.
  3. Verify the CloudTrail configuration by accessing the AWS CloudTrail dashboard.
  4. Confirm that the trail status displays as Logging, which indicates the successful execution of the remediation Lambda function triggered by the EventBridge rule through the custom action.

How AWS helps customers get started

Many customers see the task of building automated remediation as daunting, and many operations teams might not have the skills or staffing to develop automation scripts. Because many incident response scenarios can be mapped to findings in AWS security services, AWS has built tools that respond to those findings and are quickly adaptable to your environment.

Automated Security Response (ASR) on AWS is a solution that enables AWS Security Hub customers to remediate findings with a single click using sets of predefined response and remediation actions called Playbooks. The remediations are implemented as AWS Systems Manager automation documents. The solution includes remediations for issues such as unused access keys, open security groups, weak account password policies, VPC flow logging configurations, and public S3 buckets. Remediations can also be configured to trigger automatically when findings appear in AWS Security Hub.

The solution includes the playbook remediations for some of the security controls defined as part of the following standards:

  • AWS Foundational Security Best Practices (FSBP) v1.0.0
  • Center for Internet Security (CIS) AWS Foundations Benchmark v1.2.0
  • Center for Internet Security (CIS) AWS Foundations Benchmark v1.4.0
  • Center for Internet Security (CIS) AWS Foundations Benchmark v3.0.0
  • Payment Card Industry (PCI) Data Security Standard (DSS) v3.2.1
  • National Institute of Standards and Technology (NIST) Special Publication 800-53 Revision 5

A Playbook called Security Control is included, which supports AWS Security Hub’s Consolidated Control Findings feature.

Figure 12: Architecture of the Automated Security Solution


Additionally, the library includes instructions in the Implementation Guide on how to create new automations in an existing Playbook.

You can use and deploy this library into your accounts at no additional cost; however, there are costs associated with the services that it consumes.

Clean up

After you’ve completed the sample security response automation, we recommend that you remove the resources created in this walkthrough example from your account in order to minimize the charges associated with the trail in CloudTrail and data stored in S3.

Important: Deleting resources in your account can negatively impact the applications running in your AWS account. Verify that applications and AWS account security do not depend on the resources you’re about to delete.

Here are the clean-up steps:

Summary

You’ve learned the basic concepts and considerations behind security response automation on AWS and how to use Amazon EventBridge, Amazon GuardDuty, and AWS Security Hub to automatically re-enable AWS CloudTrail when it becomes disabled unexpectedly. Additionally, you learned about the AWS Automated Security Response library and how it can help you rapidly get started with automations through Security Hub. As a next step, you may want to start building your own custom response automations and dive deeper into the AWS Security Incident Response Guide, the NIST Cybersecurity Framework (CSF), or the AWS Cloud Adoption Framework (CAF) Security Perspective. You can explore additional automatic remediation solutions in the AWS Solutions Library. You can find the code used in this example on GitHub.

If you have feedback about this blog post, submit it in the Comments section below. If you have questions about using this solution, start a thread in the EventBridge, GuardDuty, or Security Hub forums, or contact AWS Support.

Exploring common centralized and decentralized approaches to secrets management

23 January 2026 at 20:15

One of the most common questions about secrets management strategies on Amazon Web Services (AWS) is whether an organization should centralize its secrets. Though this question is often focused on whether secrets should be centrally stored, there are four aspects of centralizing the secrets management process that need to be considered: creation, storage, rotation, and monitoring. In this post, we discuss the advantages and tradeoffs of centralizing or decentralizing each of these aspects of secrets management.

Centralized creation of secrets

When deciding whether to centralize secrets creation, you should consider how you already deploy infrastructure in the cloud. Modern DevOps practices have driven some organizations toward developer portals and internal developer platforms that use golden paths for infrastructure deployment. With these tools, developers can deploy infrastructure in a self-service model through infrastructure as code (IaC) while adhering to organizational standards.

A central function, such as a platform engineering team, maintains these golden paths. Services that can be used to define and maintain golden paths include AWS Service Catalog and popular open source projects such as Backstage.io. Using this approach, developers can focus on application code while platform engineers focus on infrastructure deployment, security controls, and developer tooling. An example of a golden path might be a templatized implementation for a microservice that writes to a database.

For example, a golden path could define that a service or application must be built using the AWS Cloud Development Kit (AWS CDK), running on Amazon Elastic Container Service (Amazon ECS), and use AWS Secrets Manager to retrieve database credentials. The platform team could also build checks to help ensure that the secret’s resource policy only allows access to the role being used by the microservice and is encrypted with a customer managed key. This pattern abstracts deployments away from developers and facilitates resource deployment across accounts. This is one example of a centralized creation pattern, shown in Figure 1.

Figure 1: Architecture diagram highlighting the developer portal deployment pattern for centralized creation
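The resource-policy check described above can be sketched in a few lines. The following Python snippet is a hypothetical illustration (the role ARN and helper name are not from the golden-path tooling itself); it builds a secret resource policy that limits `GetSecretValue` to the microservice’s role:

```python
import json

def build_secret_resource_policy(role_arn: str) -> str:
    """Return a resource policy that limits reads to one workload role."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowWorkloadRole",
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},
                "Action": "secretsmanager:GetSecretValue",
                "Resource": "*",
            },
            {
                # Explicitly deny every other principal, a common
                # belt-and-suspenders pattern for sensitive secrets.
                "Sid": "DenyOtherPrincipals",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "secretsmanager:GetSecretValue",
                "Resource": "*",
                "Condition": {"StringNotEquals": {"aws:PrincipalArn": role_arn}},
            },
        ],
    }
    return json.dumps(policy, indent=2)

# Hypothetical ECS task role for the microservice
policy_json = build_secret_resource_policy(
    "arn:aws:iam::111122223333:role/orders-service-task-role"
)
```

A golden path could generate this policy at deployment time and attach it to the secret, so developers never write resource policies by hand.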

The advantages of this approach are:

  • Consistent naming, tagging, and access control: When secrets are created centrally, you can enforce a standard naming convention based on the account, workload, service, or data classification. This simplifies implementing scalable patterns like attribute-based access control (ABAC).
  • Least privilege checks in CI/CD pipelines: When you create secrets within the confines of IaC pipelines, you can use APIs such as the AWS IAM Access Analyzer check-no-new-access API. Deployment pipelines can be templatized, so individual teams can take advantage of organizational standards while still owning deployment pipelines.
  • Create mechanisms for collaboration between platform engineering and security teams: Often, the shift towards golden paths and service catalogs is driven by a desire for a better developer experience and reduced operational overhead. A byproduct of this move is that security teams can partner with platform engineering teams to build security by default into these paths.
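As a sketch of the first advantage, a central creation function might derive secret names and tags from workload metadata so that ABAC policies can match on them. The convention below is hypothetical, not an AWS standard:

```python
# Hypothetical convention: <environment>/<workload>/<service>/<purpose>
def secret_name(environment: str, workload: str, service: str, purpose: str) -> str:
    return f"{environment}/{workload}/{service}/{purpose}"

def secret_tags(environment: str, workload: str, classification: str) -> list:
    # An ABAC policy could match these tags with aws:ResourceTag conditions
    return [
        {"Key": "environment", "Value": environment},
        {"Key": "workload", "Value": workload},
        {"Key": "data-classification", "Value": classification},
    ]

name = secret_name("prod", "payments", "orders-api", "db-credentials")
tags = secret_tags("prod", "payments", "internal")
```

Because every secret follows the same shape, a single IAM policy with an `aws:ResourceTag/workload` condition can scale across hundreds of teams.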

The tradeoffs of this approach are:

  • It takes time and effort to make this shift. You might not have the resources to invest in full-time platform engineering or DevOps teams. To centrally provision software and infrastructure like this, you must maintain libraries of golden paths that are appropriate for the use cases of your organization. Depending on the size of your organization, this might not be feasible.
  • Golden paths must keep up with the features of the services they support: If you’re using this pattern and a service you rely on releases a new feature, your developers must wait for the feature to be added to the affected golden paths.

If you want to learn more about the internal developer platform pattern, check out the re:Invent 2024 talk Elevating the developer experience with Backstage on AWS.

Decentralized creation of secrets

In a decentralized model, application teams own the IaC templates and deployment mechanisms in their own accounts. Here, each team is operating independently, which can make it more difficult to enforce standards as code. We’ll refer to this pattern, shown in Figure 2, as a decentralized creation pattern.

Figure 2: Decentralized creation of secrets

The advantages of this approach are:

  • Speed: Developers can move quickly and have more autonomy because they own the creation process. Individual teams don’t have a dependency on a central function.
  • Flexibility: You can still use features such as the IAM Access Analyzer check-no-new-access API, but it’s up to each team to implement this in their pipeline.

The tradeoffs of this approach are:

  • Lack of standardization: It can become more difficult to enforce naming and tagging conventions, because it’s not templatized and applied through central creation mechanisms. Access controls and resource policies might not be consistent across teams.
  • Developer attention: Developers must manage more of the underlying infrastructure and deployment pipelines.

Centralized storage of secrets

Some customers choose to store their secrets in a central account, and others choose to store secrets in the accounts in which their workloads live. Figure 3 shows the architecture for centralized storage of secrets.

Figure 3: Centralized storage of secrets

The advantages of centralizing the storage of secrets are:

  • Simplified monitoring and observability: Keeping secrets in a single account, controlled by a centralized team, can simplify monitoring them.

Some tradeoffs of centralizing the storage of secrets are:

  • Additional operational overhead: When sharing secrets across accounts, you must configure resource policies on each secret that is shared.
  • Additional cost of AWS KMS customer managed keys: You must use AWS Key Management Service (AWS KMS) customer managed keys when sharing secrets across accounts. While this gives you an additional layer of access control over secret access, it increases cost under AWS KMS pricing. It also adds another policy that needs to be created and maintained.
  • High concentration of sensitive data: Having secrets in a central account can increase the number of resources affected in the event of inadvertent access or misconfiguration.
  • Account quotas: Before deciding on a centralized secrets account, review the AWS service quotas to verify that you won’t reach them in your production environment.
  • Service managed secrets: When services such as Amazon Relational Database Service (Amazon RDS) or Amazon Redshift manage secrets on your behalf, these secrets are placed in the same account as the resource with which the secret is associated. To maintain a centralized storage of secrets while using service managed secrets, the resources would also have to be centralized.
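To make the first two tradeoffs concrete: sharing a centrally stored secret requires a resource policy on each secret, and the secret must be encrypted with a customer managed key whose key policy also grants the consumer kms:Decrypt. A minimal sketch of the resource-policy side, with hypothetical account and role names:

```python
def cross_account_secret_policy(consumer_role_arn: str) -> dict:
    """Resource policy granting a role in another account read access.
    The consuming account still needs a matching IAM identity policy,
    and the KMS key policy must separately allow the consumer to decrypt."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": consumer_role_arn},
                "Action": [
                    "secretsmanager:GetSecretValue",
                    "secretsmanager:DescribeSecret",
                ],
                "Resource": "*",
            }
        ],
    }

# Hypothetical workload role in a consuming account
policy = cross_account_secret_policy("arn:aws:iam::444455556666:role/app-role")
```

Multiply this by every consuming workload and you can see where the operational overhead of the centralized pattern comes from.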

Though there are advantages to centralizing secrets for monitoring and observability, many customers already rely on services such as AWS Security Hub, IAM Access Analyzer, AWS Config, and Amazon CloudWatch for cross-account observability. These services make it easier to create centralized views of secrets in a multi-account environment.

Decentralized storage of secrets

In a decentralized approach to storage, shown in Figure 4, secrets live in the same accounts as the workloads that need access to them.

Figure 4: Decentralized storage of secrets

The advantages of decentralizing the storage of secrets are:

  • Account boundaries and logical segmentation: Account boundaries provide a natural segmentation between workloads in AWS. When operating in a distributed multi-account environment, you cannot access secrets from another account by default, and all cross-account access must be allowed by both a resource policy in the source account and an IAM policy in the destination account. You can use resource control policies to prevent the sharing of secrets across accounts.
  • AWS KMS key choice: If your secrets aren’t shared across accounts, then you have the choice to use AWS KMS customer managed keys or AWS managed keys to encrypt your secrets.
  • Delegate permissions management to application owners: When secrets are stored in accounts with the applications that need to consume them, application owners define fine-grained permissions in secrets resource policies.

There are a few tradeoffs to consider for this architecture:

  • Auditing and monitoring require cross-account deployments: Tools that are used to monitor the compliance and status of secrets need to operate across multiple accounts and present information in a single place. This is simplified by AWS native tools, which are described later in this post.
  • Automated remediation workflows: You can have detective controls in place to alert on any misconfiguration or security risks related to your secrets. For example, you can surface an alert when a secret is shared outside of your organizational boundary through a resource policy. These workflows can be more complex in a multi-account environment. However, we have samples that can help, such as the Automated Security Response on AWS solution.
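The detective control in the second tradeoff can be approximated locally. The following simplified sketch (the account IDs are hypothetical, and a production check would use IAM Access Analyzer rather than hand-rolled parsing) flags Allow statements whose principals sit outside the organization:

```python
ORG_ACCOUNT_IDS = {"111122223333", "444455556666"}  # hypothetical org accounts

def external_principals(resource_policy: dict, org_accounts: set) -> list:
    """Return principal ARNs in Allow statements whose account ID
    is outside the organization -- a simplified detective check."""
    findings = []
    for stmt in resource_policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal", {})
        arns = principal.get("AWS", []) if isinstance(principal, dict) else []
        if isinstance(arns, str):
            arns = [arns]
        for arn in arns:
            # ARN format: arn:aws:iam::<account-id>:role/<name>
            account_id = arn.split(":")[4] if arn.count(":") >= 5 else ""
            if account_id not in org_accounts:
                findings.append(arn)
    return findings

policy = {
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::999999999999:role/unknown"},
        "Action": "secretsmanager:GetSecretValue",
    }]
}
alerts = external_principals(policy, ORG_ACCOUNT_IDS)
```

A remediation workflow could raise an alert, or remove the offending statement, whenever this check returns findings.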

Centralized rotation

Like the creation and storage of secrets, organizations take different approaches to centralizing the lifecycle management and rotation of secrets.

When you centralize lifecycle management, as shown in Figure 5, a central team manages and owns AWS Lambda functions for rotation. The advantages of centralizing the lifecycle management of secrets are:

  • Developers can reuse rotation functions: In this pattern, a centralized team maintains a common library of rotation functions for different use cases. An example of this can be seen in this AWS re:Inforce session. Using this method, application teams don’t have to build their own custom rotation functions and can benefit from common architectural decisions regarding databases and third-party software as a service (SaaS) applications.
  • Logging: When storing and accessing rotation function logs, the centralized pattern can simplify managing logs from a single place.
Figure 5: Centralized rotation of secrets

There are some tradeoffs in centralizing the lifecycle management and rotation of secrets:

  • Additional cross-account access scenarios: When centralizing lifecycle management, the Lambda functions in central accounts require permissions to create, update, delete, and read secrets in the application accounts. This increases the operational overhead required to enable secret rotation.
  • Service quotas: When you centralize a function at scale, service quotas can come into play. Check the Lambda service quotas to verify that you won’t hit quotas in your production environments.

Decentralized rotation

Decentralizing the lifecycle management of secrets is a more common choice, where the rotation functions live in the same account as the associated secret, as shown in Figure 6.

Figure 6: Decentralized rotation of secrets

The advantages of decentralizing the lifecycle management of secrets are:

  • Templatization and customization: Developers can reuse rotation templates, but adapt the functions to meet their needs and use cases.
  • No cross-account access: Decentralized rotation of secrets happens all in one account and doesn’t require cross-account access.

The primary tradeoff of decentralizing rotation is that you will need to provide either centralized or federated access to logs for rotation functions in different accounts. By default, Lambda automatically captures logs for all function invocations and sends them to CloudWatch Logs. CloudWatch Logs offers a few different ways that you can centralize your logs, with the tradeoffs of each described in the documentation.

Centralized auditing and monitoring of secrets

Regardless of the model chosen for creation, storage, and rotation of secrets, centralize the compliance and auditing aspect when operating in a multi-account environment. You can use AWS Security Hub CSPM through its integration with AWS Organizations to centralize:

In this scenario, shown in Figure 7, centralized functions get visibility across the organization and individual teams can view their posture at an account level with no need to look at the state of the entire organization.

Use AWS CloudTrail organizational trails to send all API calls for Secrets Manager to a centralized delegated admin account.

Figure 7: Centralized monitoring and auditing
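Once an organizational trail delivers events to the central account, Secrets Manager activity can be isolated by filtering on the event source. A toy sketch with hand-written records (real records arrive as CloudTrail log files in S3, but use the same field names):

```python
# Hypothetical CloudTrail records delivered by an organizational trail
records = [
    {"eventSource": "secretsmanager.amazonaws.com",
     "eventName": "GetSecretValue",
     "recipientAccountId": "111122223333"},
    {"eventSource": "ec2.amazonaws.com",
     "eventName": "RunInstances",
     "recipientAccountId": "111122223333"},
]

# Keep only Secrets Manager API activity for auditing
secrets_events = [r for r in records
                  if r["eventSource"] == "secretsmanager.amazonaws.com"]
```

In practice you would run the same filter in Amazon Athena or CloudWatch Logs Insights across the aggregated trail rather than in application code.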

Decentralized auditing and monitoring of secrets

For organizations that don’t require centralized auditing and monitoring of secrets, you can configure access so that individual teams determine which logs are collected, which alerts are enabled, and which checks are in place for their secrets. The advantages of this approach are:

  • Flexibility: Development teams have the freedom to choose what monitoring, auditing, and logging tools work best for them.
  • Reduced dependencies: Development teams don’t have to rely on centralized functions for alerting and monitoring capabilities.

The tradeoffs of this approach are:

  • Operational overhead: This can create redundant work for teams looking to accomplish similar goals.
  • Difficulty aggregating logs in cross-account investigations: If logs, alerts, and monitoring capabilities are decentralized, it can increase the difficulty of investigating events that affect multiple accounts.

Putting it all together

Most organizations choose a combination of these approaches to meet their needs. An example is a financial services company that has a central security team, operates across hundreds of AWS accounts, and has hundreds of applications that are isolated at the account level. This customer could:

  • Centralize the creation process, enforcing organizational standards for naming, tagging, and access control
  • Decentralize storage of secrets, using the AWS account as a natural boundary for access and storing the secret in the account where the workload is operating, delegating control to application owners
  • Decentralize lifecycle management so that application owners can manage their own rotation functions
  • Centralize auditing, using tools like AWS Config, Security Hub, and IAM Access Analyzer to give the central security team insight into the posture of their secrets while letting application owners retain control

Conclusion

In this post, we’ve examined the architectural decisions organizations face when implementing secrets management on AWS: creation, storage, rotation, and monitoring. Each approach—whether centralized or decentralized—offers distinct advantages and tradeoffs that should align with your organization’s security requirements, operational model, and scale. The important points include:

  • Choose your secrets management architecture based on your organization’s specific requirements and capabilities. There’s no one solution that will fit every situation.
  • Use automation and IaC to enforce consistent security controls, regardless of your approach.
  • Implement comprehensive monitoring and auditing capabilities through AWS services to maintain visibility across your environment.

Resources

To learn more about AWS Secrets Manager, check out some of these resources:

Brendan Paul
Brendan is a Senior Security Solutions Architect at AWS and has been at AWS for more than 6 years. He spends most of his time at work helping customers solve problems in the data protection and workload identity domains. Outside of work, he’s pursuing his master’s degree in data science from UC Berkeley.
Eduardo Patroncinio
Eduardo is a distinguished Principal Solutions Architect on the AWS Strategic Accounts team, bringing unparalleled expertise to the forefront of cloud technology. With an impressive career spanning more than 25 years, Eduardo has been a driving force in designing and delivering innovative customer solutions within the dynamic realms of cloud and service management.

How to customize your response to layer 7 DDoS attacks using AWS WAF Anti-DDoS AMR

10 December 2025 at 05:41

Over the first half of this year, AWS WAF introduced new application-layer protections to address the growing trend of short-lived, high-throughput Layer 7 (L7) distributed denial of service (DDoS) attacks. These protections are provided through the AWS WAF Anti-DDoS AWS Managed Rules (Anti-DDoS AMR) rule group. While the default configuration is effective for most workloads, you might want to tailor the response to match your application’s risk tolerance.

In this post, you’ll learn how the Anti-DDoS AMR works, and how you can customize its behavior using labels and additional AWS WAF rules. You’ll walk through three practical scenarios, each demonstrating a different customization technique.

How the Anti-DDoS AMR works at a high level

The Anti-DDoS AMR establishes a baseline of your traffic and uses it to detect anomalies within seconds. As shown in Figure 1, when the Anti-DDoS AMR detects a DDoS attack, it adds the event-detected label to all incoming requests, and the ddos-request label, along with a confidence-based label such as high-suspicion-ddos-request, to requests that are suspected of contributing to the attack. In AWS WAF, a label is metadata added to a request by a rule when the rule matches the request. After being added, a label is available to subsequent rules, which can use it to enrich their evaluation logic. The Anti-DDoS AMR uses the added labels to mitigate the DDoS attack.

Figure 1 – Anti-DDoS AMR process flow

Default mitigations are based on a combination of Block and JavaScript Challenge actions. The Challenge action can only be handled properly by a client that’s expecting HTML content. For this reason, you need to exclude the paths of non-challengeable requests (such as API fetches) in the Anti-DDoS AMR configuration. The Anti-DDoS AMR applies the challengeable-request label to requests that don’t match the configured challenge exclusions. By default, the following mitigation rules are evaluated in order:

  • ChallengeAllDuringEvent, which is equivalent to the following logic: IF event-detected AND challengeable-request THEN challenge.
  • ChallengeDDoSRequests, which is equivalent to the following logic: IF (high-suspicion-ddos-request OR medium-suspicion-ddos-request OR low-suspicion-ddos-request) AND challengeable-request THEN challenge. Its sensitivity can be changed to match your needs, such as challenging only medium- and high-suspicion DDoS requests.
  • DDoSRequests, which is equivalent to the following logic: IF high-suspicion-ddos-request THEN block. Its sensitivity can be changed to match your needs, such as blocking medium-suspicion in addition to high-suspicion DDoS requests.
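The evaluation order above can be illustrated with a small simulation. This is a simplified model of the logic described in this section, not the actual rule engine:

```python
def default_mitigation(labels: set) -> str:
    """Approximate the default Anti-DDoS AMR rule order described above,
    given the labels already applied to a request (confidence labels
    shortened to their suffixes for readability)."""
    suspicion = {"high-suspicion-ddos-request",
                 "medium-suspicion-ddos-request",
                 "low-suspicion-ddos-request"}
    if "event-detected" in labels and "challengeable-request" in labels:
        return "challenge"  # ChallengeAllDuringEvent
    if labels & suspicion and "challengeable-request" in labels:
        return "challenge"  # ChallengeDDoSRequests
    if "high-suspicion-ddos-request" in labels:
        return "block"      # DDoSRequests
    return "allow"

# A non-challengeable request (such as an API fetch) still gets blocked
# when it carries the high-suspicion label.
action = default_mitigation({"event-detected", "high-suspicion-ddos-request"})
```

Tracing requests through this model makes it easier to predict what your customizations will change before you touch the real web ACL.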

Customizing your response to layer 7 DDoS attacks

This customization can be done using two different approaches. In the first approach, you configure the Anti-DDoS AMR to take the action you want, then you add subsequent rules to further harden your response under certain conditions. In the second approach, you change some or all the rules of the Anti-DDoS AMR to count mode, then create additional rules that define your response to DDoS attacks.

In both approaches, the subsequent rules are configured using conditions you define, combined with conditions based on labels applied to requests by the Anti-DDoS AMR. The following section includes three examples of customizing your response to DDoS attacks. The first two examples are based on the first approach, while the last one is based on the second approach.

Example 1: More sensitive mitigation outside of core countries

Let’s suppose that your main business is conducted in two main countries, the UAE and KSA. You are happy with the default behavior of the Anti-DDoS AMR in these countries, but you want to block more aggressively outside of these countries. You can implement this using the following rules:

  • Anti-DDoS AMR with default configurations
  • A custom rule that blocks if the following conditions are met: Request is initiated from outside of UAE or KSA AND request has high-suspicion-ddos-request or medium-suspicion-ddos-request labels

Configuration

After adding your Anti-DDoS AMR with default configuration, create a subsequent custom rule with the following JSON definition.

Note: You need to use the AWS WAF JSON rule editor or infrastructure-as-code (IaC) tools (such as AWS CloudFormation or Terraform) to define this rule. The current AWS WAF console doesn’t allow creating rules with multiple AND/OR logic nesting.

{
    "Action": {
        "Block": {}
    },
    "Name": "more-sensitive-ddos-mitigation-outside-of-core-countries",
    "Priority": 1,
    "Statement": {
        "AndStatement": {
            "Statements": [
                {
                    "NotStatement": {
                        "Statement": {
                            "GeoMatchStatement": {
                                "CountryCodes": [
                                    "AE",
                                    "SA"
                                ]
                            }
                        }
                    }
                },
                {
                    "OrStatement": {
                        "Statements": [
                            {
                                "LabelMatchStatement": {
                                    "Key": "awswaf:managed:aws:anti-ddos:medium-suspicion-ddos-request",
                                    "Scope": "LABEL"
                                }
                            },
                            {
                                "LabelMatchStatement": {
                                    "Key": "awswaf:managed:aws:anti-ddos:high-suspicion-ddos-request",
                                    "Scope": "LABEL"
                                }
                            }
                        ]
                    }
                }
            ]
        }
    },
    "VisibilityConfig": {
        "CloudWatchMetricsEnabled": true,
        "MetricName": "more-sensitive-ddos-mitigation-outside-of-core-countries",
        "SampledRequestsEnabled": true
    }
}

Similarly, during an attack, you can more aggressively mitigate requests from unusual sources, such as requests labeled by the Anonymous IP managed rule group as coming from web hosting and cloud providers.

Example 2: Lower rate-limiting thresholds during DDoS attacks

Suppose that your application has sensitive URLs that are compute heavy. To protect the availability of your application, you have applied a rate-limiting rule to these URLs, configured with a threshold of 100 requests over a 2-minute window. You can harden this response during a DDoS attack by applying a more aggressive threshold. You can implement this using the following rules:

  1. An Anti-DDoS AMR with default configurations
  2. A rate-limiting rule, scoped to sensitive URLs, configured with a threshold of 100 requests over a 2-minute window
  3. A rate-limiting rule, scoped to sensitive URLs and to the event-detected label, configured with a threshold of 10 requests over a 10-minute window

Configuration

After adding your Anti-DDoS AMR with default configuration, and your rate-limit rule for sensitive URLs, create a subsequent new rate limiting rule with the following JSON definition.

{
    "Action": {
        "Block": {}
    },
    "Name": "ip-rate-limit-10-10mins-under-ddos",
    "Priority": 2,
    "Statement": {
        "RateBasedStatement": {
            "AggregateKeyType": "IP",
            "EvaluationWindowSec": 600,
            "Limit": 10,
            "ScopeDownStatement": {
                "AndStatement": {
                    "Statements": [
                        {
                            "ByteMatchStatement": {
                                "FieldToMatch": {
                                    "UriPath": {}
                                },
                                "PositionalConstraint": "EXACTLY",
                                "SearchString": "/sensitive-url",
                                "TextTransformations": [
                                    {
                                        "Priority": 0,
                                        "Type": "LOWERCASE"
                                    }
                                ]
                            }
                        },
                        {
                            "LabelMatchStatement": {
                                "Key": "awswaf:managed:aws:anti-ddos:event-detected",
                                "Scope": "LABEL"
                            }
                        }
                    ]
                }
            }
        }
    },
    "VisibilityConfig": {
        "CloudWatchMetricsEnabled": true,
        "MetricName": "ip-rate-limit-10-10mins-under-ddos",
        "SampledRequestsEnabled": true
    }
}

Example 3: Adaptive response according to your application scalability

Suppose that you are operating a legacy application that can safely scale to a certain threshold of traffic volume, after which it degrades. If the total traffic volume, including the DDoS traffic, is below this threshold, you decide not to challenge all requests during a DDoS attack, to avoid impacting user experience. In this scenario, you rely only on the default block action for high-suspicion DDoS requests. If the total traffic volume exceeds the threshold your application can safely process, you use the equivalent of the Anti-DDoS AMR’s default ChallengeDDoSRequests mitigation. You can implement this using the following rules:

  1. An Anti-DDoS AMR with ChallengeAllDuringEvent and ChallengeDDoSRequests rules configured in count mode.
  2. A rate limiting rule that counts your traffic and is configured with a threshold corresponding to your application’s capacity to normally process traffic. As its action, it only counts requests and applies a custom label—for example, CapacityExceeded—when its threshold is met.
  3. A rule that mimics ChallengeDDoSRequests but only when the CapacityExceeded label is present: Challenge if ddos-request, CapacityExceeded, and challengeable-request labels are present

Configuration

First, update your Anti-DDoS AMR by changing Challenge actions to Count actions.

Figure 2 – Updated Anti-DDoS AMR rules in example 3

Then create the rate limit capacity-exceeded-detection rule in count mode, using the following JSON definition:

{
    "Action": {
        "Count": {}
    },
    "Name": "capacity-exceeded-detection",
    "Priority": 2,
    "RuleLabels": [
        {
            "Name": "mycompany:capacityexceeded"
        }
    ],
    "Statement": {
        "RateBasedStatement": {
            "AggregateKeyType": "CONSTANT",
            "EvaluationWindowSec": 120,
            "Limit": 10000,
            "ScopeDownStatement": {
                "NotStatement": {
                    "Statement": {
                        "LabelMatchStatement": {
                            "Scope": "LABEL",
                            "Key": "non-existing-label-to-count-all-requests"
                        }
                    }
                }
            }
        }
    },
    "VisibilityConfig": {
        "CloudWatchMetricsEnabled": true,
        "MetricName": "capacity-exceeded-detection",
        "SampledRequestsEnabled": true
    }
}

Finally, create the challenge-if-ddos-and-capacity-exceeded challenge rule using the following JSON definition:

{
    "Action": {
        "Challenge": {}
    },
    "Name": "challenge-if-ddos-and-capacity-exceeded",
    "Priority": 3,
    "Statement": {
        "AndStatement": {
            "Statements": [
                {
                    "LabelMatchStatement": {
                        "Key": "mycompany:capacityexceeded",
                        "Scope": "LABEL"
                    }
                },
                {
                    "LabelMatchStatement": {
                        "Key": "awswaf:managed:aws:anti-ddos:ddos-request",
                        "Scope": "LABEL"
                    }
                },
                {
                    "LabelMatchStatement": {
                        "Key": "awswaf:managed:aws:anti-ddos:challengeable-request",
                        "Scope": "LABEL"
                    }
                }
            ]
        }
    },
    "VisibilityConfig": {
        "CloudWatchMetricsEnabled": true,
        "MetricName": "challenge-if-ddos-and-capacity-exceeded",
        "SampledRequestsEnabled": true
    }
}

Conclusion

By combining the built-in protections of the Anti-DDoS AMR with custom logic, you can adapt your defenses to match your unique risk profile, traffic patterns, and application scalability. The examples in this post illustrate how you can fine-tune sensitivity, enforce stronger mitigations under specific conditions, and even build adaptive defenses that respond dynamically to your system’s capacity.

You can use the dynamic labeling system in AWS WAF to implement customization granularly. You can also use AWS WAF labels to exclude costly logging of DDoS attack traffic.

If you have feedback about this post, submit comments in the Comments section below.

Achraf Souk

Achraf is a Principal Solutions Architect at AWS with more than 15 years of experience in cloud, security, and networking. He works closely with customers across industries to design resilient, fast, and secure web applications. A frequent writer and speaker, he enjoys simplifying deeply technical topics for a wider audience. Achraf has a track record in building and scaling technical organizations.

How to use the Secrets Store CSI Driver provider Amazon EKS add-on with Secrets Manager

26 November 2025 at 19:54

In this post, we introduce the AWS provider for the Secrets Store CSI Driver, a new AWS Secrets Manager add-on for Amazon Elastic Kubernetes Service (Amazon EKS) that you can use to fetch secrets from Secrets Manager and parameters from AWS Systems Manager Parameter Store and mount them as files in Kubernetes pods. The add-on is straightforward to install and configure, works on Amazon Elastic Compute Cloud (Amazon EC2) instances and hybrid nodes, and includes the latest security updates and bugfixes. It provides a secure and reliable way to retrieve your secrets in Kubernetes workloads.

The AWS provider for the Secrets Store CSI Driver is an open source Kubernetes DaemonSet.

Amazon EKS add-ons provide installation and management of a curated set of add-ons for EKS clusters. You can use these add-ons to help ensure that your EKS clusters are secure and stable and reduce the number of steps required to install, configure, and update add-ons.

Secrets Manager helps you manage, retrieve, and rotate database credentials, application credentials, OAuth tokens, API keys, and other secrets throughout their lifecycles. By using Secrets Manager to store credentials, you can avoid using hard-coded credentials in application source code, helping to avoid unintended or inadvertent access.

New EKS add-on: AWS provider for the Secrets Store CSI Driver

We recommend installing the provider as an Amazon EKS add-on instead of the legacy installation methods (Helm, kubectl) to reduce the amount of time it takes to install and configure the provider. The add-on can be installed in several ways: using eksctl—which you will use in this post—the AWS Management Console, the Amazon EKS API, AWS CloudFormation, or the AWS Command Line Interface (AWS CLI).

Security considerations

The open source Secrets Store CSI Driver maintained by the Kubernetes community enables mounting secrets as files in Kubernetes clusters. The AWS provider relies on the CSI driver and mounts secrets as files in your EKS clusters. Security best practice recommends caching secrets in memory where possible. If you prefer the native Kubernetes experience, follow the steps in this post; if you prefer to cache secrets in memory, we recommend using the AWS Secrets Manager Agent.

IAM principals require Secrets Manager permissions to get and describe secrets. If using Systems Manager Parameter Store, principals also require Parameter Store permissions to get parameters. Resource policies on secrets serve as another access control mechanism, and AWS principals must be explicitly granted permissions to access individual secrets if they’re accessing secrets from a different AWS account (see Access AWS Secrets Manager secrets from a different account). The Amazon EKS add-on provides security features including support for using FIPS endpoints. AWS provides a managed IAM policy, AWSSecretsManagerClientReadOnlyAccess, which we recommend using with the EKS add-on.

Solution walkthrough

In the following sections, you’ll create an EKS cluster, create a test secret in Secrets Manager, install the Amazon EKS add-on, and use it to retrieve the test secret and mount it as a file in your cluster.

Prerequisites

  1. AWS credentials configured in your environment to allow AWS API calls, including access to Secrets Manager
  2. AWS CLI v2 or higher
  3. Your preferred AWS Region configured in your environment. Use the following command to set it:
    aws configure set default.region <preferred_region>
    
  4. The kubectl and eksctl command-line tools
  5. A Kubernetes deployment file hosted in the GitHub repo for the provider

With the prerequisites in place, you’re ready to run the commands in the following steps in your terminal:

Create an EKS cluster

  1. Create a shell variable in your terminal with the name of your cluster:
    CLUSTER_NAME="my-test-cluster"
    
  2. Create an EKS cluster:
    eksctl create cluster --name $CLUSTER_NAME
    

eksctl will automatically use a recent version of Kubernetes and create the resources needed for the cluster to function. This command typically takes about 15 minutes to finish setting up the cluster.

Create a test secret

Create a secret named addon_secret in Secrets Manager:

aws secretsmanager create-secret \
  --name addon_secret \
  --secret-string "super secret!"

Set up the Secrets Store CSI Driver provider EKS add-on

Install the Amazon EKS add-on:

eksctl create addon \
  --cluster $CLUSTER_NAME \
  --name aws-secrets-store-csi-driver-provider

Create an IAM role

Create an AWS Identity and Access Management (IAM) role that the EKS Pod Identity service principal can assume, and save its ARN in a shell variable (replace <region> with the AWS Region configured in your environment):

ROLE_ARN=$(aws iam create-role \
  --region <region> \
  --role-name nginx-deployment-role \
  --query Role.Arn \
  --output text \
  --assume-role-policy-document '{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "pods.eks.amazonaws.com"
            },
            "Action": [
                "sts:AssumeRole",
                "sts:TagSession"
            ]
        }
    ]
}')

Attach a managed policy to the IAM role

Note: AWS provides a managed policy for client-side consumption of secrets through Secrets Manager: AWSSecretsManagerClientReadOnlyAccess. This policy grants access to get and describe secrets for the secrets in your account. If you want to further follow the principle of least privilege, create a custom policy scoped down to only the secrets you want to retrieve.
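For example, a scoped-down policy for the test secret in this walkthrough might look like the following sketch. The Region, account ID, and the wildcard over the random suffix that Secrets Manager appends to secret ARNs are placeholders you would replace with your own values:

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "secretsmanager:GetSecretValue",
                "secretsmanager:DescribeSecret"
            ],
            "Resource": "arn:aws:secretsmanager:<region>:<account-id>:secret:addon_secret-*"
        }
    ]
}
```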

Attach the managed policy to the IAM role that you just created:

aws iam attach-role-policy \
  --role-name nginx-deployment-role \
  --policy-arn arn:aws:iam::aws:policy/AWSSecretsManagerClientReadOnlyAccess

Set up the EKS Pod Identity Agent

Note: The add-on provides two methods of authentication: IAM roles for service accounts (IRSA) and EKS Pod Identity. In this solution, you’ll use EKS Pod Identity.

  1. After you’ve installed the add-on in your cluster, install the EKS Pod Identity Agent add-on for authentication:
    eksctl create addon \
      --cluster $CLUSTER_NAME \
      --name eks-pod-identity-agent
    
  2. Create an EKS Pod Identity association for the cluster:
    eksctl create podidentityassociation \
        --cluster $CLUSTER_NAME \
        --namespace default \
        --region <region> \
        --service-account-name nginx-pod-identity-deployment-sa \
        --role-arn $ROLE_ARN \
        --create-service-account true
    

Set up your SecretProviderClass

The SecretProviderClass is a YAML file that defines which secrets and parameters to mount as files in your cluster.

  1. Create a minimal SecretProviderClass called spc.yaml for the test secret with the following content:
    apiVersion: secrets-store.csi.x-k8s.io/v1
    kind: SecretProviderClass
    metadata:
      name: nginx-pod-identity-deployment-aws-secrets
    spec:
      provider: aws
      parameters:
        objects: |
          - objectName: "addon_secret"
            objectType: "secretsmanager"
        usePodIdentity: "true"
    
  2. Deploy your SecretProviderClass (make sure you’re in the same directory as the spc.yaml you just created):
    kubectl apply -f spc.yaml
    

To learn more about the SecretProviderClass, see the GitHub readme for the provider.
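As a sketch of a more complete configuration, a single SecretProviderClass can mount both Secrets Manager secrets and Parameter Store parameters. In the following example, the parameter name my_parameter is hypothetical; the addon_secret entry matches the test secret created earlier:

```yaml
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: example-multi-source-secrets
spec:
  provider: aws
  parameters:
    objects: |
      - objectName: "addon_secret"
        objectType: "secretsmanager"
      - objectName: "my_parameter"
        objectType: "ssmparameter"
    usePodIdentity: "true"
```

Each object is mounted as its own file under the volume's mount path, so a pod referencing this class would see both addon_secret and my_parameter as files.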

Deploy your pod to your EKS cluster

For brevity, we’ve omitted the content of the Kubernetes deployment file here. Instead, use the example deployment file for Pod Identity from the GitHub repository for the provider to deploy your pod:

kubectl apply -f https://raw.githubusercontent.com/aws/secrets-store-csi-driver-provider-aws/main/examples/ExampleDeployment-PodIdentity.yaml

This will mount addon_secret at /mnt/secrets-store in your cluster.

Retrieve your secret

  1. Print the value of addon_secret to confirm that the secret was mounted successfully:
    kubectl exec -it $(kubectl get pods | awk '/nginx-pod-identity-deployment/{print $1}' | head -1) -- cat /mnt/secrets-store/addon_secret
    
  2. You should see the following output:
    super secret!
    

You’ve successfully fetched your test secret from Secrets Manager using the new Amazon EKS add-on and mounted it as a file in your Kubernetes cluster.

Clean up

Run the following commands to clean up the resources that you created in this tutorial:

aws secretsmanager delete-secret \
  --secret-id addon_secret \
  --force-delete-without-recovery

aws iam delete-role --role-name nginx-deployment-role

eksctl delete cluster $CLUSTER_NAME

Conclusion

In this post, you learned how to use the new Amazon EKS add-on for the AWS Secrets Store CSI Driver provider to securely retrieve your secrets and parameters and mount them as files in your Kubernetes clusters. The new EKS add-on provides benefits such as the latest security patches and bug fixes, tighter integration with Amazon EKS, and reduced time to install and configure the AWS Secrets Store CSI Driver provider. The add-on is validated by EKS to work with EC2 instances and hybrid nodes.

Further reading

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Angad Misra

Angad is a Software Engineer on the AWS Secrets Manager team. When he isn’t building secure, reliable, and scalable software from first principles, he enjoys a good latte, live music, playing guitar, exploring the great outdoors, cooking, and lazing around with his cat, Freyja.

Introducing guidelines for network scanning

25 November 2025 at 19:11

Amazon Web Services (AWS) is introducing guidelines for network scanning of customer workloads. By following these guidelines, conforming scanners will collect more accurate data, minimize abuse reports, and help improve the security of the internet for everyone.

Network scanning is a practice in modern IT environments that can be used for either legitimate security needs or abused for malicious activity. On the legitimate side, organizations conduct network scans to maintain accurate inventories of their assets, verify security configurations, and identify potential vulnerabilities or outdated software versions that require attention. Security teams, system administrators, and authorized third-party security researchers use scanning in their standard toolkit for collecting security posture data. However, scanning is also performed by threat actors attempting to enumerate systems, discover weaknesses, or gather intelligence for attacks. Distinguishing between legitimate scanning activity and potentially harmful reconnaissance is a constant challenge for security operations.

When software vulnerabilities are found through scanning a given system, it’s particularly important that the scanner is well-intentioned. If a software vulnerability is discovered and attacked by a threat actor, it could allow unauthorized access to an organization’s IT systems. Organizations must effectively manage their software vulnerabilities to protect themselves from ransomware, data theft, operational issues, and regulatory penalties. At the same time, the scale of known vulnerabilities is growing rapidly, at a rate of 21% per year for the past 10 years as reported in the NIST National Vulnerability Database.

With these factors at play, network scanners need to scan and manage the collected security data with care. There are a variety of parties interested in security data, and each group uses the data differently. If security data is discovered and abused by threat actors, then system compromises, ransomware, and denial of service can create disruption and costs for system owners. With the exponential growth of data centers and connected software workloads providing critical services across energy, manufacturing, healthcare, government, education, finance, and transportation sectors, the impact of security data in the wrong hands can have significant real-world consequences.

Multiple parties

Multiple parties have vested interests in security data, including at least the following groups:

  • Organizations want to understand their asset inventories and patch vulnerabilities quickly to protect their assets.
  • Program auditors want evidence that organizations have robust controls in place to manage their infrastructure.
  • Cyber insurance providers want risk evaluations of organizational security posture.
  • Investors performing due diligence want to understand the cyber risk profile of an organization.
  • Security researchers want to identify risks and notify organizations to take action.
  • Threat actors want to exploit unpatched vulnerabilities and weaknesses for unauthorized access.

The sensitive nature of security data creates a complex ecosystem of competing interests, where an organization must maintain different levels of data access for different parties.

Motivation for the guidelines

We’ve described both the legitimate and malicious uses of network scanning, and the different parties that have an interest in the resulting data. We’re introducing these guidelines because we need to protect our networks and our customers; and telling the difference between these parties is challenging. There’s no single standard for the identification of network scanners on the internet. As such, system owners and defenders often don’t know who is scanning their systems. Each system owner is independently responsible for managing identification of these different parties. Network scanners might use unique methods to identify themselves, such as reverse DNS, custom user agents, or dedicated network ranges. In the case of malicious actors, they might attempt to evade identification altogether. This degree of identity variance makes it difficult for system owners to know the motivation of parties performing network scanning.

To address this challenge, we’re introducing behavioral guidelines for network scanning. AWS seeks to provide network security for every customer; our goal is to screen out abusive scanning that doesn’t meet these guidelines. Parties that perform broad network scanning can follow these guidelines to receive more reliable data from AWS IP space, and organizations running on AWS gain a higher degree of assurance in their risk management.

When network scanning is managed according to these guidelines, it helps system owners strengthen their defenses and improve visibility across their digital ecosystem. For example, Amazon Inspector can detect software vulnerabilities and prioritize remediation efforts while conforming to these guidelines. Similarly, partners in AWS Marketplace use these guidelines to collect internet-wide signals and help organizations understand and manage cyber risk.

“When organizations have clear, data-driven visibility into their own security posture and that of their third parties, they can make faster, smarter decisions to reduce cyber risk across the ecosystem.” – Dave Casion, CTO, Bitsight

Of course, security works better together, so AWS customers can report abusive scanning to our Trust & Safety Center as type Network Activity > Port Scanning and Intrusion Attempts. Each report helps improve the collective protection against malicious use of security data.

The guidelines

To help ensure that legitimate network scanners can clearly differentiate themselves from threat actors, AWS offers the following guidance for scanning customer workloads. This guidance on network scanning complements the policies on penetration testing and vulnerability reporting. AWS reserves the right to limit or block traffic that appears non-compliant with these guidelines. A conforming scanner adheres to the following practices:

Observational

  • Perform no actions that attempt to create, modify, or delete resources or data on discovered endpoints.
  • Respect the integrity of targeted systems. Scans cause no degradation to system function and cause no change in system configuration.
  • Examples of non-mutating scanning include:
    • Initiating and completing a TCP handshake
    • Retrieving the banner from an SSH service

Identifiable

  • Provide transparency by publishing sources of scanning activity.
  • Implement a verifiable process for confirming the authenticity of scanning activities.
  • Examples of identifiable scanning include:
    • Supporting reverse DNS lookups to one of your organization’s public DNS zones for scanning IPs.
    • Publishing scanning IP ranges, organized by types of requests (such as service existence or vulnerability checks).
    • For HTTP scanning, including meaningful content in user agent strings (such as names from your public DNS zones or a URL for opting out).
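For instance, a scanner’s HTTP requests might carry a user agent along these lines; the product name, hostname, and opt-out URL are placeholders:

```
User-Agent: ExampleScanner/1.0 (+https://scanner.example.com/about; opt-out: https://scanner.example.com/opt-out)
```

A defender inspecting logs can then resolve the advertised hostname and compare it against the scanner’s published IP ranges to verify authenticity.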

Cooperative

  • Limit scan rates to minimize impact on target systems.
  • Provide an opt-out mechanism for verified resource owners to request cessation of scanning activity.
  • Honor opt-out requests within a reasonable response period.
  • Examples of cooperative scanning include:
    • Limit scanning to one service transaction per second per destination service.
    • Respect site settings as expressed in robots.txt, security.txt, and other such industry standards for expressing site owner intent.
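For example, a site owner could express in robots.txt that a particular scanner should not crawl the site; the scanner name below is a placeholder:

```
# robots.txt — sketch of site owner intent for a hypothetical scanner
User-agent: ExampleScanner
Disallow: /
```

A conforming scanner would check for this file before scanning HTTP endpoints and honor the Disallow directive as an opt-out signal.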

Confidential

  • Maintain secure infrastructure and data handling practices as reflected by industry-standard certifications such as SOC 2.
  • Ensure no unauthenticated or unauthorized access to collected scan data.
  • Implement user identification and verification processes.

See the full guidance on AWS.

What’s next?

As more network scanners follow this guidance, system owners will benefit from reduced risk to their confidentiality, integrity, and availability. Legitimate network scanners will send a clear signal of their intent and improve the quality of the visibility they provide. With the constantly changing state of networking, we expect this guidance to evolve along with technical controls over time. We look forward to input from customers, system owners, network scanners, and others to continue improving security posture across AWS and the internet.

If you have feedback about this post, submit comments in the Comments section below or contact AWS Support.

Stephen Goodman

As a senior manager for Amazon active defense, Stephen leads data-driven programs to protect AWS customers and the internet from threat actors.
