STS token failure in AWS leads to ClickHouse Cloud Outage
Incident Report for ClickHouse Cloud
Resolved
We received confirmation from AWS support that the issue has been resolved, and our monitoring no longer detects any errors. This incident is now considered resolved. For additional details regarding the AWS IAM issue, you can refer to https://health.aws.amazon.com/health/status?eventID=arn:aws:health:global::event/IAM/AWS_IAM_OPERATIONAL_ISSUE/AWS_IAM_OPERATIONAL_ISSUE_62881_637B393821C
Posted Dec 18, 2023 - 04:02 UTC
Monitoring
Our team has validated that the creation of new ClickHouse services is operational across all AWS regions. While we await definitive confirmation from the AWS support team, we will continue to monitor all regions for any potential issues.
Posted Dec 18, 2023 - 03:50 UTC
Identified
Update: The AWS support team has verified that this issue specifically impacts the creation and modification of IAM roles, users, and policies, with no impact on existing IAM configurations. Our team thoroughly checked our internal services and existing customer instances and observed no disruptions. Consequently, we conclude that this incident affects only newly created ClickHouse instances.
Posted Dec 18, 2023 - 03:36 UTC
Investigating
We have just been informed of an ongoing problem affecting the AWS IAM STS token endpoint. This issue has disrupted ClickHouse services for all customers in all AWS regions. Currently, it is not possible to provision new services, and a small number of existing services may experience some disruption. Our team is working with AWS to resolve this issue as soon as possible.
Posted Dec 18, 2023 - 03:19 UTC
This incident affected: Amazon Web Services (AWS) (us-east-1, us-east-2, us-west-2, eu-central-1, eu-west-1, ap-south-1, ap-southeast-1, ap-southeast-2).