Experiencing networking issues in GCP regions
Incident Report for Incorta Cloud
Resolved
The incident has been resolved.

Incident Report:

### Summary:
On 17 February 2024, the network data plane took longer than usual to initialize.
Customers in multiple GCP domains (cloud3, cloud4, cloud5 and cloud6) experienced Spark workloads waiting longer than usual for healthy compute nodes.

### Root Cause:
A forced upgrade of the network data plane degraded the performance of the network data plane on workers.
The aim of the forced upgrade was to provide better security and performance for management data planes.
The forced upgrade was tested internally in different regions, but due to high concurrency and request volume in cloud4, cloud5 and cloud6, the network data plane took some time to stabilize.

### Remediation and Prevention:
Incorta Cloud engineers were alerted that workloads were taking longer than usual and began investigating and mitigating the issue within an hour of its discovery.

The high demand for Spark executors and their slow startup were correlated with the forced upgrade event of the data plane.

If your service or application was affected, we apologize — this is not the level of quality and reliability we strive to offer you. Incorta Cloud is committed to preventing a repeat of this issue in the future and is completing the following actions:

[Mitigation] Upgrade deprecated compute workloads. [Done]
[Prevent] Exclude GKE clusters from the cloud provider's forced upgrade window. [Done]
[Monitor] Improve monitoring of Spark workload failures in Arcane. [In progress]
Posted Feb 17, 2024 - 09:03 UTC
Monitoring
Mitigation has been applied, and we are currently monitoring the network data plane.
Posted Feb 17, 2024 - 08:17 UTC
Update
We are continuing to work on a fix for this issue.
Posted Feb 17, 2024 - 08:16 UTC
Update
We are continuing to work on a fix for this issue.
Posted Feb 17, 2024 - 07:44 UTC
Update
Mitigation is in progress for GCP - Europe West.
Posted Feb 17, 2024 - 07:01 UTC
Identified
The issue has been identified, and a fix is being implemented.

Issue: A forced upgrade of the network data plane led to incompatible versions between master and workers.
Posted Feb 17, 2024 - 06:26 UTC
Investigating
We are currently investigating the issue.
Posted Feb 17, 2024 - 06:01 UTC
This incident affected: Google Cloud Platform Regions (GCP - US Central 1 (Iowa), GCP - US West 1 (Oregon), GCP - Europe West 2 (London), GCP - Middle East Central 1 (Doha), GCP - Middle East Central 2 (Dammam), GCP - Asia South East 1 (Singapore), GCP - Asia North East 1 (Tokyo)).