Module 58: Change, Incident, and Problem Management
The CCSP exam expects you to distinguish between incidents (unplanned disruptions), problems (the underlying causes of one or more incidents), and changes (controlled modifications). These terms are not interchangeable. The exam tests whether you follow the correct process for each, and whether you understand how cloud environments alter these processes.
Change Management in Cloud
Change management controls modifications to cloud infrastructure, applications, and configurations. In cloud environments, changes happen through APIs and IaC pipelines, often at a pace that traditional change advisory boards cannot match. The exam tests how to adapt change management for cloud speed without sacrificing control.
Types of Changes
- Standard changes: Pre-approved, low-risk, routine changes. In cloud environments, deploying a pre-approved golden image or scaling within approved limits qualifies. The exam expects you to pre-approve common, low-risk cloud operations as standard changes.
- Normal changes: Require review, assessment, and approval. New security group rules, IAM policy changes, and new service deployments are normal changes in cloud environments.
- Emergency changes: Expedited changes for critical issues. The exam tests whether emergency changes still go through a streamlined approval process and are documented after the fact.
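The three change types can be read as a routing decision. The sketch below is a minimal Python illustration, assuming a hypothetical catalogue of pre-approved operations (`PRE_APPROVED_OPERATIONS`) and a hypothetical `ChangeRequest` record; real organizations maintain this catalogue in their own change tooling. Note that every path leads to some process: none of them skips review.

```python
from dataclasses import dataclass
from enum import Enum


class ChangeType(Enum):
    STANDARD = "standard"    # pre-approved, no per-change review needed
    NORMAL = "normal"        # requires review, assessment, and approval
    EMERGENCY = "emergency"  # expedited approval, documented afterwards


# Hypothetical catalogue of operations the organization has pre-approved.
PRE_APPROVED_OPERATIONS = {"deploy_golden_image", "scale_within_limits"}


@dataclass
class ChangeRequest:
    operation: str
    fixes_critical_issue: bool = False


def classify_change(change: ChangeRequest) -> ChangeType:
    """Route a change to the process it must follow; no branch bypasses control."""
    if change.operation in PRE_APPROVED_OPERATIONS:
        return ChangeType.STANDARD
    if change.fixes_critical_issue:
        return ChangeType.EMERGENCY
    # New security group rules, IAM policy edits, new services, etc.
    return ChangeType.NORMAL
```

For example, `classify_change(ChangeRequest("modify_iam_policy"))` falls through to `NORMAL`, matching the list above.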
The exam pattern: when a question presents a cloud change that bypasses all review and causes a security incident, the answer is to implement or strengthen change management. When a question describes change management slowing down critical security patches, the answer is to use emergency change procedures, not to skip change management entirely.
Exam trap: The exam never accepts "skip change management" as a correct answer. Even emergency situations have expedited change procedures. The process may be faster, but it is never absent.
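The rule that emergency changes are faster but never absent can be expressed as guards on a change record. The following is a minimal Python sketch, assuming a hypothetical `EmergencyChange` class: execution is blocked until some expedited approval exists, and the record cannot be closed until it has been documented after the fact.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EmergencyChange:
    description: str
    approver: Optional[str] = None            # expedited approver (hypothetical role)
    executed: bool = False
    post_change_record: Optional[str] = None  # retroactive documentation

    def approve(self, approver: str) -> None:
        self.approver = approver

    def execute(self) -> None:
        # Streamlined approval is allowed; no approval at all is not.
        if self.approver is None:
            raise PermissionError("emergency change needs expedited approval first")
        self.executed = True

    def close(self, record: str) -> None:
        # The change stays open until it is documented after the fact.
        if not self.executed:
            raise RuntimeError("cannot close a change that was never executed")
        if not record.strip():
            raise ValueError("emergency change must be documented retroactively")
        self.post_change_record = record
```

The design choice is that both controls live on the record itself, so a fast-tracked patch still leaves an approval and an audit trail.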
Incident Management
Incident management restores normal service as quickly as possible after an unplanned disruption. The CCSP exam tests cloud-specific incident management considerations:
Incident Classification
The exam expects you to classify incidents by impact and urgency. A security incident affecting production data is higher priority than a performance degradation in a development environment. Classification drives the response — high-impact, high-urgency incidents escalate immediately.
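Impact and urgency are often combined in a priority matrix. The sketch below assumes one common convention (three levels each, priority derived from their sum, with 1 as the most severe); the exact matrix and escalation rule are assumptions, since each organization defines its own.

```python
from enum import IntEnum


class Level(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def incident_priority(impact: Level, urgency: Level) -> int:
    """Map impact and urgency to priority 1 (most severe) through 5 (least)."""
    # impact + urgency ranges from 2 (LOW, LOW) to 6 (HIGH, HIGH)
    return 7 - (impact + urgency)


def escalate_immediately(impact: Level, urgency: Level) -> bool:
    """Assumed rule: only the highest priority escalates without delay."""
    return incident_priority(impact, urgency) == 1
```

Under this convention, a security incident affecting production data (high impact, high urgency) is priority 1, while a performance degradation in a development environment (low impact, low urgency) is priority 5.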
Cloud Incident Response Challenges
- Shared responsibility: When an incident occurs, determining whether the root cause is in the CSP's domain or the customer's domain affects the response. The exam tests whether you engage the CSP's incident response process when the cause is in their infrastructure.
- Evidence volatility: Cloud resources are ephemeral. An auto-scaled instance involved in an incident may be terminated before evidence is collected. The exam tests whether you preserve evidence (snapshots, logs, memory captures) before destroying or rebuilding resources.
- Multi-tenant implications: An incident in a multi-tenant environment may affect the provider or other tenants. The exam tests whether you consider notification obligations to the CSP and, where contracts or regulations require it, to customers and authorities.
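Evidence volatility is mostly a question of ordering. The minimal Python sketch below uses a hypothetical `Instance` stand-in rather than any real cloud SDK: termination is refused until evidence has been preserved, and memory is captured first because it is the most volatile source. In a real environment you may also need to stop auto-scaling from terminating the instance before you start.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Instance:
    """Hypothetical stand-in for a cloud VM; real SDK calls would replace these methods."""
    instance_id: str
    actions: List[str] = field(default_factory=list)
    evidence_preserved: bool = False

    def capture_memory(self) -> None:
        self.actions.append("capture_memory")

    def snapshot_disks(self) -> None:
        self.actions.append("snapshot_disks")

    def export_logs(self) -> None:
        self.actions.append("export_logs")

    def terminate(self) -> None:
        # Guard: refuse to destroy the resource while evidence is unpreserved.
        if not self.evidence_preserved:
            raise RuntimeError(f"{self.instance_id}: preserve evidence before terminating")
        self.actions.append("terminate")


def preserve_then_rebuild(instance: Instance) -> None:
    instance.capture_memory()   # most volatile: lost when the instance stops
    instance.snapshot_disks()
    instance.export_logs()
    instance.evidence_preserved = True
    instance.terminate()        # only now is it safe to destroy and rebuild
```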
Incident Response Steps
The exam follows the standard incident response lifecycle: preparation, detection and analysis, containment, eradication, recovery, and lessons learned. In cloud environments, containment may involve isolating instances by modifying security groups, revoking credentials, or disabling API keys rather than physically disconnecting systems.
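The lifecycle and the cloud-style containment actions can be sketched together. In this minimal Python illustration, the `Phase` enum mirrors the six phases named above, lessons learned feeds back into preparation, and `cloud_containment_plan` returns descriptive action strings; the function and its parameters are hypothetical and are not real CLI or SDK calls.

```python
from enum import Enum
from typing import List


class Phase(Enum):
    PREPARATION = 1
    DETECTION_AND_ANALYSIS = 2
    CONTAINMENT = 3
    ERADICATION = 4
    RECOVERY = 5
    LESSONS_LEARNED = 6


def next_phase(current: Phase) -> Phase:
    """Advance one phase; lessons learned loops back into preparation."""
    if current is Phase.LESSONS_LEARNED:
        return Phase.PREPARATION
    return Phase(current.value + 1)


def cloud_containment_plan(instance_id: str, api_key_id: str) -> List[str]:
    """Logical, API-driven containment steps instead of physical disconnection.

    The strings are illustrative descriptions only (hypothetical helper).
    """
    return [
        f"attach quarantine security group with no inbound/outbound rules to {instance_id}",
        f"revoke credentials and active sessions used by {instance_id}",
        f"disable API key {api_key_id}",
    ]
```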
Problem Management
Problem management identifies and addresses root causes to prevent incident recurrence. The exam tests the distinction between incident management (restore service now) and problem management (prevent it from happening again).
Reactive Problem Management
Analyzing incidents after they occur to find root causes. The exam tests whether you conduct root cause analysis for recurring incidents. If the same type of security incident keeps happening, incident management alone is insufficient — problem management must address the underlying cause.
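Spotting recurrence is the trigger for reactive problem management. The Python sketch below counts incidents by category and flags any category that reaches a threshold as a candidate for root cause analysis; the category names and the default threshold of 3 are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def recurring_categories(incident_categories: Iterable[str], threshold: int = 3) -> List[str]:
    """Return categories seen at least `threshold` times, sorted by name.

    Each returned category is a candidate for a problem record and RCA.
    """
    counts = Counter(incident_categories)
    return sorted(cat for cat, n in counts.items() if n >= threshold)
```

Given three `exposed_bucket` incidents and one each of `leaked_key` and `ddos`, only `exposed_bucket` is flagged with the default threshold.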
Proactive Problem Management
Identifying potential problems before they cause incidents. The exam tests whether you analyze trends, review configurations, and run chaos engineering experiments to find weaknesses before they manifest as incidents.
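Configuration review is one proactive technique that lends itself to automation. The minimal Python sketch below flags firewall-style ingress rules that expose administrative ports to the whole internet; the `IngressRule` record and the `SENSITIVE_PORTS` policy are assumptions, and port ranges are ignored for simplicity.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical policy: administrative ports that must never be internet-facing.
SENSITIVE_PORTS = {22, 3389}  # SSH, RDP
WORLD = {"0.0.0.0/0", "::/0"}  # any IPv4 / any IPv6 source


@dataclass
class IngressRule:
    group_id: str
    port: int
    cidr: str


def find_exposed_rules(rules: List[IngressRule]) -> List[IngressRule]:
    """Flag rules that open a sensitive port to any source address."""
    return [r for r in rules if r.port in SENSITIVE_PORTS and r.cidr in WORLD]
```

Run on a schedule, a check like this surfaces a weakness before an attacker turns it into an incident.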
Integration of the Three Processes
The exam tests how these processes work together. A poorly managed change can cause an incident. An incident investigation reveals a problem. A problem resolution requires a change. Understanding this cycle is critical for exam scenarios that span multiple process domains.
Consider: a cloud engineer deploys a change without review (change management failure). The change creates a security vulnerability. An attacker exploits it (incident). Investigation reveals the root cause was inadequate change review (problem). The fix requires both a technical remediation (change) and a process improvement (change management strengthening).
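The cycle in this scenario is easier to audit when the three kinds of records link to one another. The Python sketch below is one possible data model, assuming hypothetical `Change`, `Incident`, and `Problem` records and IDs such as `CHG-1`: an incident points back to the change that caused it, a problem groups incidents, and a problem can be closed only after its remediation changes have themselves been reviewed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Change:
    change_id: str
    reviewed: bool  # False models a change that bypassed change management


@dataclass
class Incident:
    incident_id: str
    caused_by_change: Optional[str] = None  # link back to the triggering change


@dataclass
class Problem:
    problem_id: str
    root_cause: str
    incident_ids: List[str] = field(default_factory=list)
    remediation_change_ids: List[str] = field(default_factory=list)


def can_close_problem(problem: Problem, changes: Dict[str, Change]) -> bool:
    """A problem closes only when it has remediation changes and each was reviewed."""
    ids = problem.remediation_change_ids
    return bool(ids) and all(changes[i].reviewed for i in ids)
```

In the scenario, the unreviewed `CHG-1` causes `INC-1`, which opens a problem whose root cause is inadequate change review; the problem stays open until a reviewed fix is linked to it.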
Common Exam Traps
- Confusing incident and problem: Incident management restores service. Problem management prevents recurrence. They are different processes.
- Skipping post-incident review: Lessons learned and root cause analysis are not optional. The exam expects them for every significant incident.
- Emergency without documentation: Emergency changes must be documented retroactively. The exam does not accept undocumented changes.
- Destroying evidence during recovery: In cloud environments, the urgency to rebuild can destroy forensic evidence. The exam expects evidence preservation before recovery.
Key Takeaways for the Exam
Change management adapts to cloud speed through standard pre-approved changes but never disappears. Incident management must account for shared responsibility, evidence volatility, and cloud-specific containment methods. Problem management addresses root causes to prevent recurrence. These three processes are interconnected — changes cause incidents, incidents reveal problems, problems require changes. Documentation and post-incident review are non-negotiable.