Module 23: Data Retention, Deletion, and Archiving
Retention questions test your ability to balance competing requirements: regulations that mandate keeping data, regulations that mandate deleting data, and the practical challenges of doing either in distributed cloud environments.
Data Retention Policies
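A data retention policy defines how long specific categories of data must be kept. Retention requirements come from multiple sources: regulations (HIPAA requires covered entities to keep required compliance documentation for 6 years, while retention periods for the medical records themselves are generally set by state law), industry standards (PCI DSS requires at least 1 year of audit log history, with the most recent three months immediately available), contractual obligations, and business needs.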
The exam tests your understanding that retention periods cut both ways: keeping data longer than required is a liability (more data to breach, more data to manage), and deleting data too early is a compliance violation. The correct retention period is exactly what is required, no more and no less.
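The "no more and no less" rule can be sketched as a simple date check. This is a minimal illustration, not a real compliance tool: the category names, the day counts, and the `retention_status` helper are all hypothetical.

```python
from datetime import date, timedelta

# Hypothetical retention schedule: data category -> required retention in days.
# The numbers are illustrative only; real values come from the applicable rules.
RETENTION_DAYS = {
    "audit_log": 365,
    "contract_record": 3 * 365,
}

def retention_status(category: str, created: date, today: date) -> str:
    """Return 'retain' while inside the required period, else 'delete_due'."""
    expires = created + timedelta(days=RETENTION_DAYS[category])
    return "retain" if today < expires else "delete_due"

# Inside the window the record must be kept; once the window closes,
# continuing to hold it becomes the liability described above.
early = retention_status("audit_log", date(2023, 1, 1), date(2023, 6, 1))
late = retention_status("audit_log", date(2023, 1, 1), date(2024, 1, 1))
```

A "delete_due" result would normally feed a review step rather than trigger deletion directly, since a legal hold or a longer overlapping requirement may still apply.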
Conflicting Requirements
Different regulations may impose different retention periods for the same data. GDPR's storage limitation principle, reinforced by data minimization, requires that personal data be kept no longer than necessary for its purpose, while financial regulations may require keeping it for years. The exam expects you to identify these conflicts and resolve them by retaining data for the longest legally mandated period while applying appropriate protections throughout. A legal retention obligation is a recognized basis for keeping data that would otherwise be deleted.
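The resolution rule described above (keep for the longest applicable requirement) might be sketched as follows. The `Requirement` type, the source labels, and the day counts are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Requirement:
    source: str    # hypothetical label for the regulation, contract, or need
    min_days: int  # minimum retention this source mandates

def resolve_retention(requirements: list[Requirement]) -> Requirement:
    """Pick the requirement with the longest mandatory retention period."""
    if not requirements:
        raise ValueError("no retention requirement applies to this data")
    return max(requirements, key=lambda r: r.min_days)

# Illustrative inputs: two sources that both cover the same data set.
applicable = [
    Requirement("financial_recordkeeping", 7 * 365),
    Requirement("internal_business_need", 2 * 365),
]
governing = resolve_retention(applicable)
print(governing.source, governing.min_days)
```

The sketch only selects the governing period. The protections applied while the data is held (encryption, access restriction) are a separate control.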
Data Deletion in the Cloud
Deleting data in cloud environments is harder than it sounds. Cloud data may exist in:
- Primary storage locations
- Replicated copies across availability zones and regions
- Backup snapshots
- Archived copies in cold storage
- CDN caches
- Log files and analytics systems
- Temporary processing caches
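True deletion requires confirming that all copies are removed from all locations, which a cloud customer often cannot verify directly. This is why crypto-shredding is the preferred cloud deletion method: if the data was encrypted before it was stored, destroying every copy of the encryption key renders all copies of the data unrecoverable regardless of location.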
Exam insight: When a question describes data deletion in the cloud, verify that the answer addresses ALL copies, not just primary storage. An answer that deletes data from the primary database but ignores backups and replicas is incomplete.
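The "ALL copies" test can be pictured as a coverage check against the location list above. This is a sketch under assumptions: the location identifiers and the shape of the deletion report are hypothetical.

```python
# Copy locations mirroring the list above; the identifiers are hypothetical.
COPY_LOCATIONS = [
    "primary_storage", "replicas", "backup_snapshots", "cold_archive",
    "cdn_caches", "logs_and_analytics", "temp_processing_caches",
]

def unaddressed_locations(deletion_report: dict[str, bool]) -> list[str]:
    """Return locations the report does not confirm as deleted."""
    return [loc for loc in COPY_LOCATIONS if not deletion_report.get(loc, False)]

# A report that only covers primary storage and replicas is incomplete.
report = {"primary_storage": True, "replicas": True}
missing = unaddressed_locations(report)
print(missing)
```

An empty result is the only outcome that matches a complete deletion answer. Any location left in `missing` is a copy that may still exist.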
Data Archiving
Archiving moves data to lower-cost, long-term storage while keeping it accessible for compliance and legal requirements. Cloud providers offer archive storage tiers (S3 Glacier, Azure Archive, GCS Archive) that cost less to store but take longer, and often cost more, to retrieve.
Archival Considerations
- Encryption longevity: Will the encryption algorithm remain secure for the entire retention period?
- Format accessibility: Will the data format be readable when retrieval is needed years later?
- Key management: Encryption keys for archived data must be preserved and accessible for the entire retention period.
- Retrieval SLA: Archive storage may take hours to retrieve. Does the retention policy require faster access?
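Two of the considerations above (key management and retrieval SLA) reduce to simple comparisons, sketched below. The `ArchivePlan` fields and the example dates and hours are hypothetical. Encryption longevity and format accessibility need human judgment and are not modeled.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class ArchivePlan:
    retention_ends: date          # last day the data must be kept
    key_preserved_until: date     # how long the decryption key is guaranteed
    retrieval_hours: float        # expected restore time for the storage tier
    required_access_hours: float  # fastest access the retention policy demands

def archive_risks(plan: ArchivePlan) -> list[str]:
    """List mismatches between the archive setup and the retention policy."""
    risks = []
    if plan.key_preserved_until < plan.retention_ends:
        risks.append("key expires before retention ends")
    if plan.retrieval_hours > plan.required_access_hours:
        risks.append("retrieval slower than policy allows")
    return risks

# Illustrative plan: the key lapses two years early and restores are too slow.
plan = ArchivePlan(date(2031, 1, 1), date(2029, 1, 1), 12, 4)
risks = archive_risks(plan)
```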
Right to Be Forgotten (GDPR Article 17)
GDPR gives individuals the right to request erasure of their personal data. The right is not absolute: exceptions apply, for example where the data must be kept to comply with a legal obligation. In cloud environments, fulfilling a valid request requires identifying every location where the individual's data exists (which in turn requires a comprehensive data map) and deleting or anonymizing it across all systems. The exam tests whether you understand the operational complexity of fulfilling erasure requests in distributed cloud architectures.
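The role of the data map can be sketched as a lookup that turns one erasure request into a per-system work list. Everything here is illustrative: the system names, the subject IDs, and the `LEGAL_HOLD` set, which stands in for systems assumed to be under a separate legal retention duty.

```python
# Hypothetical data map: system name -> subject IDs whose data it stores.
DATA_MAP = {
    "crm_db": {"u1", "u2"},
    "analytics_warehouse": {"u1"},
    "backup_snapshots": {"u1", "u2", "u3"},
    "billing_ledger": {"u2"},
}
# Systems assumed, for illustration, to fall under a legal retention obligation.
LEGAL_HOLD = {"billing_ledger"}

def erasure_plan(subject_id: str) -> dict[str, list[str]]:
    """Split the systems holding a subject's data into erase vs. retain."""
    holding = sorted(s for s, ids in DATA_MAP.items() if subject_id in ids)
    return {
        "erase": [s for s in holding if s not in LEGAL_HOLD],
        "retain_with_justification": [s for s in holding if s in LEGAL_HOLD],
    }

plan_u2 = erasure_plan("u2")
print(plan_u2)
```

The plan is only as complete as the data map. A system missing from `DATA_MAP` never appears in the work list, which is the operational risk the paragraph above describes.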
Secure Deletion Methods
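- Crypto-shredding: Destroying the encryption keys so that the encrypted data can no longer be read. Most practical for cloud, provided the data was encrypted before it was stored.
- Logical deletion: Removing references to data. Data may still exist on physical media. Insufficient on its own when the data must be unrecoverable.
- Overwriting: Writing over data with patterns. Difficult to perform and verify in shared cloud storage, where the customer does not control the physical media.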
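To make the crypto-shredding idea concrete, here is a toy sketch using only the Python standard library. Several things are assumptions: `key_store` is a hypothetical stand-in for a key management service, the key ID is invented, and the SHA-256 counter-mode cipher is a teaching device only. A real system would use a vetted authenticated cipher and a managed key service.

```python
import hashlib
import secrets

def _keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher (SHA-256 in counter mode). NOT for production use."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + nonce + counter).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)

key_store: dict[str, bytes] = {}  # hypothetical stand-in for a KMS

def encrypt_for(key_id: str, plaintext: bytes) -> tuple[bytes, bytes]:
    """Encrypt under the key for key_id, creating the key on first use."""
    key = key_store.setdefault(key_id, secrets.token_bytes(32))
    nonce = secrets.token_bytes(16)
    return nonce, _keystream_xor(key, nonce, plaintext)

def decrypt_for(key_id: str, nonce: bytes, ciphertext: bytes) -> bytes:
    key = key_store[key_id]  # raises KeyError once the key has been shredded
    return _keystream_xor(key, nonce, ciphertext)

def crypto_shred(key_id: str) -> None:
    """Destroy the key; assumes no other copy of the key exists anywhere."""
    key_store.pop(key_id, None)

nonce, ciphertext = encrypt_for("customer-42", b"sensitive record")
replicas = [ciphertext, ciphertext]  # copies in backups, other regions, etc.
recovered = decrypt_for("customer-42", nonce, replicas[0])
crypto_shred("customer-42")
```

After `crypto_shred`, the replicas still exist as bytes but can no longer be decrypted, which is the property that makes the method practical when the copies themselves cannot be located or overwritten.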
Key Takeaways
Retention policies must balance competing requirements. Keep data for exactly as long as required. Cloud deletion must address all copies across all locations. Crypto-shredding is the most practical cloud deletion method, provided the data was encrypted and every copy of the key is destroyed. Archive storage requires planning for encryption longevity, format accessibility, key preservation, and retrieval time. GDPR erasure requests are operationally complex in cloud environments.