AWS S3 Storage Tiers and Lifecycle Policy
Part 1: Explanation of AWS S3 Storage Tiers
S3 Standard
- Ideal Use Cases: General-purpose storage for frequently accessed data. Common for dynamic websites, content distribution, mobile and gaming applications, and big data analytics.
- Durability: Designed for 99.999999999% (11 nines) durability of objects over a given year, across multiple Availability Zones (AZs).
- Availability: 99.99% availability over a given year.
- Cost Characteristics: Highest storage cost per GB of the classes described here. No retrieval fee and no minimum storage duration.
- Retrieval Times: Milliseconds.
S3 Standard-Infrequent Access (S3 Standard-IA)
- Ideal Use Cases: Long-lived, less frequently accessed data that requires rapid access when needed. Suitable for backups, disaster recovery, and long-term storage for analytics data.
- Durability: Designed for 99.999999999% (11 nines) durability across multiple AZs.
- Availability: 99.9% availability over a given year.
- Cost Characteristics: Lower storage cost per GB compared to S3 Standard, but incurs a retrieval fee. Minimum storage duration of 30 days.
- Retrieval Times: Milliseconds.
S3 One Zone-Infrequent Access (S3 One Zone-IA)
- Ideal Use Cases: Infrequently accessed data that does not require the multi-AZ resilience of S3 Standard-IA. Suitable for secondary backups of on-premises data or easily recreatable data, where resilience to an AZ loss is not critical.
- Durability: Designed for 99.999999999% (11 nines) durability within a single AZ. Data stored in this class will be lost if that AZ is destroyed.
- Availability: 99.5% availability over a given year.
- Cost Characteristics: Lower storage cost per GB than S3 Standard-IA, but incurs a retrieval fee. Minimum storage duration of 30 days.
- Retrieval Times: Milliseconds.
Amazon S3 Glacier Flexible Retrieval (formerly S3 Glacier)
- Ideal Use Cases: Long-term archival of data that is accessed infrequently, typically once a quarter or less, and where retrieval times of several hours are acceptable. Examples include media archives, scientific data, or regulatory compliance archives.
- Durability: Designed for 99.999999999% (11 nines) durability across multiple AZs.
- Availability: 99.99% availability over a given year, but objects are not immediately readable; each object must be restored before it can be accessed.
- Cost Characteristics: Very low storage cost per GB. Retrieval costs vary with the retrieval option chosen, and faster options cost more. Minimum storage duration of 90 days.
- Retrieval Times: Configurable. Options include: Standard (3-5 hours), Bulk (5-12 hours), Expedited (1-5 minutes, at a higher cost).
Amazon S3 Glacier Deep Archive
- Ideal Use Cases: The lowest-cost storage class for long-term data archival, particularly for data that needs to be retained for 7-10+ years for compliance or regulatory requirements and is rarely, if ever, accessed.
- Durability: Designed for 99.999999999% (11 nines) durability across multiple AZs.
- Availability: 99.99% availability over a given year, but objects are not immediately readable; each object must be restored before it can be accessed.
- Cost Characteristics: Lowest storage cost per GB among all S3 storage classes. Retrievals incur a per-GB fee and take longer than in any other class. Minimum storage duration of 180 days.
- Retrieval Times: Standard (within 12 hours), Bulk (within 48 hours).
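Reading an object from either Glacier class takes two steps: request a restore, then read the temporary copy once it is ready. The sketch below builds the body of such a request in the shape that boto3's `restore_object` call accepts as `RestoreRequest`. The function name `build_restore_request`, the `RETRIEVAL_TIERS` table, and the 7-day default are illustrative assumptions; the tier names and timings are the ones listed above.

```python
# Retrieval tiers per archival storage class, with the times listed above.
# Keys use the S3 API storage class names (GLACIER = Glacier Flexible Retrieval).
RETRIEVAL_TIERS = {
    "GLACIER": {
        "Expedited": "1-5 minutes",
        "Standard": "3-5 hours",
        "Bulk": "5-12 hours",
    },
    "DEEP_ARCHIVE": {
        "Standard": "within 12 hours",
        "Bulk": "within 48 hours",
    },
}


def build_restore_request(storage_class: str, tier: str, days: int = 7) -> dict:
    """Return a RestoreRequest dict, rejecting tiers the class does not offer.

    `days` is how long the restored copy stays readable (hypothetical default).
    """
    tiers = RETRIEVAL_TIERS.get(storage_class)
    if tiers is None:
        raise ValueError(f"{storage_class} objects do not need a restore")
    if tier not in tiers:
        raise ValueError(f"{tier} retrieval is not offered for {storage_class}")
    if days < 1:
        raise ValueError("days must be at least 1")
    return {"Days": days, "GlacierJobParameters": {"Tier": tier}}


if __name__ == "__main__":
    request = build_restore_request("DEEP_ARCHIVE", "Bulk")
    print(request)
    # With boto3 installed and credentials configured, the request could be sent as:
    #   boto3.client("s3").restore_object(
    #       Bucket="example-log-bucket",      # hypothetical bucket
    #       Key="logs/2023/01/app.log.gz",    # hypothetical key
    #       RestoreRequest=request,
    #   )
```

Validating the tier locally catches a request such as Expedited retrieval from S3 Glacier Deep Archive, which the list above does not offer, before any API call is made.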
Part 2: Proposed S3 Lifecycle Policy for Log Data
Scenario: Terabytes of log data are currently stored in S3 Standard. The data is rarely accessed after 30 days but must be retained for compliance reasons. The goal is to reduce storage cost while still meeting the access and compliance requirements.
Proposed S3 Lifecycle Policy:
- Transition to S3 Standard-Infrequent Access (S3 Standard-IA) after 30 days:
  - Action: Transition objects from S3 Standard to S3 Standard-IA.
  - Days: 30 days after object creation.
  - Justification: This matches the stated access pattern, in which log data is rarely read after 30 days. S3 Standard-IA offers significantly lower storage costs than S3 Standard while still providing millisecond access when an occasional read is needed. Durability is maintained across multiple Availability Zones.
- Transition to Amazon S3 Glacier Flexible Retrieval after 90 days:
  - Action: Transition objects from S3 Standard-IA to Amazon S3 Glacier Flexible Retrieval.
  - Days: 90 days after object creation (that is, 60 days after entering S3 Standard-IA, which exceeds that class's 30-day minimum storage duration).
  - Justification: For data that is archival and very infrequently accessed, S3 Glacier Flexible Retrieval provides substantial cost savings over S3 Standard-IA. Retrieval takes minutes to hours, which is acceptable for data stored primarily for compliance and rarely read.
- Transition to Amazon S3 Glacier Deep Archive after 365 days:
  - Action: Transition objects from Amazon S3 Glacier Flexible Retrieval to Amazon S3 Glacier Deep Archive.
  - Days: 365 days after object creation (that is, 275 days after entering S3 Glacier Flexible Retrieval, which exceeds that class's 90-day minimum storage duration).
  - Justification: This is the final and most cost-effective tier for long-term compliance archival. For log data that must be retained for many years and is highly unlikely to be accessed, S3 Glacier Deep Archive offers the lowest storage cost. Its retrieval times (within 12 hours for Standard, within 48 hours for Bulk) are acceptable given how rarely such data is read.
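The three transitions above can be sketched as a single lifecycle rule. The example below builds the configuration as a plain dictionary in the shape accepted by boto3's `put_bucket_lifecycle_configuration`, and checks that the time spent in each class is at least the minimum storage duration given in Part 1. The rule ID, the `logs/` prefix, and the helper names are assumptions made for illustration; the day counts and storage classes come from the policy itself.

```python
import json

# Minimum storage durations (days) from Part 1, keyed by S3 API storage class name.
MIN_STORAGE_DAYS = {"STANDARD_IA": 30, "GLACIER": 90, "DEEP_ARCHIVE": 180}

# Transition schedule from the proposed policy: days after object creation.
TRANSITIONS = [
    {"Days": 30, "StorageClass": "STANDARD_IA"},
    {"Days": 90, "StorageClass": "GLACIER"},  # S3 Glacier Flexible Retrieval
    {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
]


def days_in_each_class(transitions):
    """Days an object spends in each class before the next transition.

    The final class has no later transition, so it maps to None.
    """
    dwell = {}
    for current, following in zip(transitions, transitions[1:]):
        dwell[current["StorageClass"]] = following["Days"] - current["Days"]
    dwell[transitions[-1]["StorageClass"]] = None
    return dwell


def build_lifecycle_configuration(prefix="logs/"):
    """Return a lifecycle configuration with one rule covering `prefix`."""
    return {
        "Rules": [
            {
                "ID": "archive-log-data",  # hypothetical rule name
                "Filter": {"Prefix": prefix},  # hypothetical key prefix
                "Status": "Enabled",
                "Transitions": TRANSITIONS,
            }
        ]
    }


if __name__ == "__main__":
    for storage_class, days in days_in_each_class(TRANSITIONS).items():
        if days is not None:
            # Moving on before the minimum duration would still be billed for it.
            assert days >= MIN_STORAGE_DAYS[storage_class], storage_class
        print(storage_class, days)
    print(json.dumps(build_lifecycle_configuration(), indent=2))
    # With boto3 installed and credentials configured, it could be applied as:
    #   boto3.client("s3").put_bucket_lifecycle_configuration(
    #       Bucket="example-log-bucket",  # hypothetical bucket
    #       LifecycleConfiguration=build_lifecycle_configuration(),
    #   )
```

The sketch contains no expiration action because the scenario does not state a retention period; one would be added once the compliance requirement fixes how long the logs must be kept.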
1. Summarize the key differences between S3 Standard, S3 Standard-IA, and S3 Glacier Deep Archive, and explain how the proposed lifecycle policy for log data leverages these differences for cost optimization and compliance.