Define the concept of database normalization and explain its primary goals in database design.
Explain the rules and characteristics of First Normal Form (1NF). Provide a concrete example of a table that violates 1NF and how to refactor it to comply.
Explain the rules and characteristics of Second Normal Form (2NF). Provide a concrete example of a table that violates 2NF and how to refactor it to comply. Assume the table is already in 1NF.
Explain the rules and characteristics of Third Normal Form (3NF). Provide a concrete example of a table that violates 3NF and how to refactor it to comply. Assume the table is already in 2NF.
Consider the following simplified initial schema designed by a junior developer:
    CREATE TABLE users (
        user_id INT PRIMARY KEY,
        username VARCHAR(50) NOT NULL,
        email VARCHAR(100) UNIQUE NOT NULL,
        full_address VARCHAR(255) -- Stores entire address as a single string (e.g., '123 Main St, Anytown, CA 90210')
    );

    CREATE TABLE products (
        product_id INT PRIMARY KEY,
        product_name VARCHAR(100) NOT NULL,
        description TEXT,
        price DECIMAL(10, 2),
        tags VARCHAR(255) -- Stores comma-separated tags (e.g., 'electronics, sale, gadgets')
    );
Identify and explain at least three significant normalization violations or design flaws present in this schema, specifically focusing on the full_address and tags columns. Discuss the potential problems each flaw could cause (e.g., data redundancy, update anomalies, query complexity).
Refactor the problematic users and products tables from Question 5 to achieve Third Normal Form (3NF). Provide the SQL DDL (CREATE TABLE statements) for your new, normalized schema. Clearly indicate any new tables, columns, primary keys, foreign keys, and constraints added.
Explain the key advantages of your refactored schema from Question 6 compared to the original design presented in Question 5. Focus on how your changes improve data integrity, queryability (provide a simple example query demonstrating improved querying for tags), and overall maintainability and flexibility of the database.