Sql Queries Scenario Based Interview Questions

11 min read Oct 15, 2024

SQL Queries: Scenario Based Interview Questions - Mastering the Art of Data Retrieval

SQL (Structured Query Language) is the bedrock of data management, powering everything from simple data retrieval to complex data analysis. For aspiring database professionals, the ability to craft effective SQL queries is paramount. Scenario-based interview questions are a popular method to assess this skill, pushing candidates to think critically and apply their SQL knowledge to real-world situations. This article will delve into a variety of SQL query scenarios, providing insights into common interview challenges and the best strategies for tackling them.

Scenario 1: Finding the Best-Selling Products

Imagine you work for an e-commerce company and need to identify the top 10 best-selling products over the past month. You have access to two tables:

orders: Contains information about each order, including order_id, customer_id, and order_date.
order_items: Contains details about each item within an order, including order_id, product_id, and quantity.

The Challenge: Write a SQL query to identify the top 10 best-selling products, ranked by total quantity sold.

Solution:

SELECT 
    oi.product_id,
    SUM(oi.quantity) AS total_quantity_sold
FROM 
    order_items oi
JOIN 
    orders o ON oi.order_id = o.order_id
WHERE 
    o.order_date >= DATE_SUB(CURDATE(), INTERVAL 1 MONTH) 
GROUP BY 
    oi.product_id
ORDER BY 
    total_quantity_sold DESC
LIMIT 
    10;

Explanation:

JOIN: The JOIN clause combines the order_items and orders tables based on the order_id column.
WHERE: The WHERE clause filters orders to include only those placed within the last month.
SUM: The SUM function calculates the total quantity sold for each product.
GROUP BY: The GROUP BY clause groups the results by product_id to aggregate the total quantities sold for each product.
ORDER BY: The ORDER BY clause sorts the results in descending order of total_quantity_sold.
LIMIT: The LIMIT clause restricts the output to the top 10 best-selling products.

Scenario 2: Analyzing Customer Purchase History

You're working for a retail store and need to analyze customer purchase history. You have access to a table called transactions that includes information about each purchase:

transaction_id: Unique identifier for each purchase.
customer_id: Identifier for the customer making the purchase.
product_id: Identifier for the product purchased.
purchase_date: Date of the purchase.
amount: Purchase amount.

The Challenge: Write a SQL query to determine the average purchase amount for each customer who has made at least 3 purchases.

Solution:

SELECT 
    customer_id, 
    AVG(amount) AS average_purchase_amount
FROM 
    transactions
WHERE 
    customer_id IN (
        SELECT customer_id
        FROM transactions
        GROUP BY customer_id
        HAVING COUNT(*) >= 3
    )
GROUP BY 
    customer_id;

Explanation:

Subquery: The inner SELECT statement identifies customers who have made at least 3 purchases by grouping transactions by customer_id and using the HAVING clause to filter based on the count.
WHERE: The outer WHERE clause filters the transactions table to include only purchases made by those customers identified in the subquery.
AVG: The AVG function calculates the average purchase amount for each customer.
GROUP BY: The GROUP BY clause groups the results by customer_id to calculate the average for each customer.

Scenario 3: Identifying Potential Fraudulent Activity

You're a security analyst for a bank and need to identify potential fraudulent transactions. You have a table called transactions that contains information about each transaction, including:

transaction_id: Unique identifier for each transaction.
customer_id: Identifier for the customer making the transaction.
transaction_amount: Amount of the transaction.
transaction_date: Date of the transaction.
transaction_location: Location where the transaction was made.

The Challenge: Write a SQL query to identify transactions that deviate significantly from a customer's typical spending patterns, potentially indicating fraud.

Solution:

This scenario requires a more sophisticated approach. We can use a combination of SQL and statistical techniques to identify outliers. Here's a simplified approach:

Calculate the average transaction amount for each customer.
Calculate the standard deviation of transaction amounts for each customer.
Identify transactions where the transaction amount is more than 3 standard deviations away from the customer's average transaction amount.

This approach uses the concept of "z-scores," a common statistical measure for identifying outliers.

Simplified SQL Query:

SELECT 
    t.transaction_id,
    t.customer_id,
    t.transaction_amount,
    (t.transaction_amount - avg_amount) / stddev_amount AS z_score
FROM 
    transactions t
JOIN (
    SELECT 
        customer_id, 
        AVG(transaction_amount) AS avg_amount,
        STDDEV(transaction_amount) AS stddev_amount
    FROM 
        transactions
    GROUP BY 
        customer_id
) AS customer_stats ON t.customer_id = customer_stats.customer_id
WHERE 
    ABS(z_score) > 3;

Explanation:

Subquery: The inner SELECT statement calculates the average and standard deviation of transaction amounts for each customer.
JOIN: The JOIN clause combines the transactions table with the customer_stats table to get the average and standard deviation for each transaction.
Z-Score Calculation: The z_score is calculated by subtracting the average amount from the transaction amount and dividing by the standard deviation.
WHERE: The WHERE clause filters for transactions with an absolute z-score greater than 3, indicating a significant deviation from the customer's normal spending pattern.

Scenario 4: Finding Duplicate Records

You're working with a table called users that contains information about users of a website:

user_id: Unique identifier for each user.
username: Username of the user.
email: Email address of the user.

The Challenge: Write a SQL query to identify duplicate records in the users table based on the email column.

Solution:

SELECT 
    email,
    COUNT(*) AS duplicate_count
FROM 
    users
GROUP BY 
    email
HAVING 
    COUNT(*) > 1;

Explanation:

GROUP BY: The GROUP BY clause groups the records by email.
COUNT: The COUNT(*) function counts the number of occurrences for each email.
HAVING: The HAVING clause filters the results to include only emails with a count greater than 1, indicating duplicates.

Scenario 5: Optimizing a Query for Performance

You need to write a SQL query to retrieve data from a large database table, but the query is taking an extremely long time to execute.

The Challenge: How can you optimize the query to improve its performance?

Tips for Optimization:

Indexing: Create indexes on columns frequently used in WHERE clauses, JOIN conditions, and ORDER BY clauses. Indexes help the database quickly locate relevant data.
Data Type Selection: Use appropriate data types for your columns. Avoid storing large amounts of data in VARCHAR columns if a smaller data type like INT or DATE is sufficient.
Query Structure: Avoid using subqueries if possible. Use joins for efficient data retrieval.
Avoid Using SELECT *: Select only the columns you need instead of using SELECT * to reduce data transfer.
Use EXPLAIN: The EXPLAIN statement shows the execution plan for a query, helping you identify potential bottlenecks.
Consider Database Configuration: Optimize the database server configuration for optimal performance.

Conclusion

Scenario-based SQL queries test your ability to apply your knowledge to practical situations. The key to success is understanding the underlying data structures, using appropriate SQL syntax, and thinking strategically about how to solve the given problem. By practicing with different scenarios, you'll develop the confidence and expertise to tackle any SQL query challenge.

Sql Queries Scenario Based Interview Questions