Monday, September 29, 2014

Database Design & Performance Tuning interview questions and answers

1. What is Normalization?

Keyword:
reduce redundancy, divide large tables to smaller tables, create relationships between tables.

Answer:
Normalization is the process of organizing the fields and tables of a relational database to minimize redundancy. Normalization usually involves dividing large tables into smaller (and less redundant) tables and defining relationships between them. The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database using the defined relationships.

The objectives of normalization are:
a. Free the database of modification anomalies;
b. Minimize redesign when extending the database structure;
c. Make the data model more informative to users;
d. Avoid bias towards any particular pattern of querying.


2. What is First Normal Form?

Keyword:
Each column contains only atomic values

Answer:
First normal form (1NF) is a property of a relation in a relational database. A relation is in first normal form if the domain of each attribute contains only atomic values, and the value of each attribute contains only a single value from that domain.

Let's assume we have the following employee table:
employee_name        telephone
Jack                         123-456-7890
Tom                         123-321-5678, 123-321-5679

In case of an employee has more than one telephone number, the column "telephone" contains more than one value. This design violates 1NF.
To make it comply with 1NF, we can change the employee table as follow:
employee_name        telephone
Jack                         123-456-7890
Tom                         123-321-5678
Tom                         123-321-5679

3. What is Second Normal Form?

Keyword:
1NF + each column depends on the whole candidate key

Answer:
Second normal form (2NF) is normal form higher than 1NF. A table is in 2NF if and only if it is in 1NF and every non-prime attribute of the table is dependent on the whole of a candidate key.

Let's assume we have the following project table containing projects and their participants:
project_id    employee_id   project_name
1             1             Database upgrade
1             3             Database upgrade
2             4             Email System Setup
2             7             Email System Setup
2             9             Email System Setup

This table has a composite primary key [project_id, employee_id]. The non-key attribute is [project_name].
In this case, [project_name] only depends on [project_id], which is only part of the primary key. So this design violates 2NF.

To make it comply with 2NF, we can split the table into two tables:
project table:
project_id    project_name
1             Database upgrade
2             Email System Setup

project_participant table:
project_id    participant
1             1
1             3
2             4
2             7
2             9

4. How to store tree structure in a relational database?

Keyword:
Adjacency list
Materialized Path
Nested Set

Answer:
Generally, there are three ways to store tree structure in the database:
a. Adjacency list
Each record holds a reference (parent_id) to the parent as a foreign key.
For example:
node_id    node_value    parent_id
1               a                   0
2               b                   1
3               c                   1
4               d                   2

b. Materialized Path
Each record stores the full path from the root.
For example:
node_id     node_value     path
1                a                    1
2                b                    1/2
3                c                     1/3
4                d                    1/2/4

c. Nested Set
Number the records according to a tree traversal, which visits each node twice, assigning numbers in the order of visiting, and at both visits. This leaves two numbers for each node, which are stored as two attributes: left_id, right_id.
node_id    node_value    left_id    right_id
1               a                   1            8
2               b                   2            5
3               c                   6            7
4               d                   3            4

5. Give some examples of sql injection.

Answer:
Example 1:  SQL Injection Based on 1=1 is Always True
Let's say we have the following code snippet to build a SQL query to check username and password:
String userName = getUserInput("userName");
String password = getUserInput("password");
String sql = "SELECT * FROM user WHERE user_name= '" + userName + "' AND password =''' + password +"'";
If the user input userName: "a" and password: "a OR 1=1", then the sql query becomes:
SELECT * FROM user WHERE user_name= 'a' AND password= 'a' OR 1=1
Since 1=1 is always true, it will return all rows from the user table.

Example 2: SQL Injection Based on Batched SQL Statements
Let's say we have the following code snippet to get a user:
String userId = getUserInput("userId");
String sql = "SELECT * FROM user WHERE user_id=" + userId;
If the user input the string: "1; DROP table user",  then the sql query becomes:
SELECT * FROM user WHERE user_id=1; DROP table user

6. How to prevent SQL injection attacks?

Answer:
a. Reduce the attack surface. Ensure that all excess database privileges are revoked and that only those routines that are intended for end-user access are exposed. Though this does not entirely eliminate SQL injection vulnerabilities, it mitigates the impact of the attacks.

b. Avoid dynamic SQL with concatenated input. Dynamic SQL built with concatenated input values presents the easiest entry point for SQL injections. Avoid constructing dynamic SQL this way.

c. Use Parameterized statements. Parameterized statements, like JDBC PreparedStatement, eliminate the possibility of SQL injections and enhance performance.

d. Filter and sanitize input. Assume all user-submitted data is evil and validate and sanitize everything.

7. What is query execution plan?

Keyword:
an ordered set of steps used to access data,
SQL Query Analyzer, EXPLAIN PLAN statement

Answer:
A query execution plan is an ordered set of steps used to access data in a RDBMS. Since SQL is declarative, there are typically a large number of alternative ways to execute a given query, with widely varying performance. When a query is submitted to the database, the query optimizer evaluates some of the different, correct possible plans for executing the query and returns what it considers the best alternative. Because query optimizers are imperfect, database users and administrators sometimes need to manually examine and tune the plans produced by the optimizer to get better performance.

A given database management system may offer one or more mechanisms for returning the plan for a given query. Some packages feature tools which will generate a graphical representation of a query plan, such as SQL Query Analyzer in SQL Server. Another mechanism for retrieving the query plan involves querying a virtual database table after executing the query to be examined, such as the EXPLAIN PLAN statement in Oracle.

8. Explain the general steps of database performance tuning.

Keyword:
Business Rules -> Data Design -> Application Design -> DB Logical Structure -> SQL Statements -> Query Plan -> Memory Allocation -> I/O and Storage -> Resource Contention -> OS

Answer:
Step 1. Tune the Business Rules
For optimal performance, you may need to adapt business rules. These concern the high-level analysis and design of an entire system.

Step 2. Tune the Data Design
In the data design phase, you need to determine what data is needed by your applications and consider what relations are important, and what their attributes are, and how to structure the information to best meet performance goals. For example, in this step, you need to determine the primary and foreign key indexes.

Step 3. Tune the Application Design
Business executives and application designers should translate business goals into an effective system design. Business processes concern a particular application within a system, or a particular part of an application.

Step 4. Tune the Logical Structure of the Database
This step primarily concerns fine-tuning the index design to ensure that the data is neither over- nor under-indexed. For example, you may need to create additional indexes to support the application.

Step 5. Tune Database Operations
In this step, you may need to tune your SQL statements to take full advantage of the SQL language and RDBMS specific features designed to enhance application processing.

Step 6. Tune the Access Paths
In this step, you may need to use query execution plan tools to analyze your SQL statements to determine the best query plan.

Step 7. Tune Memory Allocation
In this step, you may need to detect database memory allocation issues and tune memory allocation strategies.

Step 8: Tune I/O and Physical Structure
Tuning I/O and physical structure involves distributing data, storing data in data blocks for best access, creating extents large enough for your data, etc.

Step 9: Tune Resource Contention
Take care to reduce the following types of contention: Block contention, Shared pool contention, Lock contention, Latch contention, etc.

Step 10: Tune the Underlying Platform
In the last step, you may need to tune the underlying OS performance.

More Database Design & Performance Tuning interview questions and answers: SQL Interview Notes

  • What is Denormalization?
  • What is Third Normal Form?
  • What is Referential Integrity?
  • How to implement one-to-one, one-to-many and many-to-many relationships while designing tables?
  • Give some examples of optimize SQL query statements?
  • What is full table scan? When to use it and how to avoid it?
  • Give some tips on creating proper index?
  • Explain general guidelines that determine when to use clustered index and non-clustered index?
  • What is the advantage and disadvantage of storing binary data in database?
  • What is Database Partitioning?
  • ......

SQL Interview Notes

100+ frequently asked SQL and Database interview questions with concise summaries and detailed answers.  
Topics include: SQL Basic Concepts, SQL DDL & DML, Advanced SQL, Database Design and Performance Tuning.  




JavaScript Interview Notes

100+ frequently asked JavaScript interview questions with concise summaries and detailed answers. Topics include: JavaScript Basics, DOM, BOM, Object-Oriented JavaScript, Function, Scope, Closure, JSON, XML, Ajax, jQuery. 
Download on the AppStore    Get it On Google Play


Java Interview Notes

300+ frequently asked Java interview questions with concise summaries and detailed answers. Topics include: Java & OOP, Strings & Collections, IO JVM & GC, Multithreading, Generics Reflection & Annotations, Design Patterns, Java EE, Spring, JPA & Hibernate.

No comments:

Post a Comment