Wednesday, November 12, 2008

DWH Questions

1. What is the difference between view and materialized view?

View - store the SQL statement in the database and let you use it as a table. Every time you access the view, the SQL statement executes.

Materialized view - stores the results of the SQL in table form in the database. SQL statement only executes once and after that every time you run the query, the stored result set is used. Pros include quick query results.

2. Can a dimension table contain numeric values?

Yes. But those data type will be char (only the values can numeric/char)

3.What are the data types present in bo? what happens if we implement view in the designer n report

Three different data types: Dimensions, Measure and Detail.

View is nothing but an alias and it can be used to resolve the loops in the universe.

4. What is ER Diagram?

The Entity-Relationship (ER) model was originally proposed by Peter in 1976 [Chen76] as a way to unify the network and relational database views.

Simply stated the ER model is a conceptual data model that views the real world as entities and relationships. A basic component of the model is the Entity-Relationship diagram which is used to visually represents data objects.

Since Chen wrote his paper the model has been extended and today it is commonly used for database design For the database designer, the utility of the ER model is:

it maps well to the relational model. The constructs used in the ER model can easily be transformed into relational tables.
it is simple and easy to understand with a minimum of training. Therefore, the model can be used by the database designer to communicate the design to the end user.

In addition, the model can be used as a design plan by the database developer to implement a data model in a specific database management software.

5.What is a surrogate key and where do you use it?

It is just a unique identifier or number for each row that can be used for the primary key to the table. The only requirement for a surrogate primary key is that it is unique for each row in the table.
Example:

Another benefit you can get from surrogate keys (SID) is :

Tracking the SCD - Slowly Changing Dimension.

Let me give you a simple, classical example:

On the 1st of January 2002, Employee 'E1' belongs to Business Unit 'BU1' (that's what would be in your Employee Dimension). This employee has a turnover allocated to him on the Business Unit 'BU1' But on the 2nd of June the Employee 'E1' is muted from Business Unit 'BU1' to Business Unit 'BU2.' All the new turnover have to belong to the new Business Unit 'BU2' but the old one should Belong to the Business Unit 'BU1.'

If you used the natural business key 'E1' for your employee within your datawarehouse everything would be allocated to Business Unit 'BU2' even what actualy belongs to 'BU1.'

If you use surrogate keys, you could create on the 2nd of June a new record for the Employee 'E1' in your Employee Dimension with a new surrogate key.

This way, in your fact table, you have your old data (before 2nd of June) with the SID of the Employee 'E1' + 'BU1.' All new data (after 2nd of June) would take the SID of the employee 'E1' + 'BU2.'

You could consider Slowly Changing Dimension as an enlargement of your natural key: natural key of the Employee was Employee Code 'E1' but for you it becomes
Employee Code + Business Unit - 'E1' + 'BU1' or 'E1' + 'BU2.' But the difference with the natural key enlargement process, is that you might not have all part of your new key within your fact table, so you might not be able to do the join on the new enlarge key -> so you need another id.

6. Difference between Snow flake and Star Schema. What are situations where Snow flake Schema is better than Star Schema to use and when the opposite is true?

Star schema contains the dimension tables mapped around one or more fact tables.
It is a demoralized model. No need to use complicated joins. Queries results facility.

Snowflake schema
It is the normalized form of Star schema. contains in-depth joins ,because the tables r spitted in to many pieces. We can easily do modification directly in the tables. We have to use complicated joins, since we have more tables. There will be some delay in processing the Query.

7. What is the main difference between schema in RDBMS and schemas in DataWarehouse....?

RDBMS Schema
* Used for OLTP

* Traditional and old schema
* Normalized
* Difficult to understand and navigate
* Cannot solve extract and complex problems
* Poorly modeled

DWH Schema
* Used for OLAP systems
* New generation schema
* De Normalized
* Easy to understand and navigate
* Extract and complex problems can be easily solved
* Very good model

8. What is Data Mart ?

A data mart is a repository of data gathered from operational data and other sources that is designed to serve a particular community of knowledge workers. In scope, the data may derive from an enterprise-wide database or data warehouse or be more specialized. The emphasis of a data mart is on meeting the specific demands of a particular group of knowledge users in terms of analysis, content, presentation, and ease-of-use. Users of a data mart can expect to have data presented in terms that are familiar.

9. What the definition is of normalized and de normalized view and what are the differences between them?

Normalization is the process of removing redundancies.

Denormalization is the process of allowing redundancies.

10. What is meant by metadata in context of a Datawarehouse and how it is important?

Meta data is the data about data; Business Analyst or Data Modeler usually capture information about data - the source (where and how the data is originated), nature of data (char, varchar, nullable, existance, valid values etc) and behavior of data (how it is modified / derived and the life cycle ) in data dictionary a.k.a metadata. Metadata is also presented at the Datamart level, subsets, fact and dimensions, ODS etc. For a DW user, metadata provides vital information for analysis / DSS.

11. What is a linked cube?

Linked cube in which a sub-set of the data can be analyzed into great detail. The linking ensures that the data in the cubes remain consistent.

12. What is junk dimension? what is the difference between junk dimension and degenerated dimension?


Junk dimension: Grouping of Random flags and text attributes in a dimension and moving them to a separate sub dimension.


Degenerate Dimension: Keeping the control information on Fact table ex: Consider a Dimension table with fields like order number and order line number and have 1:1 relationship with Fact table, In this case this dimension is removed and the order information will be directly stored in a Fact table in order eliminate unnecessary joins while retrieving order information..




















No comments: