Freitag, 25. März 2011

Tutorial 3: Data warehouse

Hey everyone, today I am going to talk about data warehouses, how they work and what benefits they provide.
In every business data plays a crucial role in understanding customer behavior. Examples include customer to business e-commerce, connecting trading partners, implementing CRM or supporting performance measurements. A data warehouses purpose is to provide easy access to data. It creates a data infrastructure and tries to eliminate bad quality data. A data warehouse has several important characteristics. It is first of all subject oriented. That means it is organized around a specific subject and not around an entire business process. That makes the data storage more reliable. Second of all it is integrated into the organization and it receives its information (data) from multiple sources. It maintains historical data and thereby allows detecting deviations, long-term trends and relationships. Finally a data warehouse is more or less non-volatile, which means that users cannot change or update data. This guarantees the warehouses data quality.
Data warehouses basically make use of the ETL process. This means Extraction, Transformation and Loading and is a four layer process.
The first layer is to find the data sources from which the warehouse receives its data. It can be data from many applications, like ERP systems. The problem with most of the data is that it is in a bad quality. Once the data sources are identified, data extraction has to take place. There are two ways to extract the data. The first would be to custom write data extraction programs, this however is only useful if the company has highly skilled technical experts and also wants to avoid the cost of purchasing an ETL system. The other way, as already mentioned is to purchase ETL software from a vendor. This is the most popular solution, as vendors allow a high degree of customization and ongoing support. The next layer is data staging. It is an intermediate storage area between the information source and the warehouse. In this layer the final process of data transformation takes place, which is at the same time the fourth layer of a data warehouse system. Data transformation is probably one of the most important steps. Because it is the “gatekeeper” that guarantees that only good data enters the warehouse. The transformation layer should be a standardized process, so that every data ends up in the same format in the warehouse. The process should look like this:
1.       You parse the data
2.       You correct and enhance the data
3.       You standardize the data
4.       You match the records
5.       Finally you consolidate the matching records and load them into the warehouse
The main advantages of a data warehouse are that it provides a common data source. In addition, it guarantees reliable and useful data through its transformation layer. Moreover, a data warehouse can work in conjunction with other systems which need a tremendous amount of data, like customer relationship management systems. Data warehouses also support decision support systems which help management to make strategic decisions. Another advantage is that a data warehouse can be combine with relational database management systems, like Microsoft Access, and thereby improve the performance and capability of these programs. Altogether we can say that almost every business organization needs to have a data warehouse in some way, as good data can lead to competitive parity or even a competitive advantage. See the following video for more information.

Source:
Watson, Hugh J. (2002) "Recent Developments in Data Warehousing," Communications of the Association for Information Systems: Vol. 8, Article 1

Keine Kommentare:

Kommentar veröffentlichen