On-Line Analytical Processing(OLAP)

Abstract

This paper presents the complete view of Online Analytical Processing. It explains the purpose and some history of OLAP. What is FASMI explained briefly. The complete concept of OLAP is explained by giving one example. It covers all definitions related to OLAP. Applications of OLAP such as Marketing & Sales Analysis, Clickstream Analysis are explained here. Paper has presented some of the tools of OLAP available in market.

Introduction

Definition – On-Line Analytical Processing. Designates a category of applications and technologies that allow the collection, storage, manipulation and reproduction of multidimensional data, with the goal of analysis.On-Line Analytical Processing (OLAP) is a category of software technology that enables analysts, managers and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user.

FASMI

This word is interpreted as Fast Analysis of Shared Multidimensional Information. Fast means that the system is targeted to deliver most responses to users within about few seconds, with the simplest analyses taking no more than one second and very few taking more than 20 seconds. Quality of analysis suffers due to less speed of analysis.


Untitled

Analysis means that the system can cope with any business logic and statistical analysis that is relevant for the application and the user, and keep it easy enough for the target user.All the required analysis functionality is provided in an intuitive manner for the target users. This could include specific features like time series analysis, cost allocations, currency translation, goal seeking, ad hoc multidimensional structural changes, non-procedural modeling, exception alerting, data mining and other application dependent features.

Shared means that the system implements all the security requirements for confidentiality. If multiple write access is needed, concurrent update locking at an appropriate level is necessary, The system should be able to handle multiple.

Multidimensional This is a one-word definition of OLAP. This is key requirement. The system must provide a multidimensional conceptual view of the data, including full support for hierarchies and multiple hierarchies, as this is certainly the most logical way to analyze businesses and organizations.

Information is all of the data and derived information needed, wherever it is and however much is relevant for the application. The capacity of various products can be measured in terms of how much input data they can handle, not how many Gigabytes they take to store it.

3. More about OLAP

  1. The term OLAP came in existence since 1993.
  2. It was introduced by E.F. Codd, the father of relational databases
  3. OLAP is useful to managers of the company
  4. Response time rather than throughput is the main performance measure.
  5. Complex SQL queries and views, based on multidimensional data
  6. Interactive and “online” queries.
  7. The database is often a data warehouse…

 

4. OLAP by Example

4.1 The Table

Consider the data of “Best Foot Forward” company

Month

Style

Quantity Amount
Jan 2000 Sky Boot 30 3000
Jan 2000 Gum Boot 40 4000
………. …………

Analysis of this data can be done as

1) For quantity

Month

Sky Boot

Gum Boot
Jan 2000 30 40
Feb 2000 40 50

2) For amount

Month

Sky Boot

Gum Boot
Jan 2000 3000 4000
Feb 2000 4000 5000
………. …………

Untitled

The above analysis has only has two dimensions. If we want to study the performance of an outlet (vendor) then the analysis has three dimensions that is month, style and outlet

Month

Style

Outlet Quantity Amount
Jan 2000 Gum Boot Lyon 50 5000
Jan 2000 Sneaker

 

Paris Bastille 30 3000
 

 

4.2 The CUBE

If “Best Foot Forward” company wants to follow the progress of its shoe sales by month, style and outlet then in OLAP terminology these are the dimensions, sometimes called axes.In general, a dimension has from two to a maximum of several thousand positions.We can now say that the measure Quantity has the dimensions Month, Style and Outlet. Each measure may have from one to several million of values. All the values of a measure are of the same data type, for example integer or decimal.


Untitled

                                    Cube showing 3D analysi                                  

4.3 The Hierarchies

Some dimensions are divided into another dimensions forming the hierarchy of the dimension. For e.g. day, month and year

Untitled

4.4 The formulas

From the existing measures we can create others, which won’t be stored by the database but will be calculated dynamically each time the user requests them are called formulas.

From the user’s point of view both stored measure and formula are same for e.g. the average price

The average value = amount / quantity

4.5 The Attributes

A fact describing each position of a dimension.

4.6 Temporal calculation

To define actions to undertake in the future we need to analyze the evolution of already filed data, and if possible to anticipate their evolution in the future. OLAP databases are very powerful for that, they provide functions allowing to manipulate temporal inquiries.

4.7 The sparcity

In many cases there are no values for a dimension with respect to other dimension then dimension is called as sparse & a dimension having all values with respect to all other dimensions is called dense dimension

With Oracle Express, we just have to specify which dimensions are dense and sparse which optimize disk space management and access to data

5. Application area of OLAP

5.1 Marketing and sales analysis

Most commercial companies require this application, and most products are capable of handling it to some degree. However, large-scale versions of this application occur in three industries, each with its own peculiarities:

1) Consumer goods industries often have large numbers of products and outlets, and a high rate of change of both. They usually analyze data monthly, but sometimes it may go down to weekly or, very occasionally, daily. There are usually a number of dimensions, none especially large (rarely over 100,000). Data is often very sparse because of the number of dimensions. Because of the competitiveness of these industries, data is often analyzed using more sophisticated calculations than in other industries. Often, the most suitable technology for these applications is one of the hybrid OLAPs, which combine high analytical functionality with reasonably large data capacity.

2) Retailers, thanks to EPOS data and loyalty cards, now have the potential to analyze huge amounts of data. Large retailers could have over 100,000 products (SKUs) and hundreds of branches. They often go down to weekly or daily level, and may sometimes track spending by individual customers. They may even track sales by time of day. The data is not usually very sparse, unless customer level detail is tracked. Relatively low analytical functionality is usually needed. Sometimes, the volumes are so large that a ROLAP solution is required, and this is certainly true of applications where individual private consumers are tracked.

3) The financial services industry (insurance, banks etc) is a relatively new user of OLAP technology for sales analysis. With an increasing need for product and customer profitability, these companies are now sometimes analyzing data down to individual customer level, which means that the largest dimension may have millions of members. Because of the need to monitor a wide variety of risk factors, there may be large numbers of attributes and dimensions, often with very flat hierarchies.

Most of the data will usually come from the sales ledger(s), but there may be customer databases and some external data that has to be merged. In some industries (for example, pharmaceuticals and CPG), a large volume of market and even competitor data is readily available and this may need to be incorporated. The benefits of a good marketing and sales analysis system is that results should be more predictable and manageable, opportunities will be spotted more readily and sales forces should be more productive.

5.2 Clickstream analysis

This is one of the latest OLAP applications. Commercial Web sites generate gigabytes of data a day that describe every action made by every visitor to the site. No bricks and mortar retailer has the same level of detail available about how visitors browse the offerings, the route they take and even where they abandon transactions. A large site has an almost impossible volume of data to analyze, and a multidimensional framework is possibly the best way of making sense of it. There are many dimensions to this analysis, including where the visitors came from, the time of day, the route they take through the site, whether or not they started/completed a transaction, and any demographic data available about customer visitors.

  1. OLAP TOOLS

6.1 Oracle OLAP

Oracle OLAP provides support for business intelligence applications without the need for replicating large volumes of data in specialized analytical databases. Oracle OLAP allows applications to support complex multidimensional calculations directly against the data warehouse. The result is a single database that is more manageable, more scalable and accessible to the largest number of applications. Business intelligence applications are only useful when they are easily accessed.

Untitled

Fig. : An example of a graphically intense clickstream analysis application that uses OLAP.

Oracle OLAP, an integrated part of Oracle9i Database, provides support for multidimensional calculations and predictive functions. Oracle OLAP supports both the Oracle relational tables and analytic workspaces (a multidimensional data type). Key features of Oracle OLAP include:

  • The ability to support complex, multidimensional calculations
  • Support for predictive functions such as forecasts, models, non-additive aggregations and allocations, and scenario management (what if)
  • A Java OLAP API
  • Integrated OLAP administration

6.1.1 Platform for Business Intelligence Applications

Oracle9i Database provides a comprehensive platform for business intelligence applications. The components of the platform include:

  • The Oracle9i Database
  • Oracle OLAP as a facility within Oracle9i Database

This platform provides

  • A complete range of analytic functionality including multidimensional and predictive functions
  • Support for rapid query response times such as those that are normally associated with specialized analytical databases
  • A scalable platform for storing and analyzing multi-terabyte data sets.
  • A platform that is open to both multidimensional and SQL based applications
  • Support for Internet based applications

6.1.2 Oracle9i Database

The Oracle9i Database provides the foundation for Oracle OLAP by providing a scalable and secure data store, summary management facilities, metadata, SQL analytic functions, and high availability features. Scalability features that provide support for multi-terabyte data warehouses include:Partitioning allows objects in the data warehouse to broken down into smaller physical components that can then be managed independently and in parallelParallel query execution allows the database to use multiple processes to satisfy a single Java OLAPI API query

6.1.3 Summary Management

Materialized views provide facilities for effectively managing data within the data warehouse. As compared to summary tables, materialized views offer several advantages:

They are transparent to applications and users

They manage staleness of data

They can automatically update themselves when source data changes Like Oracle tables, materialized views can be partitioned and maintained in parallel. Unlike proprietary multidimensional cubes, data in materialized views are equally accessible by all applications using the data warehouse.

6.1.4 Security

Security is critical to the data warehouse. To provide the strongest possible security and to minimize administrative overhead, all security policies are enforced within the data warehouse. Users are authenticated in the Oracle database using database authentication or Oracle Internet Directory (LDAP). Access to elements of the multidimensional data model is controlled through grants and privileges in the Oracle database. Cell level access to data is controlled in the Oracle database using Oracle’s Virtual Private Database feature.

6.1.5 Metadata

All metadata is stored in the Oracle database. Low level objects such as dimensions, tables and materialized views are defined directly from the Oracle data dictionary while higher level OLAP objects are defined in the OLAP catalog. The OLAP catalog contains objects such as Cubes and Measure folders as well as extensions to the definitions of other objects such as dimensions. The OLAP catalog fully defined the dimensions and facts and thus completes the definition of the star schema.

6.1.6 Disaster Recovery

  • Oracle’s disaster recovery features protect data in the data warehouse. Key features include:
  • Oracle Data Guard, a comprehensive standby database disaster recovery solution
  • Redo logs and the recovery catalog
  • Backup and restore operations that are fully integrated with Oracle’s partition features
  • Support for incremental backup and recovery

6.1.7 System Management

Oracle Enterprise Manager (OEM) provides a centralized, comprehensive management tool. OEM enables administrators to monitor all aspects of the database, including Oracle OLAP. Oracle Enterprise Manager provides management services to Oracle OLAP including:

Instance, session, and configuration management

Data modeling

Performance monitoring

Job scheduling

6.1.8 System Requirements

Oracle OLAP is installed as part of the Oracle9i Database and imposes no additional system requirements. Oracle OLAP can also be installed on a middle tier system. When installed on a middle tier system, 128 MB of memory is required. When analytic workspaces are used extensively, additional memory is recommended. The actual amount of memory for use with analytic workspaces will vary with your application.

6.2 Deltathree

6.2.1 Introduction

Deltathree’s Online Analytical Processing Tool (OLAP) is a web- based database providing in- depth statistics to help you develop strategic plans to best serve your customers and drive new revenue streams.

6.2.2 Reseller Network Analysis

Use the Reseller Network Analysis Cube to analyze network data for call quality and traffic patterns. Search for information using many different criteria such as duration of calls, number of calls made, Answer Seizure Ratio (ASR) – call completion ratio, and more. Cut and slice that data by day, month, year, quarter, day of week, destinations, and more.

6.2.3 Reseller Financial Traffic

Use the Reseller Financial Traffic Cube to analyze network data for revenues, charges, trends, and profits. Search for information using many different criteria such as the gross duration of the call, duration of the call for the billing charge, retail charge of call, wholesale charge of call. Cut and slice that data by day, month, year, quarter, day of week, destinations, and more.

6.2. 4 Customer Activities

Search for information using many different criteria such as accounts recharged, status of accounts (for example, activated or expired), change in balance of an account, and more. Cut and slice that data by day, month, year, quarter, day of week, destinations, and more. Use the Customers Events Cube to analyze user account activities in the system.

6.2.5 View CDRs

Retrieve CDRs (Call Detail Records) from our FTP (File Transfer Protocol) Directory. The files in the FTP Directory can be exported on a daily/ weekly/ monthly basis to a text file.

6.2. 6 Save your reports

When you generate reports in the Customer Events, Reseller Network Analysis, or Reseller Financial Traffic Cubes, save them in the Customized Reports folder for future use.

6.3 Oracle Express

Oracle Express provides its own specialized database for storing multi-dimensional data. Data in a multi-dimensional database is stored as business people views it, allowing them to slice and dice the data to answer business questions. When designed correctly, an OLAP database will provide must faster response times for analytical queries.

Dimensions are the elements that an Express database is broken down by. These are often the “keys” of a database. Examples of dimensions are PRODUCT, REGION or TIME.

Variables are the objects that hold data in an Express database. These are simply arrays of values (usually numeric) that are “dimensioned” by the dimensions in a database. For example, a SALES variable may be dimensioned by PRODUCT, REGION, and TIME. This three-dimensional variable or array is often visualized as a cube of data.

To build an OLAP database in a very short time, Oracle recommends the Oracle Express Administrator tool. We will use it in this first stage to create the Best Foot Forward database.

After connecting to Oracle Express, perhaps on a different machine, the first window appears and allows you to create a new database. After entering a name and a description, we have the following:

Untitled

This first window is very reassuring, because the principal terms are familiar: dimension, variable, formula and relation.

Just a little right click on “Dimension” and you’re able to create all the dimensions of the database, using the following dialog box:

Untitled

The “General” tab let you define the data type (id, numeric, text,) of each dimension. The “Labels” tab gives you the ability to define short and long labels for each dimension, for the end user. Other options are used to optimize or manually create conjoint dimensions.

A right click on “Variable”, and we can create the Quantity, TotalVTE and VAT (Value Added Tax) variables. For each one, we specify the dense and sparse dimensions.

Untitled

In this example, the TotalVTE variable has a dense Time dimension, and sparse Outlet and Reference dimensions.

Finally, you guess that a right click on “Formula” allows you to create formula. Let’s define the TotalVTI formula:

Untitled

The equation of the formula (here: Total VTE * TVA) is written in a specific language.

The database structure is now built. The main window gives us a diagram of our database.

Untitled

Now, we have to fill this structure with data. Let’s go to the next stage!

 Loading data

We will distinguish two kinds of data:

  • The items used to initialize the OLAP database the first time are the positions and labels of the dimensions. In our example, the easiest way is to use Express Administrator again, but it’s not required.
  • The numeric data sent every day by each outlet, presumably in flat files. The goal is to load these files in the OLAP database.

To fill a dimension with positions in Express Administrator, as usually, we have to right-click on the dimension label, and then choose “Edit Values”. The opening window allows us to enter all the positions, and also to define the hierarchies for that dimension. Here, for example, are the positions and levels of the Time dimension.

Untitled

The next step is to automatically load flat files every day. Here, we have to write a program and store it in the database. An assistant in Express Administrator helps us to generate this program. We just have to specify the format of the flat file, in a window looking like this:

Untitled

Here, each line of the flat file contains 5 fields. The different dimensions are followed by the Quantity and TotalVTE variables.

The tool creates the corresponding 4GL program. We start it with the following window:

Untitled

The daily data is now stored in the database. We just have to aggregate the values along the Time dimension. With Administrator, another assistant helps to create a rollup program. We just have to choose the variables to aggregate, the hierarchies and the positions to select for the rollup.

Using this assistant is very easy and not very interesting here. Unfortunately, the generated programs are very long and complex, due to the large number of parameters. Our application is now self-sufficient: The data is loaded daily, and aggregates are done just afterward. End users can now connect to the database and analyze their data. That is the subject of the next stage.

Analyzing data

Many products are available to consult the data in our Best Foot Forward database:

  • Express Analyzer allows users to access Oracle Express data in a client-server mode. You will find some hardcopy in this page.
  • Express Object allows developers to build an entire client-server application, based on Oracle Express data.
  • An Add-in allows Microsoft Excel to directly access data in Oracle Express.
  • Oracle Products such as Web Agent are used to view the Express data directly in a browser.
  • Other products such as Business Objects are able to access Express Data.

To view Express data with Analyzer, we just have to choose the desired measures. The tool creates a Table or a Graph, with variables and associated dimensions. The selected measures are positions in a specific dimension called “Measure”.

An Oracle tool, called Selector, is used to choose, for each dimension, the desired positions. This tool can make very complex selections, and allow you, for example, to choose all the months in a year, or all references having at least 10 shoes sold in September, or the best 5 outlets. It allows you to store and restore selections. The following window shows the first dialog box of the selector.

Untitled

After choosing the measures and selecting the desired positions, the table or the graph is viewed. Here are two examples for the Best Foot Forward database:

Untitled

When a Table is viewed, we drag and drop each dimension in line, column or page (in the upper left corner). We drill up or down the hierarchies with a click on the + or – sign, we aggregate dimensions, … All these operations are very fast and easy, allowing the user to choose his way of analyzing data.

 The 4GL

A specific language is available in Oracle Express, allowing you to handle the whole database. With this 4GL language, it’s possible to:

  • Entirely define a multidimensional database, including its dimensions, variables, formulas, …
  • Insert data in the database, for example by loading flat files or by connecting directly to a relational database.
  • Aggregate data along hierarchies or write very complex calculations.
  • Select specific data to build flat files or complex reports.

Products such as Express Administrator or Express Analyzer use 4GL to interact with the database.

Conclusion

Thus Online analytical Processing provides the best analysis of large amount of data, which is useful in various fields as explained above. To perform this analysis various tools are also available as oracle express, deltathree etc.

References

  1. ^ a b Codd E.F., Codd S.B., and Salley C.T. (1993). “Providing OLAP (On-line Analytical Processing) to User-Analysts: An IT Mandate”. Codd & Date, Inc. http://www.fpm.com/refer/codd.html. Retrieved 2008-03-05.
  2. ^ Deepak Pareek (2007). Business Intelligence for Telecommunications. CRC Press. pp. 294 pp. ISBN0849387922. http://books.google.com/?id=M-UOE1Cp9OEC. Retrieved 2008-03-18.
  3. ^ Apostolos Benisis (2010). Business Process Management:A Data Cube To Analyze Business Process Simulation Data For Decision Making. VDM Verlag Dr. Müller e.K.. pp. 204 pp. ISBN978-3-639-22216-6. http://www.google.com/products?q=9783639222166.
  4. ^ Abdullah, Ahsan (2009). “Analysis of mealybug incidence on the cotton crop using ADSS-OLAP (Online
  1. Analytical Processing) tool , Volume 69, Issue 1″. Computers and Electronics in Agriculture 69 (1): 59–72. doi:1016/j.compag.2009.07.003.
  2. ^ “OLAP Council White Paper” (PDF). OLAP Council. 1997. http://www.symcorp.com/downloads/OLAP_CouncilWhitePaper.pdf. Retrieved 2008-03-18.
  3. ^ O’Brien & Marakas, 2011, p. 402-403
  4. ^ Hari Mailvaganam (2007). “Introduction to OLAP – Slice, Dice and Drill!”. Data Warehousing Review. http://www.dwreview.com/OLAP/Introduction_OLAP.html. Retrieved 2008-03-18.
  5. ^ O’Brien & Marakas, 2009, pg 177
  6. ^ O’Brien & Marakas, 2009, pg 178
  7. ^ Williams, C., Garza, V.R., Tucker, S, Marcus, A.M. (1994, January 24). Multidimensional models boost viewing options. InfoWorld, 16(4)
  8. ^ MicroStrategy, Incorporated (1995). “The Case for Relational OLAP” (PDF). http://www.cs.bgu.ac.il/~dbm031/dw042/Papers/microstrategy_211.pdf. Retrieved 2008-03-20.
  9. ^ Surajit Chaudhuri and Umeshwar Dayal (1997). “An overview of data warehousing and OLAP technology”. SIGMOD Rec. (ACM) 26 (1): 65. doi:1145/248603.248616. http://doi.acm.org/10.1145/248603.248616. Retrieved 2008-03-20.
  10. ^ Gray, Jim; Chaudhuri, Surajit; Layman, Andrew; Reichart, Hamid; Venkatrao; Pellow; Pirahesh (1997). “Data Cube: {A} Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals”. Data Mining and Knowledge Discovery 1 (1): 29–53. http://citeseer.ist.psu.edu/gray97data.html. Retrieved 2008-03-20.
  11. ^ Nigel Pendse (2006-06-27). “OLAP architectures”. OLAP Report. http://www.olapreport.com/Architectures.htm. Retrieved 2008-03-17.
[a] www.oracle.com/olap/

[b] www.orafaq.com/faqxprss.htm

[c] www.oracle-olap.com

Download PPT

 

 

 

 

 

 

 

Leave a Reply

Your email address will not be published. Required fields are marked *