ADTP Data Warehousing SENG 691L

Jeffrey Edgell
This 10-page set of class notes was uploaded by Jaydon Gorczany on Saturday, September 12, 2015. The notes belong to SENG 691L at West Virginia University, taught by Jeffrey Edgell in Fall.

Date Created: 09/12/15
Class Schedule

Note that the proposed schedule is subject to change. If you have concerns or questions, let me know and we can adjust.
[Schedule entries are not recoverable from the source; the last entry reads "Final Projects due".]

Advanced Dimensional Modeling Techniques
Jeffrey T. Edgell

Techniques that are Proven
- Common models are consistent across various business areas (financial, inventory, retail, health care, etc.).
- Each has different and distinct issues related to its business area that require special attention.
- Each performs specific yet common types of analysis.

M:M Dimensions
- Sometimes a dimension may have multiple activities on a single fact (e.g., diagnosis or treatment for a patient).
- This creates a problem in that the dimension relates to the fact in a 1:M relationship. The logic to collect the recorded information becomes difficult or fails.
- The solution is to use a bridge (associative) table.

The Bridge Table
- Breaks the M:M relationship.
- Collects groups of information for a single activity.
- Allows the user to determine summary information based on an activity, or specific detail of all actions within an activity. This is handled by using weight factors to determine the allocated portion of each action.

The Bridge Table
[Diagram: star schema. Fact table "Billable Patient" with foreign keys time_key, patient_key, provider_key, location_key, payer_key, procedure_key, and diagnosis_key, and the facts billed_to_payer_amount and billed_to_patient_amount. Dimension tables Time, Patient, Provider, Location, Payer, Procedure, and Diagnosis each hold a primary key plus attributes and relate 1:M to the fact table.]

The Bridge Table
[Diagram: the "Billable Patient" fact table with billed_to_payer_amount and billed_to_patient_amount; the remaining labels are illegible in the source.]

The M:1:M Trap
- Do not combine or compare the tables through a single SELECT.
- The join will only produce results based on information that is common to both tables.
- An outer join must be used to guarantee accurate results (multipass SQL).
- The outer join will likely perform better anyway.

The M:1:M Trap
[Diagram: Customer dimension related 1:M to both Return Fact and Order Fact.]
Attempting to create an orders and returns report by customer:
- [text illegible] activity in the fact table where the first join occurred.
- Join return fact with customer -> all customer/return combinations.
- Join (return x customer) with order -> all customers that have returned something, detailing all of those customers' orders and returns.

Role Playing Dimensions
- Often there are consistent themes in a warehouse that require separate dimensions but are based on the same data.
- Some examples are time, origination and destination locations, and cable and phone providers.
- The dimensions represent distinct slants on the information, yet are based on the same data.

Role Playing Dimensions
- Use roles to provide the viewpoint required by the analyst or user.
- Create a single master table to manage all of the data.
- Use views, and distinctly name the table and attributes, to provide the roles.
- Remember: a ship date, order date, and received date are all just dates.

Representing Hierarchies
- The typical approach is to build a list of parent/child pointers.
- This will not work with standard SQL GROUP BY.
- The Oracle CONNECT BY clause only allows traversing the hierarchy and does not allow joins, so no fact table information can be obtained.

Representing Hierarchies
- The solution is to utilize a bridge table that provides insight and direction into the hierarchy.
- With this approach, joins can be performed and calculations can be conducted.

Representing Hierarchies
[Diagram: Commercial Customer dimension with customer_key (PK), customer_id, customer_name, customer_address, customer_type, industry_group, date_of_first_purchase, further attributes that are illegible in the source, and parent_customer_key.]

Representing Hierarchies
[Diagram: Customer Navigation Bridge table (parent_customer_key, subsidiary_customer_key) joined to the Commercial Customer dimension (customer_key (PK), customer_name, customer_address, customer_type, industry_group, further illegible attributes, parent_customer_key). Annotation: standard SQL works; navigation and calculations can be performed.]

Large Dimension Time Stamping
- In large semi-dynamic queries, identifying the boundaries of a change efficiently can be difficult.
- An example is the employee information in an HR data mart:
  - job grade, education, appraisal rating
  - last review, health insurance plan, retirement plan

Large Dimension Time Stamping
- The goal in this example is to provide the ability to:
  - Generate month-end reporting on employees.
  - Analyze the employee population at a specific moment in time.
  - Report on every action/transaction related to an employee and the sequence of events.

Large Dimension Time Stamping
[Diagram: Employee Transaction dimension and Human Resources fact table; the attribute lists are illegible in the source.]

The Right Number of Dimensions
- Typically a designer should aim for 5 to 15 dimensions for each data mart. If there are fewer than 5, we are likely missing some dimensions, which may include:
  - Causal dimensions (promotion, weather)
  - Additional time stamp dimensions to handle varying grain
  - Role-based dimensions
  - Status dimensions (transaction status)
  - Audit dimensions
  - Junk dimensions

The Right Number of Dimensions
- If the number of dimensions is approaching 20 or 30, reduction of the dimensions in the data mart should be evaluated (prune):
  - Combine dimensions as feasible. It is likely that some dimensions belong together; do not normalize.
  - Look for opportunities to create a junk dimension.
  - Identify whether each dimension is actually relevant in the context of the data mart.

Extending Fact Tables
- Always attempt to retain facts at the lowest grain possible. However, sometimes this is not possible, and information in a fact table resides at multiple levels of granularity.
- What do you do?
  - Keep the various facts combined at multiple levels of granularity? NO.
  - Extend the fact table to report on another level of granularity (an aggregate, if you will)? YES.

Extending Fact Tables
[Diagram: fact table with keys including brand_key (FK), ship_mode_key (FK), and plan_version_key (FK) and the fact plan_quantity; the remaining labels are illegible in the source. Annotation: should create an extended fact table.]

Extending Fact Tables
- Often, extended fact tables can be combined with existing or planned aggregate tables.
- What to look for in order to combine:
  - Granularity is common between the fact table and the aggregate.
  - The fact table and the aggregate share exactly the same dimensions.

Complexities with Time
- Often the minutes and seconds of the day are important and interesting for analysis.
- The problem is that recording seconds and minutes greatly amplifies the fact table: only actions that occur in that exact second share a time key.
- As a result, the time dimension becomes quite large as well.

Complexities with Time
- Solution:
  - Place time of day as a numeric fact in the appropriate fact tables.
  - Record it as seconds or minutes past midnight.
  - This reduces the total number of unique time keys significantly.
- When tracking time zones, multiple date and time indicators are needed to represent multiple time zones (date, time, GMT_date, GMT_time).

Multiple Units of Measure
- Often in a supply chain situation the grain is common, but the units that are measured are different:
  - specific units
- Solution:
  - Define all the conversion factors.
  - Place the actual converted values in the fact table.

Multiple Currency
- Problem: allow the tracking and reporting of monetary issues over time using multiple currencies.
- Solution: use a conversion table to allow for multiple currencies.

Multiple Currency
[Diagram: fact table with date_key (FK), product_key (FK), a store key, reporting_country_key (FK), customer_key (FK), a currency-tendered amount (label partly illegible), and US_dollar_equivalent_tendered. Conversion table with date_key (FK), buying_country_key (FK), selling_country_key (FK), and conversion_rate.]

Value Band Reporting
- SQL has no efficient means to group generalized additive values into ranges.
- The solution is to create a table of defined bands to be joined with one or more fact tables.
- The table contains:
  - group name
  - band lower and upper values

Value Band Reporting
[Diagram: band table with band_group_name, band_name, band_sort_number, band_lower_value, and band_upper_value (key markings partly illegible), related to the fact fees_earned.]
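As a rough illustration of the value band table described above, the following Python sketch uses the standard sqlite3 module to join a band table to a fact table with a range condition. The table names band_definition and account_fact, the sample rows, and the choice of an inclusive lower bound with an exclusive upper bound are assumptions made for this example; only the band column names and fees_earned come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical band table; column names follow the slide.
CREATE TABLE band_definition (
    band_group_name  TEXT,
    band_name        TEXT,
    band_sort_number INTEGER,
    band_lower_value REAL,   -- assumed inclusive
    band_upper_value REAL    -- assumed exclusive
);
-- Hypothetical fact table carrying the additive value to be banded.
CREATE TABLE account_fact (account_id INTEGER, fees_earned REAL);

INSERT INTO band_definition VALUES
    ('fee bands', 'low',    1,   0,  100),
    ('fee bands', 'medium', 2, 100,  500),
    ('fee bands', 'high',   3, 500,  1e9);
INSERT INTO account_fact VALUES (1, 20), (2, 80), (3, 100), (4, 450), (5, 900);
""")

# Each fact row falls into exactly one band because the ranges do not overlap.
band_report = conn.execute("""
SELECT b.band_name, COUNT(*) AS accounts, SUM(f.fees_earned) AS total_fees
FROM account_fact AS f
JOIN band_definition AS b
  ON f.fees_earned >= b.band_lower_value
 AND f.fees_earned <  b.band_upper_value
WHERE b.band_group_name = 'fee bands'
GROUP BY b.band_name, b.band_sort_number
ORDER BY b.band_sort_number
""").fetchall()

for row in band_report:
    print(row)
```

Changing the band boundaries only requires updating rows in the band table; the report query stays the same.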

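The bridge table with weight factors from the Bridge Table slides can be sketched in the same way. This is a minimal example under stated assumptions: the bridge table name diagnosis_group_bridge, the diagnosis_group_key and weight_factor columns, and the sample data are hypothetical, and the weights within each group are assumed to sum to 1 so that allocated amounts add back up to the fact total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE diagnosis (diagnosis_key INTEGER PRIMARY KEY, diagnosis_name TEXT);
-- Hypothetical bridge: one row per diagnosis in a diagnosis group.
CREATE TABLE diagnosis_group_bridge (
    diagnosis_group_key INTEGER,
    diagnosis_key       INTEGER,
    weight_factor       REAL     -- assumed to sum to 1 within a group
);
-- The fact points at a group of diagnoses, not at a single diagnosis.
CREATE TABLE billable_patient_fact (
    encounter_id             INTEGER,
    diagnosis_group_key      INTEGER,
    billed_to_patient_amount REAL
);

INSERT INTO diagnosis VALUES (1, 'Diabetes'), (2, 'Hypertension');
INSERT INTO diagnosis_group_bridge VALUES (10, 1, 0.5), (10, 2, 0.5), (20, 1, 1.0);
INSERT INTO billable_patient_fact VALUES (1, 10, 1000.0), (2, 20, 500.0);
""")

# Allocate each billed amount across the diagnoses in its group.
weighted = conn.execute("""
SELECT d.diagnosis_name,
       SUM(f.billed_to_patient_amount * b.weight_factor) AS allocated_amount
FROM billable_patient_fact AS f
JOIN diagnosis_group_bridge AS b
  ON b.diagnosis_group_key = f.diagnosis_group_key
JOIN diagnosis AS d
  ON d.diagnosis_key = b.diagnosis_key
GROUP BY d.diagnosis_name
ORDER BY d.diagnosis_name
""").fetchall()

for row in weighted:
    print(row)
```

Dropping the weight factor from the SUM would instead report the full billed amount against every diagnosis in the group, which overstates the grand total.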

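The M:1:M trap from the orders and returns slides can be reproduced with a small example. The sketch below assumes hypothetical customer, order_fact, and return_fact tables with made-up rows. It compares a single SELECT across both fact tables with a multipass version that aggregates each fact table separately and then outer joins the results to the customer dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE order_fact  (customer_key INTEGER, amount INTEGER);
CREATE TABLE return_fact (customer_key INTEGER, amount INTEGER);

INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO order_fact  VALUES (1, 100), (1, 50), (2, 70);
INSERT INTO return_fact VALUES (1, 10), (1, 20);   -- Bob has no returns
""")

# Single SELECT: every order row pairs with every return row of the customer,
# so totals are inflated, and customers without returns disappear.
single_select = conn.execute("""
SELECT c.name, SUM(o.amount), SUM(r.amount)
FROM customer AS c
JOIN order_fact  AS o ON o.customer_key = c.customer_key
JOIN return_fact AS r ON r.customer_key = c.customer_key
GROUP BY c.name
ORDER BY c.name
""").fetchall()

# Multipass: summarize each fact table on its own, then outer join.
multipass = conn.execute("""
SELECT c.name, COALESCE(o.total, 0), COALESCE(r.total, 0)
FROM customer AS c
LEFT JOIN (SELECT customer_key, SUM(amount) AS total
           FROM order_fact GROUP BY customer_key) AS o
       ON o.customer_key = c.customer_key
LEFT JOIN (SELECT customer_key, SUM(amount) AS total
           FROM return_fact GROUP BY customer_key) AS r
       ON r.customer_key = c.customer_key
ORDER BY c.name
""").fetchall()

print("single select:", single_select)
print("multipass:   ", multipass)
```

In this data Ann has two orders and two returns, so the single SELECT produces four joined rows for her and doubles both totals, while Bob is dropped entirely.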

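The hierarchy navigation bridge from the Representing Hierarchies slides can be sketched along the same lines. The example below assumes a hypothetical sales_fact table and sample customers, and it assumes the bridge holds one row for every parent/descendant pair, including a row linking each customer to itself. Under those assumptions a standard GROUP BY rolls facts up to any level of the hierarchy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE commercial_customer (
    customer_key        INTEGER PRIMARY KEY,
    customer_name       TEXT,
    parent_customer_key INTEGER
);
-- One row per (ancestor, descendant) pair; self pairs are an assumption here.
CREATE TABLE customer_navigation_bridge (
    parent_customer_key     INTEGER,
    subsidiary_customer_key INTEGER
);
-- Hypothetical fact table for the illustration.
CREATE TABLE sales_fact (customer_key INTEGER, amount INTEGER);

INSERT INTO commercial_customer VALUES
    (1, 'HQ', NULL), (2, 'Division', 1), (3, 'Plant', 2);
INSERT INTO customer_navigation_bridge VALUES
    (1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3);
INSERT INTO sales_fact VALUES (1, 100), (2, 50), (3, 25);
""")

# Total sales for each customer including all of its subsidiaries.
rollup = conn.execute("""
SELECT p.customer_name, SUM(f.amount) AS total_amount
FROM commercial_customer AS p
JOIN customer_navigation_bridge AS b
  ON b.parent_customer_key = p.customer_key
JOIN sales_fact AS f
  ON f.customer_key = b.subsidiary_customer_key
GROUP BY p.customer_key, p.customer_name
ORDER BY p.customer_key
""").fetchall()

for row in rollup:
    print(row)
```

The cost of this design is that the bridge must be rebuilt or maintained whenever the hierarchy changes, since it stores every ancestor/descendant pair explicitly.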