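Definition

I. The Development of Data Warehouse Technology

Bill Inmon, known as the father of the data warehouse, was the first to propose the data warehouse concept and has more than 35 years of experience in database technology management and database design. He is a co-creator of the Corporate Information Factory and the creator of the Government Information Factory.

Inmon's ideas and insight have earned him great respect at all the major computing associations and at many industry conferences and technical seminars. He has written more than 650 articles, most of them published in the world's best-known IT periodicals, and every issue of DM Review carries his column. He has written 46 books, the most famous being "Building the Data Warehouse". This classic has been revised and reissued repeatedly, is now in its third edition, and has sold more than 500,000 copies. It is the book that earned Inmon the title "father of the data warehouse". In China, China Machine Press has published translations of the second and third editions. Inmon's books have also long been bestsellers on Amazon and are popular with data warehousing readers. Inmon is also one of the best-known data warehouse consultants and has advised many Fortune 1000 companies on data warehouse design and database management. Over the years he has founded companies and run online education, and in 1995 he founded the company now known as Ambeo.

Inmon introduced the data warehouse concept in the 1980s and defined it in "Building the Data Warehouse". He later gave a more precise definition: a data warehouse is a subject-oriented, integrated, time-variant, nonvolatile collection of data used in enterprise management and decision making. Unlike other database applications, a data warehouse is more like a process, namely the process of integrating, processing, and analyzing business data scattered across the enterprise, than a product that can be bought. This definition has become the most widely cited statement in the industry, and every newcomer to data warehousing starts from it.

Bill Inmon's rival

When Dr. Ralph Kimball published his first book, "The Data Warehouse Toolkit", the data warehousing industry erupted in debate. Inmon's "Building the Data Warehouse" advocates a top-down approach (DWDM: data warehouse first, then data marts), with the warehouse model designed in third normal form. Kimball, a good friend of Inmon's in private life, advocates a bottom-up approach in "The Data Warehouse Toolkit" (DMDW: data marts first, then the data warehouse) and strongly promotes building data marts. Their followers argued so fiercely that they nearly came to blows. The dispute settled down only when Inmon introduced a new BI architecture, the Corporate Information Factory (CIF), which incorporated Kimball's data marts.

Over the past 15 years, Ralph Kimball and Bill Inmon have been innovators in business intelligence, developing and testing new techniques and architectures. Both have written several frequently cited books on data warehousing. Kimball and Inmon agree that an organization needs a data warehouse, separate from its legacy systems and online transaction processing (OLTP) systems, to capture information about the organization and make it available. They also agree that the data in the warehouse should be cleansed and consistent, and should not be constrained by the design of the legacy and OLTP systems it comes from. They further agree that the warehouse should be built iteratively, with the overall architecture in mind before the first data mart is started. Beyond this point their views diverge.

Bill Inmon defines a data warehouse as "a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management's decision-making process" (Building the Data Warehouse, 2nd edition, p. 33). By "subject-oriented", Inmon means that the data in the warehouse should be organized around subjects such as customers, suppliers, and products. Each subject area contains only information relevant to that subject. The warehouse should grow one subject at a time, and when easy access to several subjects is needed, data marts sourced from the warehouse should be created. In other words, all the data in any given data mart should come from the subject-oriented data store. Inmon's approach involves more up-front work and delays initial access to information, but he argues that the centralized architecture provides greater consistency and flexibility over time and genuinely saves resources and effort in the long run.

Ralph Kimball says that "the data warehouse is nothing more than the union of all the constituent data marts" (Figure 2, The Data Warehouse Lifecycle Toolkit, p. 27). He holds that a data warehouse can be built incrementally from a series of data marts that share the same dimensions. Each data mart combines multiple data sources to meet a specific business need. Using "conformed" dimensions, meaning dimensions whose elements share common definitions, makes it possible to view information from different data marts together. Kimball's approach delivers integrated data that answers the organization's pressing business questions sooner than Inmon's approach does. Under Inmon's approach, the centralized warehouse feeds data marts only after several single-subject areas have been built. Kimball considers that approach inflexible and too slow for today's business environment.

That Inmon is honored as the father of the data warehouse shows how much he has contributed to the technical development of the field. Countless data warehousing enthusiasts even regard "Building the Data Warehouse" as the "bible" of data warehousing. Articles on Inmon's own website circulate widely, and whenever he speaks in public, many users and practitioners consider it a privilege to hear about his latest work. In the blueprint for the Corporate Information Factory, Inmon clearly describes how the required data is captured from the various operational systems and how, in the subsequent flow, it gradually takes on different forms to suit different needs. All of this revolves around one central component: the enterprise data warehouse.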
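The conformed-dimension idea attributed to Kimball above can be illustrated with a minimal Python sketch. It is not taken from either author's books. The product dimension, the two fact tables, and every name and figure in it are hypothetical, chosen only to show how two data marts that share one commonly defined dimension can be reported side by side.

```python
from collections import defaultdict

# Hypothetical conformed "product" dimension shared by both data marts.
PRODUCT_DIM = {
    1: {"name": "Widget", "category": "Hardware"},
    2: {"name": "Gadget", "category": "Hardware"},
    3: {"name": "Manual", "category": "Books"},
}

# Hypothetical fact rows; both marts use the same product_key values.
SALES_FACTS = [
    {"product_key": 1, "units_sold": 10},
    {"product_key": 2, "units_sold": 5},
    {"product_key": 3, "units_sold": 7},
    {"product_key": 1, "units_sold": 3},
]
INVENTORY_FACTS = [
    {"product_key": 1, "units_on_hand": 40},
    {"product_key": 2, "units_on_hand": 20},
    {"product_key": 3, "units_on_hand": 15},
]


def total_by_category(facts, measure):
    """Sum one measure per product category using the shared dimension."""
    totals = defaultdict(int)
    for row in facts:
        category = PRODUCT_DIM[row["product_key"]]["category"]
        totals[category] += row[measure]
    return dict(totals)


def drill_across():
    """Line up results from both marts on the conformed category attribute."""
    sold = total_by_category(SALES_FACTS, "units_sold")
    on_hand = total_by_category(INVENTORY_FACTS, "units_on_hand")
    categories = sorted(set(sold) | set(on_hand))
    return {
        c: {"units_sold": sold.get(c, 0), "units_on_hand": on_hand.get(c, 0)}
        for c in categories
    }


if __name__ == "__main__":
    for category, measures in drill_across().items():
        print(category, measures)
```

Because both fact tables resolve `product_key` through the same dimension, "Hardware" means the same thing in each mart and the two totals can be compared directly. If each mart kept its own product definitions, the categories would first have to be reconciled, which is the integration work Inmon's approach moves into the central warehouse.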
In China's data warehousing community, the theories of Inmon and Kimball were also debated at length for a time. As data warehouse projects have matured, however, the idea of using the enterprise data warehouse as the enterprise's data integration platform has won broad acceptance, and more and more companies now emphasize building an enterprise-level data warehouse to support the development and operation of the whole business.

Bill Inmon's major works

Several of Inmon's major works are listed below:
1. "Building the Data Warehouse"
2. "Corporate Information Factory"
3. "Government Information Factory"
4. "The Data Model Resource Book: A Library of Logical Data and Data Warehouse Designs" (Chinese title: "Data Warehouse Modeling")
5. "Managing the Data Warehouse"
6. "Data Warehousing for E-Business"

Bill Inmon

William Harvey Inmon (born 1945) is an American computer scientist, recognized by many as the father of the data warehouse.[1][2] Inmon wrote the first book on data warehousing, held the first conference (with Arnie Barnett), wrote the first magazine column, and was the first to offer classes on the subject. He created the accepted definition of a data warehouse: a subject-oriented, nonvolatile, integrated, time-variant collection of data in support of management's decisions. Compared with the approach of the other pioneering architect of data warehousing, Ralph Kimball, Inmon's approach is often characterized as top-down.

Biography

Bill Inmon was born on July 20, 1945, in San Diego, California. He received his Bachelor of Science degree in Mathematics from Yale University and his Master of Science degree in Computer Science from New Mexico State University. He worked for American Management Systems and Coopers & Lybrand before founding Prism Solutions in 1991, a company he took public. In 1995 he founded Pine Cone Systems, which was later renamed Ambeo.
In 1999, Inmon created the Corporate Information Factory Web site to educate professionals and decision makers about data warehousing and the Corporate Information Factory.[3] He was also the creator of the Government Information Factory and of Data Warehousing 2.0 (DW 2.0). Inmon is a prolific author on the building, usage, and maintenance of the data warehouse and the Corporate Information Factory. His books include "Building the Data Warehouse" (1992, with later editions) and "DW 2.0: The Architecture for the Next Generation of Data Warehousing" (2008). In July 2007, Computerworld named him one of the ten people who most influenced the first 40 years of the computer industry.[4]

Inmon's association with data warehousing stems from a series of firsts: he wrote the first book on data warehousing, coined the original term, held the first conference on the subject (with Arnie Barnett), wrote the first magazine column on it, created the first fold-out wall chart for data warehousing, and conducted the first classes on it. He has also written over 1,000 articles on data warehousing in journals and newsletters.

His more recent work includes DW 2.0, a definition of the next generation of data warehousing. He also created the Corporate Information Factory (the "CIF"), which describes the larger information architecture into which data warehousing fits. More recently he has developed technology for bringing unstructured textual data into the data warehouse, described as the world's first "textual ETL".

Publications

Bill Inmon has published more than 40 books and 1,000 articles on data warehousing and data management.
References