Amazon.com的推薦：從商品到商品的協(xié)同過濾

rena_wang 2014-09-16

展開全文

Amazon.com 的推薦

從商品到商品的協(xié)同過濾

Greg Linden, Brent Smith, and Jeremy York · Amazon.com

推薦算法以其在電子商務網站的用途而著稱1，它們利用有關一個顧客的興趣作為輸入，來產生一個推薦商品的列表。很多應用僅僅使用顧客購買并明確表示代表其興趣的商品，但它們也可以利用其他屬性，包括已瀏覽的商品、人口統(tǒng)計特征數(shù)據(jù)、主題興趣，以及偏愛的藝術家。

在Amazon.com，我們利用推薦算法，對每位顧客提供在線商店個性化。在顧客興趣的基礎上，商店有了徹底的改觀，向一個軟件工程師展示編程類標題，向一位新媽媽展示嬰兒玩具。點擊率和轉化率——基于網絡和郵件廣告的兩個重要評估指標——極大地超越了那些未定向內容，比如banner廣告和熱賣列表。

電子商務推薦算法經常要運行在一個充滿挑戰(zhàn)的環(huán)境里。例如：
· 大型零售商有海量的數(shù)據(jù)，以千萬計的顧客，以及數(shù)以百萬計的登記在冊的不同商品。
· 許多應用要求結果實時返回，在半秒之內，還要產生高質量的推薦。
· 新顧客很典型，他們的信息很有限，只能以少量購買或產品評級為基礎。
· 較老的顧客信息豐沛，以大量的購買和評級為基礎。
· 顧客數(shù)據(jù)不穩(wěn)定：每一次交互都可提供有價值的顧客數(shù)據(jù)，算法必須立即對新的信息作出響應。

解決推薦問題有三個通常的途徑：傳統(tǒng)的協(xié)同過濾，聚類模型，以及基于搜索的方法。在此，我們就這些方法與我們的算法——我們稱之為商品到商品的協(xié)同過濾——進行對比。與傳統(tǒng)協(xié)同過濾不同，我們算法的在線計算規(guī)模，與顧客數(shù)量和產品目錄中的商品數(shù)量無關。我們的算法實時產生推薦，計算適應海量數(shù)據(jù)集，并生成高質量的推薦。

推薦算法
大多數(shù)推薦算法，都始于先找出一個顧客集合，他們買過和評級過的商品，與當前用戶買過和評級過的商品有重疊2。算法把來自這些相似顧客的商品聚集起來，排除該用戶已經購買過或評級過的商品，并向該用戶推薦其余的商品。這些算法有兩個最常見的版本：協(xié)同過濾和聚類模型。其他算法——包括基于搜索的方法以及我們自己的商品到商品協(xié)同過濾——都集中于尋找相似的商品，而不是相似的顧客。針對用戶所購買和評級的每一件商品，算法試圖找到相似的產品，然后聚集這些相似的商品，并給予推薦。

傳統(tǒng)的協(xié)同過濾
傳統(tǒng)的協(xié)同過濾算法把顧客描繪成商品的N維向量，其中N是登記在冊的不同商品的數(shù)量。對于買過或正面評級的商品，向量分量為正，對于負面評級的商品，向量分量為負。為彌補最熱賣的商品，算法典型地會把向量分量乘以反向頻率（已購買或評級該商品的顧客數(shù)量的倒數(shù)），以使不太知名的商品更加相關3。對幾乎所有的顧客來說，這個向量非常稀疏。
在與該用戶最相似的少數(shù)顧客基礎上，算法產生推薦。算法能夠測量兩個顧客的相似性，如A和B，有多種方式；一種常見的方法是測量這兩個向量之間的夾角余弦值4：

算法也能從相似顧客的商品里選擇推薦，有多種可以利用的方法，常見的一種技術是，按照購買該商品的相似顧客數(shù)量，對每件商品進行排序。

利用協(xié)同過濾來產生推薦，很耗計算。最壞的情況是O(MN)，其中M是顧客數(shù)量，N是產品目錄中商品的數(shù)量，因為算法要驗算M個顧客，并且對每個顧客最多要計算N種商品。但是，由于顧客向量的平均值很稀疏，算法的執(zhí)行更傾向于接近O(M + N)。掃描每一個顧客大約是O(M)，而不是O(MN)，因為幾乎所有顧客向量都只含有很少的商品，無需考慮產品目錄的規(guī)模。但有少數(shù)顧客，他們買過或評級過的商品在產品目錄中占有值得注意的百分比，需要O(N)處理時間。因此，算法最終執(zhí)行的大約是O(M + N)。盡管如此，對非常大的數(shù)據(jù)集來說——比如1千萬以上的顧客，以及1百萬以上登記在冊的商品——算法也會遭受嚴峻的性能和計算量問題。
通過減小數(shù)據(jù)量，可能部分緩解這些計算量的問題4。我們能夠減小M，通過對顧客進行隨機抽樣，或丟棄那些購買很少的顧客；我們也能減小N，通過丟棄那些極熱門和極冷門的商品。我們還可能減少所需計算的商品數(shù)量，通過一個小的常數(shù)因子，在產品類別或主題分類的基礎上，對商品空間進行區(qū)隔。諸如聚類和主分量分析等維度降低技術，也能很大程度減小M和N。
不幸的是，所有這些方法也會以各種形式降低推薦的品質。首先，如果算法只是驗算了一小部分顧客樣本，那么被選定顧客與當前用戶會較少相似。其次，商品空間區(qū)隔會把推薦限制在特定產品或主題領域之內。第三，如果算法丟棄了最熱門或最冷門的商品，這些商品將不會出現(xiàn)在推薦中，并且，只購買過這些商品的顧客，將不會得到推薦。向商品空間應用維度降低技術，會得到與排除冷門商品那樣相同的效果。向顧客空間應用維度降低技術，能有效地把相似顧客組合為群組，正如我們現(xiàn)在所說的，這樣的聚類也會降低推薦的品質。

聚類模型
為了尋找與當前用戶相似的顧客，聚類模型對顧客基礎進行細分，并把這個任務當做為分類問題。算法的目標是，把該用戶分配到含有最相似顧客的細分人群里，然后，算法再利用該細分顧客人群的購買和評級，來生成推薦。典型地說，顧客細分的建立，會采用一種聚類或其他無人管理的學習算法，盡管某些應用也用了手工決定的人群細分。利用一種相似性度量標準，聚類算法把最相似的顧客，分組聚合起來，形成聚類或細分人群。由于對大型數(shù)據(jù)集進行最理想的聚類不切合實際，大多數(shù)應用都采用了各種形式的greedy聚類生成。典型的情況是，這些算法始于各細分人群的一個初始集，每個初始集通常包含一個隨機選定的顧客，然后算法不斷重復地把顧客與現(xiàn)有的細分人群進行匹配，一般也會某些規(guī)定，以創(chuàng)建新的細分人群或是合并人群6。對于非常大的數(shù)據(jù)集——尤其是維度很高——抽樣或維度降低也是必要的。
一旦算法生成了細分人群，就計算當前用戶與概要描述每一細分人群的向量的相似性，然后選擇相似性最大的細分人群，并因此而對該用戶進行分類。某些算法把用戶分類進入多個細分人群，并對每組關系的強度進行描述。
較之協(xié)同過濾，聚類模型有更好的在線可擴展性和性能3，因為它們把當前用戶與可控數(shù)量的細分人群進行對比，而不是整個顧客基數(shù)。復雜和昂貴的聚類計算會離線運行。然而，推薦品質卻是低的1。聚類模型把無數(shù)的顧客分組進入細分人群，匹配一個用戶與一個細分人群，然后，以相似顧客細分人群里的所有顧客，來考慮產生推薦的目的。由于聚類模型發(fā)現(xiàn)的相似顧客并不是最相似的顧客，因而產生的推薦較少相關。通過大量精細粒度的細分人群，也可能提高推薦的品質，但那樣一來，在線的用戶-細分人群分類，就會變得與利用協(xié)同過濾來尋找相似顧客幾乎一樣昂貴。

基于搜索的方法
基于搜索或內容的方法，將推薦問題視為相關商品的搜索8。給定該用戶已買過和評級過的商品，算法構造一個搜索查詢，以尋找其他熱賣的商品，通過同一作者、藝術家或導演，或利用相似的關鍵詞或主題。例如，如果一個顧客買了Godfather（教父）的DVD系列，系統(tǒng)就會推薦其他的犯罪劇，Marlon Brando出演的其他劇目，或由Francis Ford Coppola導演的其他電影。
如果該用戶只有少數(shù)購買或評級，基于搜索的推薦算法在計算量和性能上都不錯。然而，對于有數(shù)千次購買的用戶，要以針對所有商品的查詢?yōu)榛A也不太可行。算法必須使用一個數(shù)據(jù)的子集或概要，因此降低了推薦的品質。在所有各種情況下，推薦品質相對較差。推薦通常就是要么太寬泛（比如最熱賣的劇集DVD），要么太狹窄（比如同一個作者的全部圖書）。推薦應該要幫助顧客找到和發(fā)現(xiàn)新的、相關的、有趣的商品。同一作者或同一主題領域的熱賣商品，沒有滿足這一目標。

商品到商品的協(xié)同過濾
Amazon.com在很多郵件營銷活動，以及在其大多數(shù)的網頁上，包括流量極大的網站首頁，都把推薦作為一種定向營銷工具。點擊“你的推薦”鏈接，會把顧客引向一個區(qū)域，在那里，顧客可以通過產品線和主題領域，進行推薦的篩選，為被推薦的商品進行評級，為以前的購買進行評級，并查看為什么這些商品被推薦了（見圖1，圖在最后）。
如圖2（圖在最后）所示，即我們的購物車推薦，以其購物車中的商品為基礎，向顧客給出產品建議。這一特性與超市結賬臺路線上的沖動購買類商品很類似，但我們的沖動購買類商品定向到每位顧客。
Amazon.com廣泛地采用推薦算法，針對每個顧客的興趣進行網站的個性化。因為現(xiàn)有的推薦算法，與Amazon.com千萬級的用戶和產品數(shù)量不能相稱，我們開發(fā)了自己的算法。我們的算法，也就是商品到商品的協(xié)同過濾，符合海量的數(shù)據(jù)集和產品量，并能實時得到高品質的推薦。

它如何工作
與把當前用戶匹配到相似顧客的做法不同，商品到商品的協(xié)同過濾，把該用戶所購買和評級的商品，匹配到相似的商品，然后組合這些相似的商品進入推薦列表9。
對于給定的一件商品，為了決定最相似的匹配，算法通過發(fā)現(xiàn)顧客傾向于一起購買的商品，建立一個相似商品的表格。利用對所有產品配對的迭代，以及為每個產品配對計算相似性測度，我們能建立一個產品到產品的矩陣。然而，許多產品配對沒有普通顧客，因此在處理時間和內存使用上，這種方法沒有效率。下述迭代算法提供了一種更好的方法，通過計算一件商品與所有相關產品之間的相似性：

For 每件商品 in 產品目錄, I1
   For 每位顧客 C   購買過 I1 的
   For 每件商品 I2 由顧客 C 所購買的
   記錄一顧客所購買的 I1 和 I2
   For 每件商品 I2
   計算相似度在 I1 與 I2 之間

計算兩個商品之間的相似性可以有多種方法，但通常的方法是，利用我們前面描述的余弦值，其中每個向量對應于一件商品而不是一位顧客，并且向量的M維度對應于已購買過該商品的顧客。
這個相似商品表格的離線計算極費時間，最糟糕時需要O(N2M)。但在實際運行中，它接近O(NM)，因為大多數(shù)顧客只有很少的購買。對購買最熱門商品顧客的抽樣，進一步減少了運行時間，同時對推薦的品質略有降低。
對于給定的相似商品表格，算法發(fā)現(xiàn)與當前用戶每次購買和評級相似的商品，把這些商品聚集起來，然后推薦最暢銷或關聯(lián)最強的商品。這一計算很快速，僅僅取決于該用戶購買或評級過商品的數(shù)量。

可擴展性：比較
Amazon.com有超過2900萬顧客，以及數(shù)百萬登記在冊的商品。其他主要零售商也有同等大小的數(shù)據(jù)源。在所有這些數(shù)據(jù)提供機會的同時，也是一種詛咒，遠遠突破了那些針對小三個數(shù)量級的數(shù)據(jù)集所設計的算法的限度。幾乎全部的現(xiàn)有算法，都是在小數(shù)據(jù)集上評估的。如，MovieLens數(shù)據(jù)集4包含35000名顧客和3000件商品，EachMovie數(shù)據(jù)集3包含4000名顧客和1600件商品。
對于非常大的數(shù)據(jù)集，一個可擴展的推薦算法必須離線運行最昂貴的計算。正如下面的簡要對比所顯示的，現(xiàn)有方法達不到這樣的要求：
· 傳統(tǒng)的協(xié)同過濾只做很少或不做離線計算，其在線計算量取決于顧客和登記在冊商品的數(shù)量。在大數(shù)據(jù)集的情況下，這樣的算法不可行，除非使用維度降低、抽樣或區(qū)隔——所有這些都降低了推薦的品質。
· 聚類模型能離線運行大量的計算，但推薦品質相對較差。出于改進，可以增加人群細分的數(shù)量，但這會使在線的用戶-細分人群的分類變得昂貴。
· 基于搜索的模型離線建立起關鍵詞、范疇、作者索引，但不能提供符合興趣、定向內容的推薦。對于購買和評級很多的顧客來說，這些算法的擴展性不佳。
商品到商品協(xié)同過濾的可擴展性和性能的關鍵是，它離線建立耗時巨大的相似商品表格。該算法的在線部分——針對當前用戶的購買和評級來尋找相似的商品——計算量獨立于商品目錄的規(guī)?；蝾櫩偷目倲?shù)；僅僅取決于該用戶買過或評級過多少個商品。因此，甚至是對于超大數(shù)據(jù)集，算法也很快速。由于該算法能推薦高度關聯(lián)的相似商品，推薦的品質就很出色10。與傳統(tǒng)的協(xié)同過濾不同，該算法在用戶數(shù)據(jù)有限的情況下也能運行良好，在少至2到3件商品的基礎上，產生高品質的推薦。

結論
通過為每位顧客建立個性化的購物體驗，推薦算法提供了一種有效的定向營銷形式。對于Amazon.com這樣的大型零售商，良好的推薦算法可在海量顧客基數(shù)和商品目錄上進行擴展，只需要子秒處理時間就能產生在線推薦，能對用戶數(shù)據(jù)里的變化立即做出反應，并能為所有用戶提供引人關注的推薦，而無需考慮購買和評級的數(shù)量。與其他算法不同，商品到商品的協(xié)同過濾能滿足這樣的挑戰(zhàn)。
未來，我們期望零售業(yè)為定向營銷更廣泛地應用推薦算法，包括網上和網下。對個性化來說，電子商務擁有最方便的工具，而同時，較之傳統(tǒng)撒大網的方式，該技術對轉化率的提升，也會引起網下零售商的關注，可用于信件、優(yōu)惠券及其他顧客通信中。

作者

Greg Linden是Amazon.com個性化部門的共同創(chuàng)始人、研究員及高級經理，他設計和開發(fā)了推薦算法。他目前是斯坦福大學商業(yè)研究生院Sloan Program中的管理學研究生。他的研究興趣包括推薦系統(tǒng)、個性化、數(shù)據(jù)挖掘以及人工智能。Linden在華盛頓大學獲得了計算機科學的理學碩士。聯(lián)系方式：Linden_Greg@gsb.stanford.edu。

Brent Smith領導著Amazon.com的自動化銷售團隊。他的研究興趣包括數(shù)據(jù)挖掘、機器學習以及推薦系統(tǒng)。他在San Diego的加州大學獲得了數(shù)學的學士，在華盛頓大學獲得了數(shù)學的理學碩士，并在那里做了些微分幾何的研究工作。聯(lián)系方式：smithbr@amazon.com。

Jeremy York領導著Amazon.com的自動化內容選擇和交付團隊。他的興趣包括分類數(shù)據(jù)的統(tǒng)計模型、推薦系統(tǒng)，以及網站展示內容的最佳選擇。他從華盛頓大學獲得了統(tǒng)計學的博士學位，在那里，他的論文獲得了Leonard J. Savage獎的貝葉斯應用計量經濟學和統(tǒng)計學的最佳論文獎。聯(lián)系方式：jeremy@amazon.com。

參考文獻

Amazon.com Recommendations

Item-to-Item Collaborative Filtering

Greg Linden, Brent Smith, and Jeremy York · Amazon.com

Recommendation algorithms are best known for their use on e-commerce Web sites,1 where they use input about a customer’s interests to generate a list of recommended items. Many applications use only the items that customers purchase and explicitly rate to represent their interests, but they can also use other attributes, including items viewed, demographic data, subject interests, and favorite artists.

At Amazon.com, we use recommendation algorithms to personalize the online store for each customer. The store radically changes based on customer interests, showing programming titles to a software engineer and baby toys to a new mother. The click-through and conversion rates — two important measures of Web-based and email advertising effectiveness — vastly exceed those of untargeted content such as banner advertisements and top-seller lists.

E-commerce recommendation algorithms often operate in a challenging environment. For example:
· A large retailer might have huge amounts of data, tens of millions of customers and millions of distinct catalog items.
· Many applications require the results set to be returned in realtime, in no more than half a second, while still producing high-quality recommendations.
· New customers typically have extremely limited information, based on only a few purchases or product ratings.
· Older customers can have a glut of information, based on thousands of purchases and ratings.
· Customer data is volatile: Each interaction provides valuable customer data, and the algorithm must respond immediately to new information.

There are three common approaches to solving the recommendation problem: traditional collaborative filtering, cluster models, and search-based methods. Here, we compare these methods with our algorithm, which we call item-to-item collaborative filtering. Unlike traditional collaborative filtering, our algorithm’s online computation scales independently of the number of customers and number of items in the product catalog. Our algorithm produces recommendations in realtime, scales to massive data sets, and generates highquality recommendations.

Recommendation Algorithms
Most recommendation algorithms start by finding a set of customers whose purchased and rated items overlap the user’s purchased and rated items.2 The algorithm aggregates items from these similar customers, eliminates items the user has already purchased or rated, and recommends the remaining items to the user. Two popular versions of these algorithms are collaborative filtering and cluster models. Other algorithms — including search-based methods and our own item-to-item collaborative filtering — focus on finding similar items, not similar customers. For each of the user’s purchased and rated items, the algorithm attempts to find similar items. It then aggregates the similar items and recommends them.

Traditional Collaborative Filtering
A traditional collaborative filtering algorithm represents a customer as an N-dimensional vector of items, where N is the number of distinct catalog items. The components of the vector are positive for purchased or positively rated items and negative for negatively rated items. To compensate for best-selling items, the algorithm typically multiplies the vector components by the inverse frequency (the inverse of the number of customers who have purchased or rated the item), making less well-known items much more relevant.3 For almost all customers, this vector is extremely sparse.
The algorithm generates recommendations based on a few customers who are most similar to the user. It can measure the similarity of two customers, A and B, in various ways; a common method is to measure the cosine of the angle between the two vectors: 4

The algorithm can select recommendations from the similar customers’ items using various methods as well, a common technique is to rank each item according to how many similar customers purchased it.
Using collaborative filtering to generate recommendations is computationally expensive. It is O(MN) in the worst case, where M is the number of customers and N is the number of product catalog items, since it examines M customers and up to N items for each customer. However, because the average customer vector is extremely sparse, the algorithm’s performance tends to be closer to O(M + N). Scanning every customer is approximately O(M), not O(MN), because almost all customer vectors contain a small number of items, regardless of the size of the catalog. But there are a few customers who have purchased or rated a significant percentage of the catalog, requiring O(N) processing time. Thus, the final performance of the algorithm is approximately O(M + N). Even so, for very large data sets — such as 10 million or more customers and 1 million or more catalog items — the algorithm encounters severe performance and scaling issues.
It is possible to partially address these scaling issues by reducing the data size.4 We can reduce M by randomly sampling the customers or discarding customers with few purchases, and reduce N by discarding very popular or unpopular items. It is also possible to reduce the number of items examined by a small, constant factor by partitioning the item space based on product category or subject classification. Dimensionality reduction techniques such as clustering and principal component analysis can reduce M or N by a large factor.5
Unfortunately, all these methods also reduce recommendation quality in several ways. First, if the algorithm examines only a small customer sample, the selected customers will be less similar to the user. Second, item-space partitioning restricts recommendations to a specific product or subject area. Third, if the algorithm discards the most popular or unpopular items, they will never appear as recommendations, and customers who have purchased only those items will not get recommendations. Dimensionality reduction techniques applied to the item space tend to have the same effect by eliminating low-frequency items. Dimensionality reduction applied to the customer space effectively groups similar customers into clusters; as we now describe, such clustering can also degrade recommendation quality.

Cluster Models

To find customers who are similar to the user, cluster models divide the customer base into many segments and treat the task as a classification problem. The algorithm’s goal is to assign the user to the segment containing the most similar customers. It then uses the purchases and ratings of the customers in the segment to generate recommendations. The segments typically are created using a clustering or other unsupervised learning algorithm, although some applications use manually determined segments. Using a similarity metric, a clustering algorithm groups the most similar customers together to form clusters or segments. Because optimal clustering over large data sets is impractical, most applications use various forms of greedy cluster generation. These algorithms typically start with an initial set of segments, which often contain one randomly selected customer each. They then repeatedly match customers to the existing segments, usually with some provision for creating new or merging existing segments.6 For very large data sets — especially those with high dimensionality — sampling or dimensionality reduction is also necessary.
Once the algorithm generates the segments, it computes the user’s similarity to vectors that summarize each segment, then chooses the segment with the strongest similarity and classifies the user accordingly. Some algorithms classify users into multiple segments and describe the strength of each relationship.7
Cluster models have better online scalability and performance than collaborative filtering3 because they compare the user to a controlled number of segments rather than the entire customer base. The complex and expensive clustering computation is run offline. However, recommendation quality is low.1 Cluster models group numerous customers together in a segment, match a user to a segment, and then consider all customers in the segment similar customers for the purpose of making recommendations. Because the similar customers that the cluster models find are not the most similar customers, the recommendations they produce are less relevant. It is possible to improve quality by using numerous finegrained segments, but then online user–segment classification becomes almost as expensive as finding similar customers using collaborative filtering.

Search-Based Methods

Search- or content-based methods treat the recommendations problem as a search for related items.8 Given the user’s purchased and rated items, the algorithm constructs a search query to find other popular items by the same author, artist, or director, or with similar keywords or subjects. If a customer buys the Godfather DVD Collection, for example, the system might recommend other crime drama titles, other titles starring Marlon Brando, or other movies directed by Francis Ford Coppola.
If the user has few purchases or ratings, searchbased recommendation algorithms scale and perform well. For users with thousands of purchases, however, it’s impractical to base a query on all the items. The algorithm must use a subset or summary of the data, reducing quality. In all cases, recommendation quality is relatively poor. The recommendations are often either too general (such as best-selling drama DVD titles) or too narrow (such as all books by the same author). Recommendations should help a customer find and discover new, relevant, and interesting items. Popular items by the same author or in the same subject category fail to achieve this goal.

Item-to-Item Collaborative Filtering
Amazon.com uses recommendations as a targeted marketing tool in many email campaigns and on most of its Web sites’ pages, including the hightraffic Amazon.com homepage. Clicking on the “Your Recommendations” link leads customers to an area where they can filter their recommendations by product line and subject area, rate the recommended products, rate their previous purchases, and see why items are recommended (see Figure 1).
As Figure 2 shows, our shopping cart recommendations, which offer customers product suggestions based on the items in their shopping cart. The feature is similar to the impulse items in a supermarket checkout line, but our impulse items are targeted to each customer.
Amazon.com extensively uses recommendation algorithms to personalize its Web site to each customer’s interests. Because existing recommendation algorithms cannot scale to Amazon.com’s tens of millions of customers and products, we developed our own. Our algorithm, item-to-item collaborative filtering, scales to massive data sets and produces high-quality recommendations in real time.

How It Works

Rather than matching the user to similar customers, item-to-item collaborative filtering matches each of the user’s purchased and rated items to similar items, then combines those similar items into a recommendation list.9
To determine the most-similar match for a given item, the algorithm builds a similar-items table by finding items that customers tend to purchase together. We could build a product-to-product matrix by iterating through all item pairs and computing a similarity metric for each pair. However, many product pairs have no common customers, and thus the approach is inefficient in terms of processing time and memory usage. The following iterative algorithm provides a better approach by calculating the similarity between a single product and all related products:

For each item in product catalog, I1
   For each customer C who purchased I1
   For each item I2 purchased by customer C
   Record that a customer purchased I1 and I2
   For each item I2
   Compute the similarity between I1 and I2

It’s possible to compute the similarity between two items in various ways, but a common method is to use the cosine measure we described earlier, in which each vector corresponds to an item rather than a customer, and the vector’s Mdimensions correspond to customers who have purchased that item.
This offline computation of the similar-items table is extremely time intensive, with O(N2M) as worst case. In practice, however, it’s closer to O(NM), as most customers have very few purchases. Sampling customers who purchase best-selling titles reduces runtime even further, with little reduction in quality.
Given a similar-items table, the algorithm finds items similar to each of the user’s purchases and ratings, aggregates those items, and then recommends the most popular or correlated items. This computation is very quick, depending only on the number of items the user purchased or rated.

Scalability: A Comparison

Amazon.com has more than 29 million customers and several million catalog items. Other major retailers have comparably large data sources. While all this data offers opportunity, it’s also a curse, breaking the backs of algorithms designed for data sets three orders of magnitude smaller. Almost all existing algorithms were evaluated over small data sets. For example, the MovieLens data set4 contains 35,000 customers and 3,000 items, and the EachMovie data set3 contains 4,000 customers and 1,600 items.
For very large data sets, a scalable recommendation algorithm must perform the most expensive calculations offline. As a brief comparison shows, existing methods fall short:
· Traditional collaborative filtering does little or no offline computation, and its online computation scales with the number of customers and catalog items. The algorithm is impractical on large data sets, unless it uses dimensionality reduction, sampling, or partitioning — all of which reduce recommendation quality.
· Cluster models can perform much of the computation offline, but recommendation quality is relatively poor. To improve it, it’s possible to increase the number of segments, but this makes the online user–segment classification expensive.
· Search-based models build keyword, category, and author indexes offline, but fail to provide recommendations with interesting, targeted titles. They also scale poorly for customers with numerous purchases and ratings.
The key to item-to-item collaborative filtering’s scalability and performance is that it creates the expensive similar-items table offline. The algorithm’s online component — looking up similar items for the user’s purchases and ratings — scales independently of the catalog size or the total number of customers; it is dependent only on how many titles the user has purchased or rated. Thus, the algorithm is fast even for extremely large data sets. Because the algorithm recommends highly correlated similar items, recommendation quality is excellent.10 Unlike traditional collaborative filtering, the algorithm also performs well with limited user data, producing high-quality recommendations based on as few as two or three items.

Conclusion

Recommendation algorithms provide an effective form of targeted marketing by creating a personalized shopping experience for each customer. For large retailers like Amazon.com, a good recommendation algorithm is scalable over very large customer bases and product catalogs, requires only subsecond processing time to generate online recommendations, is able to react immediately to changes in a user’s data, and makes compelling recommendations for all users regardless of the number of purchases and ratings. Unlike other algorithms, item-to-item collaborative filtering is able to meet this challenge.
In the future, we expect the retail industry to more broadly apply recommendation algorithms for targeted marketing, both online and offline. While e-commerce businesses have the easiest vehicles for personalization, the technology’s increased conversion rates as compared with traditional broad-scale approaches will also make it compelling to offline retailers for use in postal mailings, coupons, and other forms of customer communication.

Greg Linden was cofounder, researcher, and senior manager in the Amazon.com Personalization Group, where he designed and developed the recommendation algorithm. He is currently a graduate student in management in the Sloan Program at Stanford University’s Graduate School of Business. His research interests include recommendation systems, personalization, data mining, and artificial intelligence. Linden received an MS in computer science from the University of Washington. Contact him at Linden_Greg@gsb.stanford.edu.
Brent Smith leads the Automated Merchandising team at Amazon. com. His research interests include data mining, machine learning, and recommendation systems. He received a BS in mathematics from the University of California, San Diego, and an MS in mathematics from the University of Washington, where he did graduate work in differential geometry. Contact him at smithbr@amazon.com.
Jeremy York leads the Automated Content Selection and Delivery team at Amazon.com. His interests include statistical models for categorical data, recommendation systems, and optimal choice of Web site display components. He received a PhD in statistics from the University of Washington, where his thesis won the Leonard J. Savage award for best thesis in applied Bayesian econometrics and statistics. Contact him at jeremy@amazon.com.

References

1. J.B. Schafer, J.A. Konstan, and J. Reidl, “E-Commerce Recommendation Applications,” Data Mining and Knowledge
Discovery, Kluwer Academic, 2001, pp. 115-153.
2. P. Resnick et al., “GroupLens: An Open Architecture for Collaborative Filtering of Netnews,” Proc. ACM 1994 Conf. Computer Supported Cooperative Work, ACM Press, 1994, pp. 175-186.
3. J. Breese, D. Heckerman, and C. Kadie, “Empirical Analysis of Predictive Algorithms for Collaborative Filtering,” Proc. 14th Conf. Uncertainty in Artificial Intelligence, Morgan Kaufmann, 1998, pp. 43-52.
4. B.M. Sarwarm et al., “Analysis of Recommendation Algorithms for E-Commerce,” ACM Conf. Electronic Commerce, ACM Press, 2000, pp.158-167.
5. K. Goldberg et al., “Eigentaste: A Constant Time Collaborative Filtering Algorithm,” Information Retrieval J., vol. 4, no. 2, July 2001, pp. 133-151.
6. P.S. Bradley, U.M. Fayyad, and C. Reina, “Scaling Clustering Algorithms to Large Databases,” Knowledge Discovery and Data Mining, Kluwer Academic, 1998, pp. 9-15.
7. L. Ungar and D. Foster, “Clustering Methods for Collaborative Filtering,” Proc. Workshop on Recommendation Systems, AAAI Press, 1998.
8. M. Balabanovic and Y. Shoham, “Content-Based Collaborative Recommendation,” Comm. ACM, Mar. 1997, pp. 66-72.
9. G.D. Linden, J.A. Jacobi, and E.A. Benson, Collaborative Recommendations Using Item-to-Item Similarity Mappings, US Patent 6,266,649 (to Amazon.com), Patent and Trademark Office, Washington, D.C., 2001.
10. B.M. Sarwar et al., “Item-Based Collaborative Filtering Recommendation Algorithms,” 10th Int’l World Wide Web Conference, ACM Press, 2001, pp. 285-295.