Web and social media analytics: A data and technology perspective

Web analytics is the measurement, collection, analysis, and reporting of Internet data for the purposes of understanding and optimizing Web usage. Today, Web interactions between commercial businesses and their customers are as important, if not more so, for a business’s growth as customer touches through traditional voice and bricks-and-mortar channels. This research deck discusses the data and technology opportunities and challenges associated with Web analytics.



Raj Parande, Principal, +1-312-578-4675, rajendra.parande@strategyand.pwc.com
Yuri Goryunov, Principal, +1-312-578-4791, yuri.goryunov@strategyand.pwc.com

Florham Park, NJ
Ramesh Nair, Partner, +1-973-410-7673, ramesh.nair@strategyand.pwc.com

Steffen Gnegel also contributed to this report.

This report was originally published by Booz & Company in 2011.

What is Web analytics, and why is it so essential but challenging for today’s businesses?
What is Web analytics?
Definition: Web analytics is the measurement, collection, analysis, and reporting of Internet data for the purposes of understanding and optimizing Web usage. (Source: Web Analytics Association)

Why is it important?
• Today, Web interactions between commercial businesses and their customers take place via e-commerce stores, customer service sites, interactive real-time chat, e-mail, and social media streams. These interactions are as important, if not more so, for a business’s growth as customer touches through traditional voice and bricks-and-mortar channels
• Web data integrated with other channels provides a better picture of the customer–business relationship and helps in identifying customer trends
• It is also useful in assessing the effectiveness of marketing campaigns and optimizing marketing spend
• And it improves the customer experience through faster service, thereby driving business growth and enhancing reputation

What are the challenges?
• Data volume growth is accelerating, making it cumbersome to capture and analyze Web data
• Unstructured social media data growth compounds the challenge, particularly as it must be integrated with enterprise structured data
• Multiple Web interaction platforms (PC, smartphone, tablet) further add to data capture and integration challenges
• Location and other smartphone sensor-based feeds also increase the complexity of continuous/real-time data capture
• There is no single tool available to capture and analyze all types of data

The number of Internet, social media, and mobile users tripled over the past decade, reaching a third of the world’s population
[Chart: World Internet users, Dec. 1994–Mar. 2011 (in millions of people)]

Observations
• In 2000, there were 390 million Internet users in the world
• By March 2011, there were 2.1 billion Internet users
• The number includes 78% of North Americans and 1 billion people in Asia and the Middle East combined
• More than 800 million people use Facebook, with Americans spending over 53 billion minutes a month on the site

[Chart: Mobile data consumed, June 2009–Mar. 2011 (in millions of megabytes)]

• About 350 million currently active Facebook users access it from their mobile devices
• Twitter users are also generating more than 1 billion tweets each week
• At 140 characters per message, Twitter users alone generate nearly 500 gigabytes of information, the equivalent of 500 Encyclopaedia Britannicas, every month

Source: Internet Usage Statistics (www.internetworldstats.com/stats.htm); Facebook Press Room (www.facebook.com/press/info.php?statistics); Twitter Blog (blog.twitter.com/2011/03/numbers.html); Opera Software State of the Mobile Web (media.opera.com/media/smw/2011/pdf/smw042011.pdf); Strategy& analysis
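The “nearly 500 gigabytes” figure is an order-of-magnitude estimate. A back-of-envelope version is sketched below, assuming (these are assumptions, not figures from the source) that every tweet uses all 140 characters at one byte per character and that a month averages 52/12 weeks. Under those assumptions the result is an upper bound of roughly 600 GB per month; shorter average tweets bring it down toward the quoted figure.

```python
# Back-of-envelope estimate of monthly tweet text volume.
# Assumptions (not from the source): every tweet uses the full
# 140 characters, 1 byte per character, and 52/12 weeks per month.
TWEETS_PER_WEEK = 1_000_000_000
BYTES_PER_TWEET = 140
WEEKS_PER_MONTH = 52 / 12


def monthly_tweet_gigabytes(tweets_per_week: int = TWEETS_PER_WEEK,
                            bytes_per_tweet: int = BYTES_PER_TWEET,
                            weeks_per_month: float = WEEKS_PER_MONTH) -> float:
    """Return estimated text volume per month in decimal gigabytes (1 GB = 10**9 bytes)."""
    total_bytes = tweets_per_week * bytes_per_tweet * weeks_per_month
    return total_bytes / 1e9


# Upper bound under the assumptions above.
print(f"{monthly_tweet_gigabytes():.0f} GB per month")
```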

In the consumer space alone, Internet-based social and commerce markets represent a multibillion-dollar opportunity

[Charts: market size forecasts (in US$ billion)]
• Social gaming: CAGR +45%; players: Playfish, Zynga, Playdom
• Mobile commerce: CAGR +39%; players: Gilt Groupe, eBay, Amazon
• Social commerce: CAGR +161%; players: Groupon, Yelp, Living Social

Source: Strategy& research and analysis
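The CAGR labels summarize each market’s growth across its forecast period as a single annual rate: CAGR = (end value / start value)^(1 / years) − 1. The sketch below applies that formula; the input values are hypothetical and are not read from the charts.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Hypothetical illustration, not values from the charts:
# a market growing from US$1.0 billion to US$2.0 billion over 4 years.
print(f"{cagr(1.0, 2.0, 4):.1%}")
```

Because the rate compounds, doubling over four years corresponds to roughly 19% a year rather than 25%.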


Customers have very high expectations for their end-to-end online experience
Customer expectations while browsing websites

• Guide me to the site
• Allow me to shop through multiple channels
• Remember who I am between visits
  ‒ Persistent cookies even if unauthenticated
• Give me a personalized landing page
  ‒ Tailored banner ads, promotions, and/or recommendations

• Make it easy to find what I need
  ‒ Customized search results based on previous purchasing activities
• Give me relevant content, let me look and compare
• Share the “wisdom of the crowd” with me
  ‒ Display what people “like me” have looked at
• Recommend products I might be interested in
  ‒ Behavioral targeting based on site activity

• Let me configure my products
• Guide my shopping decision with relevant advice
• Give me complete visibility into availability, delivery time, and method
• Give me the right price
• Provide sales support (e.g., click-to-chat)
  ‒ Respond to my browsing behavior with targeted assistance

• Allow me to use my preferred payment method
• Give me flexible shipping options
• Send me promotions and coupons
  ‒ Customized based on my previous browsing and purchasing behavior
• Let me go straight to checkout

Receive & use
• Tell me when my order will arrive
• Allow me to modify or cancel my order
• Make sure my order arrives on time
• Let me download related software or apps from the site

Get support & interact
• Make it easy for me to find product support online
  ‒ Direct me to appropriate articles or help desk agents based on the products I own
• Allow me to connect with other customers and enthusiasts

Legend: Web analytics enabled; Technology foundation

Source: Strategy& analysis
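“Remember who I am between visits” for an unauthenticated visitor usually means issuing a random visitor ID in a persistent cookie, one with an explicit lifetime rather than a session cookie. The sketch below uses Python’s standard `http.cookies` module; the cookie name `visitor_id` and the two-year lifetime are hypothetical choices, not taken from the source.

```python
from __future__ import annotations

import uuid
from http.cookies import SimpleCookie

VISITOR_COOKIE = "visitor_id"          # hypothetical cookie name
TWO_YEARS_SECONDS = 2 * 365 * 24 * 3600  # hypothetical lifetime


def get_or_assign_visitor_id(cookie_header: str) -> tuple[str, str | None]:
    """Return (visitor_id, set_cookie_value).

    set_cookie_value is None when the browser already sent a visitor ID;
    otherwise it is the value to send in a Set-Cookie response header.
    """
    cookies = SimpleCookie()
    cookies.load(cookie_header or "")
    if VISITOR_COOKIE in cookies:
        # Returning visitor: reuse the existing anonymous ID.
        return cookies[VISITOR_COOKIE].value, None

    # First visit: mint a random ID and make the cookie persistent.
    visitor_id = uuid.uuid4().hex
    new_cookie = SimpleCookie()
    new_cookie[VISITOR_COOKIE] = visitor_id
    new_cookie[VISITOR_COOKIE]["max-age"] = TWO_YEARS_SECONDS  # survives browser restarts
    new_cookie[VISITOR_COOKIE]["path"] = "/"
    new_cookie[VISITOR_COOKIE]["httponly"] = True  # not readable from page scripts
    return visitor_id, new_cookie[VISITOR_COOKIE].OutputString()
```

On the first request the function returns a new ID plus a `Set-Cookie` value; on later requests the browser sends the cookie back and the same ID is reused, so visits can be stitched together without a login.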

Customer analytics example: Amazon recommends targeted products based on crowd user behavior or specific user profile data

Customer analytics

• Crowd user behavior: A prospect is shopping on Amazon for a smartphone. Through insights generated from users’ purchasing behaviors, the prospect is provided with product recommendations for similar phones
• Specific user profile data: A customer logs onto his/her Amazon account. Based on previous purchase history, the customer is provided with recommendations for similar products or accessories

Source: Strategy& analysis
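Crowd-based recommendations of the “customers who bought X also bought Y” kind can be illustrated with simple co-occurrence counting over order histories. The sketch below is a simplified illustration, not Amazon’s actual algorithm; the order data and product IDs are hypothetical, and production systems typically also normalize for item popularity.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_co_occurrence(baskets):
    """Count how often each ordered pair of items appears in the same basket."""
    co_counts = defaultdict(Counter)
    for basket in baskets:
        for a, b in permutations(set(basket), 2):
            co_counts[a][b] += 1
    return co_counts


def recommend(co_counts, item, top_n=3):
    """Return up to top_n items most often bought together with `item`."""
    return [other for other, _ in co_counts.get(item, Counter()).most_common(top_n)]


# Hypothetical order histories, one set of product IDs per order.
orders = [
    {"phone_a", "case_a", "charger"},
    {"phone_a", "case_a"},
    {"phone_a", "screen_protector"},
    {"phone_b", "charger"},
]
co = build_co_occurrence(orders)
print(recommend(co, "phone_a"))
```

Here `case_a` ranks first for `phone_a` because it appears in two of the three orders containing that phone.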

Web analytics example: A client was able to significantly increase average order value by leveraging online data for behavioral targeting
Observe & learn
• Customer interactions on a webpage
• Site navigation patterns
• Repeat visitor activity
• Search context
• Purchase/conversion history
• User profile data

Recognize
• Individual behavioral patterns
• Customer segment classification
• Wisdom of the crowd
  – “Hot” products and content
  – Product affinities (customers who viewed/bought X also viewed/bought Y)
  – Popular search hits and misses

Target
• Product recommendations (focus of pilot): increased average order value (AOV)
• Content personalization: increased conversion, loyalty
• Search personalization: increased conversion, loyalty

Support tools (performance reporting, multivariate testing, site optimization)

Average revenue per order (before & during behavioral targeting pilot): $100.20 before, $123.54 during (+23%)

Source: Strategy& analysis
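The +23% figure is the relative change in average revenue per order. The short check below reproduces it from the two reported values; the `average_order_value` helper and its sample inputs are illustrative only.

```python
def average_order_value(revenue: float, orders: int) -> float:
    """Average order value (AOV) = total revenue / number of orders."""
    return revenue / orders


def percent_uplift(before: float, after: float) -> float:
    """Relative change from `before` to `after`, as a fraction."""
    return (after - before) / before


# Average revenue per order reported for the pilot.
before_aov = 100.20
during_aov = 123.54
print(f"{percent_uplift(before_aov, during_aov):.1%}")  # → 23.3%
```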

Companies have to migrate from a Web analysis tool infrastructure to an integrated architecture to enable a customized user experience
Web and social media analytics: architecture options
Options are listed in order of increasing implementation complexity.

Web analysis tools
• Technical capabilities needed: Very limited (JavaScript tags on each webpage, familiarity with vendor software)
• Benefits: Enable analyses of how people use a website (time on site, pages visited, etc.); great for the marketing department; low implementation effort
• Cons: Analysis is time-consuming; data is not as granular or accurate as data warehouse data
• Sample vendors: Coremetrics, Google Analytics, Omniture

Product recommendation tools
• Technical capabilities needed: Limited (JavaScript tags on each webpage, application programming interfaces)
• Benefits: Recommendation engines can look at user-level behavior and suggest appropriate products or place targeted ads
• Cons: No dynamically customized websites, because the data used is mainly clickstream data rather than other CRM data
• Sample vendors: Certona

Real-time integration
• Technical capabilities needed: Extensive (data warehouse integrated with CRM systems, data visualization software)
• Benefits: Dynamically customized websites, enabled by real-time data from multiple sources; very effective targeting due to integration with other data (e.g., CRM)
• Cons: High implementation effort; significant technical expertise needed (typically in-house)
• Sample vendors: Data warehouse: Teradata, Aster Data; data visualization: Tableau

Source: Strategy& analysis
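Web analysis tools collect data through JavaScript tags: on each pageview, the tag sends a small HTTP request (a “beacon”) whose query string carries the visitor and page details. The sketch below shows what the collecting server might do with such a request, using only Python’s standard library. The `/collect` path and the parameter names (`vid`, `url`, `ref`, `ev`) are hypothetical; each vendor defines its own, and hosted tools perform this step for you.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit

# Hypothetical beacon parameters mapped to clickstream column names.
FIELDS = {"vid": "visitor_id", "url": "page_url", "ref": "referrer", "ev": "event"}


def parse_beacon(request_target: str, received_at: datetime) -> dict:
    """Turn one beacon request (e.g. GET /collect?vid=...&url=...) into a clickstream record."""
    query = parse_qs(urlsplit(request_target).query)  # also decodes %-escapes
    record = {name: query.get(param, [None])[0] for param, name in FIELDS.items()}
    record["received_at"] = received_at.isoformat()
    return record


rec = parse_beacon(
    "/collect?vid=abc123&url=https%3A%2F%2Fshop.example%2Fphones&ev=pageview",
    datetime(2011, 3, 1, 12, 0, tzinfo=timezone.utc),
)
print(rec)
```

Parameters the tag did not send (here, the referrer) come back as `None`, so every record has the same columns.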

Providing a rich experience requires a robust analytic capability, integrating disparate sources of structured and unstructured data
Example analytics and data used, by area

Customer analytics
• Example analytics: Targeting promotions and personalizing offers (e.g., customized mailing, rewards, coupons); product recommendations
• Data used: Customer purchasing behavior; purchase history

Marketing analytics
• Example analytics: Optimizing marketing mix and promotions; pricing optimization and demand sensitivity
• Data used: Marketing response data; pricing sensitivity data

Web analytics
• Example analytics: Customer online activity analysis; sentiment analysis
• Data used: Web activity data; customer social media posts

Operational analytics
• Example analytics: Demand and inventory forecasting; localization; supply chain analysis; workforce optimization

Fraud & risk analytics
• Example analytics: Fraud analytics; shrinkage analysis

Data used for operational and fraud & risk analytics: demand data; inventory data; location data (Web usage, smartphone); customer interaction data; purchase returns data; distribution data; HR data

Source: Strategy& analysis
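Sentiment analysis turns unstructured social media posts into a score that can sit alongside structured data. The simplest form is lexicon-based: count positive words minus negative words. The sketch below assumes a tiny hypothetical lexicon; real tools use much larger, domain-tuned lexicons or trained models and handle negation and sarcasm, which this does not.

```python
import re

# Hypothetical mini-lexicon for illustration only.
POSITIVE = {"love", "great", "fast", "easy", "recommend"}
NEGATIVE = {"broken", "slow", "late", "refund", "terrible"}


def sentiment_score(post: str) -> int:
    """Count positive words minus negative words in a post."""
    words = re.findall(r"[a-z']+", post.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def label(post: str) -> str:
    """Map the numeric score to a coarse label."""
    score = sentiment_score(post)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


posts = [
    "Love the new phone, delivery was fast!",
    "Order arrived late and the case was broken.",
    "Just ordered a charger.",
]
for p in posts:
    print(label(p), "|", p)
```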

But the explosion of unstructured data volumes requires new approaches to data consolidation and analytics applications

Structured and unstructured data evolution

Change drivers
1. Speed: Data access speeds of physical storage mechanisms have not kept up with improvements in network speeds
2. Scale: Traditional data storage techniques like RDBMS have limited scalability to manage growing data volumes (clustering beyond a handful of servers is notoriously difficult)
3. Integration: Today’s data processing tasks increasingly need to access and combine data from many different unstructured sources, often over a network
4. Volume: Data volumes have grown from tens of gigabytes in the 1990s to hundreds of terabytes and often petabytes in recent years

[Chart: relational, structured data versus complex, unstructured data, 1970–2012]

Relational, structured data: CRM, financials, logistics, data marts, inventory, sales records, HR records, Web profiles

Complex, unstructured data: documents, Web feeds, system logs, online forums, SharePoint, sensor data, audio, images/video

Source: IDC white paper sponsored by EMC and Cloudera

Unstructured data integration and analytics face multiple challenges, but these can be overcome with some recent innovations (illustrative)
Leading Web analytics and industry trends, with vendors and products

Capturing & analyzing multiple streams of data
• Social feed integration with Web and warehouse data for advanced customer analytics
• Short, unobtrusive, highly actionable feedback collected quickly on specific tasks or pages, supplementing site surveys
• Vendors and products: Google Analytics, Adobe/Omniture, IBM/Coremetrics/Unica; ForeSee

Multiple platforms for customers to interact on
• Added sources of data and complexity of integration from multiple platforms and form factors (smartphones, tablets)
• Complexity of integrating structured data with unstructured feeds from the Web, social media, chat, and Internet-connected televisions
• Vendors and products: Google Android, Apple iOS, RIM BlackBerry; Google TV, Boxee, Apple TV

Increasing volume of data at a faster rate
• New analytics, storage, and processing approaches for faster integration at lower cost, driven by exponential growth in “big data” needs
• No single tool can capture and analyze massive Web data, requiring concurrent use of multiple analytic tools
• Vendors and products: Google MapReduce, Apache Hadoop, Google Caffeine; Google Analytics, Omniture, SAS

Continuous data streams
• Targeted offers based on customer location tracking enabled by GPS and cell-based tracking mechanisms in smartphones
• Interaction opportunities from tracking customer check-ins at vendor locations using new services
• Vendors and products: Google Latitude, Foursquare, Gowalla

Tracking opt-out and regulation
• Browser-based features that let customers opt out of tracking
• Upcoming regulations such as the FTC “Do Not Track” initiatives
• Android- and iOS-based app developers self-regulating and asking customers’ permission for data collection
• Vendors and products: Mozilla Firefox, Google Chrome, Opera

Source: Strategy& analysis
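One browser-based opt-out mechanism is the `DNT: 1` (“Do Not Track”) HTTP request header. Honoring it is voluntary for sites, so a collector that chooses to respect it has to check the header before logging. The minimal sketch below assumes request headers arrive as a plain dictionary; that representation is an assumption for illustration, not any particular framework’s API.

```python
def tracking_allowed(headers: dict[str, str]) -> bool:
    """Return False when the request carries the Do Not Track preference (DNT: 1)."""
    # HTTP header names are case-insensitive, so normalize before looking up.
    normalized = {k.lower(): v.strip() for k, v in headers.items()}
    return normalized.get("dnt") != "1"


print(tracking_allowed({"DNT": "1"}))
print(tracking_allowed({"User-Agent": "ExampleBrowser/1.0"}))
```

A collector would call this before writing a clickstream record and drop or anonymize the event when it returns `False`.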

To meet the challenges and gain the benefits of integrating Web and enterprise data, multiple technology enhancements are needed
Enhancements to Web and enterprise analytics
• Redesign and refine websites by optimizing site areas and page types, and rationalizing page tags to track interactions
• Upgrade infrastructure (e.g., Hadoop clusters, tag management systems) and processes to collect data from multiple streams, including the Web channel, social media, video, and smartphone apps
• Implement validation processes and engines to ensure correct data capture
• Implement multiple Web tools (Google, Omniture, etc.) and enterprise analytics tools (SAS) to fill any gaps in data capture and enhance analytic capabilities

Integrating multi-stream data with MapReduce/Hadoop
[Diagram: structured smartphone app data, structured Web form data, unstructured Web data, and social feeds flow (batch/real-time) into a structured analytic warehouse and MapReduce/Hadoop, which feed analytic modeling (batch/on-demand)]
Uses shown: ad hoc queries; model execution; dashboard feeds; accelerated nightly batches; automatic redundant backups

Source: Strategy& analysis
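A validation engine for data capture can start as a set of rules applied to every incoming event before it is loaded. The sketch below checks for required fields, known event types, and parseable timestamps; the field names and allowed event types are hypothetical and would come from a site’s own tagging plan.

```python
from datetime import datetime

# Hypothetical tagging plan: required fields and allowed event types.
REQUIRED = ("visitor_id", "page_url", "event", "timestamp")
ALLOWED_EVENTS = {"pageview", "click", "add_to_cart", "purchase"}


def validate_event(event: dict) -> list[str]:
    """Return a list of problems; an empty list means the event passed."""
    problems = [f"missing {f}" for f in REQUIRED if not event.get(f)]
    if event.get("event") and event["event"] not in ALLOWED_EVENTS:
        problems.append(f"unknown event type {event['event']!r}")
    ts = event.get("timestamp")
    if ts:
        try:
            datetime.fromisoformat(ts)
        except ValueError:
            problems.append(f"bad timestamp {ts!r}")
    return problems


good = {"visitor_id": "abc", "page_url": "/phones", "event": "pageview",
        "timestamp": "2011-03-01T12:00:00"}
bad = {"visitor_id": "abc", "event": "pagevew", "timestamp": "yesterday"}
print(validate_event(good))
print(validate_event(bad))
```

Returning a list of problems, rather than a single pass/fail flag, makes it easier to report which tags are misfiring and how often.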

New technologies, such as MapReduce and Hadoop, can be utilized to quickly process large sets of unstructured data

A. Traditional data integration
[Diagram: data sources (application DB/relational, external feeds, flat files) → extract/transform/load (ETL) → ODS (granular, low-level details) → data marts (DM) and analytics warehouse → business intelligence & analytics (analytics reporting)]

Traditional technologies are optimized for processing structured data and presenting results for a narrow range of analytic applications. Substantial manipulation is required to process large volumes of unstructured data.

B. Unstructured data integration using Hadoop
[Diagram: data sources (application DB/relational, EDI, external feeds, flat files, and unstructured data such as text, logs, objects, JSON, and binary) → create MapReduce/import on distributed cloud servers for scalability → “mapped” and reduced data sets → extract/transform/load → ODS (granular, low-level details) → data marts (DM) and analytics warehouse → business intelligence & analytics (fraud detection, behavioral ad targeting, analytics reporting, consumer analytics, algorithmic models)]

New distributed technologies such as MapReduce (developed by Google) and Hadoop (an open-source Apache platform) were created to process large volumes of unstructured data and import the results for use by a broad range of analytic applications.

Source: Strategy& analysis
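The MapReduce programming model splits a job into a map step that emits key/value pairs from each input record, a shuffle that groups values by key, and a reduce step that combines each group. The single-process sketch below counts words to show the three phases; it is not Hadoop’s API. Hadoop runs the same phases as parallel tasks across a cluster and performs the shuffle over the network.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document: str):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def map_reduce(documents):
    """Run map, shuffle, and reduce in sequence on one machine."""
    mapped = chain.from_iterable(map_phase(d) for d in documents)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


logs = ["phone case phone", "charger phone", "case"]
print(map_reduce(logs))
```

Because each map call sees only one record and each reduce call sees only one key, both steps can be spread across many machines without coordination, which is what gives the model its near-linear scaling.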

The MapReduce model does not replace traditional enterprise RDBMS; it tackles problems that could not be solved previously
Comparing RDBMS to MapReduce (RDBMS vs. MapReduce/Hadoop)
• Data size: gigabytes vs. petabytes
• Access: interactive and batch vs. batch
• Structure: fixed schema vs. unstructured schema
• Language: SQL vs. procedural (Java, C++, Ruby, etc.)
• Integrity: high vs. low
• Scaling: nonlinear vs. linear
• Updates: read and write vs. write once, read many times
• Latency: low vs. high

How Hadoop complements RDBMS
• Storage of extremely high volumes of enterprise data
• Accelerating nightly batch business processes
• Improving the scalability of applications
• Creating automatic, redundant backups
• Producing just-in-time feeds for dashboards and business intelligence
• Use of Java for data processing instead of SQL
• Turning unstructured data into relational data
• Taking on tasks that require massive parallelism
• Moving existing algorithms, code, frameworks, and components to a highly distributed computing environment

MapReduce and Hadoop make it possible to run analytics on the complete universe of data rather than on a sample, as is traditionally done with an RDBMS. This produces better analytic output for higher-quality decision making.
Source: “10 Ways to Complement the Enterprise RDBMS Using Hadoop,” by Dion Hinchcliffe
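“Turning unstructured data into relational data” typically means parsing raw text such as Web server logs into rows with fixed columns. In a Hadoop deployment the parsing would run as a distributed job; the single-machine sketch below uses Python’s `re` and `sqlite3` as stand-ins. The log format is a simplified Apache-style access log, which is an assumption; real log formats vary and need their own patterns.

```python
import re
import sqlite3

# Pattern for a simplified Apache-style access log line (assumed format).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def load_logs(lines, conn):
    """Parse log lines into rows of a relational `hits` table; return rows loaded."""
    conn.execute("CREATE TABLE hits (ip TEXT, ts TEXT, method TEXT, path TEXT, "
                 "status INTEGER, bytes INTEGER)")
    rows = []
    for line in lines:
        m = LOG_PATTERN.match(line)
        if m is None:
            continue  # skip lines that do not fit the expected format
        d = m.groupdict()
        rows.append((d["ip"], d["ts"], d["method"], d["path"], int(d["status"]),
                     0 if d["bytes"] == "-" else int(d["bytes"])))
    conn.executemany("INSERT INTO hits VALUES (?, ?, ?, ?, ?, ?)", rows)
    return len(rows)


sample = [
    '10.0.0.1 - - [01/Mar/2011:12:00:00 +0000] "GET /phones HTTP/1.1" 200 5120',
    '10.0.0.2 - - [01/Mar/2011:12:00:05 +0000] "GET /phones/x1 HTTP/1.1" 404 -',
    'not a log line',
]
conn = sqlite3.connect(":memory:")
loaded = load_logs(sample, conn)
print(conn.execute("SELECT path, COUNT(*) FROM hits GROUP BY path ORDER BY path").fetchall())
```

Once the text is in columns, ordinary SQL (grouping, filtering by status code, joining to customer tables) applies to what was previously unstructured data.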

Successful implementation of MapReduce/Hadoop requires “heavy lifting” enhancements at every layer of data architecture

Areas of consideration in Hadoop adoption

[Diagram: data sources (application DB/relational, XML, CSV, flat files, EDI, external feeds, and unstructured data such as text, logs, JSON, objects, and binary) flow through “create MapReduce/import” and extract/transform/load into business intelligence & analytics (fraud detection, dashboards, behavioral ad targeting, analytics reporting, consumer analytics, algorithmic models); markers 1–7 correspond to the considerations below]

1. Data readiness of structured, external, and unstructured data sources needs to be assessed to determine which sources are suitable for the Hadoop platform, and which new capabilities will be enabled or which existing capabilities will be better served
2. ETL jobs need to be reviewed and possibly rationalized, since some data is now sourced through Hadoop. Scheduling needs to be rationalized while dependency integrity is closely monitored and preserved
3. Define and implement the distributed file system while ensuring consistency of foreign keys, such as customer or product identifiers. On the infrastructure side, some nonfunctional requirements may be relaxed (e.g., backups and uptime do not need to be as strict as with conventional infrastructure)
4. Modify EDW schemas to accept data feeds from Hadoop
5. Define which data elements will be bi-directionally synchronized between the EDW and Hadoop
6. MapReduce/Hadoop capabilities need to be joined with SQL so that MapReduce routines can be managed and optimized like other SQL queries. This will allow MapReduce programs to react differently depending on the data and parameters presented at run time, eliminating the need to create many versions of a program for different situations
7. Review and rationalize extract programs that shepherd data from the database to a series of downstream files, such as those used for statistical analysis or data mining

Source: Strategy& analysis
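Consideration 3 calls for keeping foreign keys such as customer identifiers consistent between Hadoop output and the EDW. A basic check compares the keys in a Hadoop-produced data set against the EDW’s customer dimension and reports records with no key and keys the EDW does not know. The records, field names, and IDs below are hypothetical.

```python
def check_key_consistency(hadoop_records, edw_ids, key="customer_id"):
    """Return (records missing the key, sorted keys unknown to the EDW)."""
    known = set(edw_ids)
    missing_key = [r for r in hadoop_records if r.get(key) is None]
    orphans = sorted({r[key] for r in hadoop_records
                      if r.get(key) is not None and r[key] not in known})
    return missing_key, orphans


# Hypothetical EDW customer dimension and Hadoop session summaries.
edw_customers = ["C001", "C002", "C003"]
sessions = [
    {"customer_id": "C001", "pages": 12},
    {"customer_id": "C009", "pages": 3},
    {"pages": 5},
]
missing, orphans = check_key_consistency(sessions, edw_customers)
print(missing, orphans)
```

Orphan keys usually point to an identifier that was reformatted or generated differently on one side, which is exactly the drift this consideration warns about.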

Besides data technologies, other dimensions of the solution stack must also be considered
Solution stack dimensions: navigation patterns; channels (mobile, social, search, video, commerce, gaming); metadata tagging; real-time & near-real-time analytics; content creation; governance & compliance

• Now that we better understand browsing patterns, how do we change navigation or usability?
• Does the technology stack allow for appropriate and timely information updates?
• What additional content metadata can improve interpretation of clickstream data?
• What metadata attributes are common across channels?
• Where should those attributes be stored and maintained?
• What is the acceptable data lag?
• Given the volume of transactions, is our architecture used in the optimal way, and does it provide answers to the most important questions?
• What new content needs to be created to address insights delivered by analytics?
• How do we ensure consistency of content across channels?
• How do we prevent it from becoming “stale”?
• In regulated industries, how can we shorten the content approval cycle while maintaining compliance?
• Do we have optimal workflows and effective governance?

Source: Strategy& analysis


Integrating Web, social media, smartphone, and other unstructured data poses multiple challenges but provides significant benefits
[Chart: Integrating Web data poses resource, infrastructure, and other challenges; survey responses by company revenue ($1B or more; $100M to less than $1B; $50M to less than $100M)]
Challenges shown:
• Infrastructure/technology constraints
• Lack of staff or resources necessary to execute
• Independent data use and analysis in business units
• Don’t want to share the data across applications
• Too difficult to perform analysis on large data set
• No solution capable of meeting integration requirements
• Don’t see a need to integrate data
• None

[Chart: Integrating Web data provides better insights and other benefits; grouped into marketing effectiveness, customer expectations, and operational efficiencies]
Benefits shown:
• Quality of online marketing
• Overall quality of the website
• Quality of offline marketing
• Quality of search engine placement
• Quality of product merchandising
• Quality of internal marketing efforts (house ads)
• Customer satisfaction
• Quality of content
• Ability to meet needs of different segments
• Performance of the website
• Navigability of the website
• Quality of internal search
• Data analysis cost efficiencies
• Ability to perform ad hoc analysis
• Faster time-to-market
• Decreased dependence on IT resources

Source: Forrester: Jupiter Research e-Rewards Executive Survey (2/08), n = 514 (small and medium-sized business decision makers, U.S.); Strategy& analysis

Strategy& is a global team of practical strategists committed to helping you seize essential advantage. We do that by working alongside you to solve your toughest problems and helping you capture your greatest opportunities.

These are complex and high-stakes undertakings, often game-changing transformations. We bring 100 years of strategy consulting experience and the unrivaled industry and functional capabilities of the PwC network to the task. Whether you’re charting your corporate strategy, transforming a function or business unit, or building critical capabilities, we’ll help you create the value you’re looking for and impact.

We are a member of the PwC network of firms in 157 countries with more than 184,000 people committed to delivering quality in assurance, tax, and advisory services. Find out more by visiting us at strategyand.pwc.com.


© 2011 PwC. All rights reserved. PwC refers to the PwC network and/or one or more of its member firms, each of which is a separate legal entity. Please see www.pwc.com/structure for further details. Disclaimer: This content is for general information purposes only, and should not be used as a substitute for consultation with professional advisors.