Saturday, 25 February 2017

Using Analytics To Predict Movie Success At The Oscars

With the advent of Big Data and analytics, the world has changed in ways previously unimaginable. In a rapidly growing and thriving industry such as the motion picture industry, data analytics has opened a number of important new avenues that can be used to analyze past data, make creative marketing decisions, and accurately predict the fortunes of impending movie releases.
The timing of a release is critical to a movie's success. To facilitate release date selection, studios decide and pre-announce the targeted release dates on a weekly basis, long before the actual release of their forthcoming movies. Their choices of release dates, and any subsequent changes, are strategic in nature, taking into consideration factors like regional holidays, cultural events, the political situation, sports events etc. Predictive analytics using historical movie release data and box office performance can help us identify the ideal release date for a movie to maximize its performance at the box office.
Consider a scenario where a movie has already been slated for release on a particular date. Suddenly, a competitor movie is announced, and the production house must decide whether to go ahead with the release or make changes.

Business Challenge:

Determining the optimal release date for a movie, leveraging analytics to enhance the movie's box office success rate.
‘Cats & Dogs’ and ‘America’s Sweethearts’ were both scheduled for release on July 04, 2001. To avoid competition, ‘America’s Sweethearts’ was pushed back by about a week to July 13, 2001, but soon a new entrant, ‘Legally Blonde’, was announced for release on July 13, 2001. With a number of new players, what can be done to optimize the release date for box-office success?

Approach

Social media analytics can be used to predict the optimal release date for a movie. Using data collected from social media channels, we can gauge expectations of the target audience and the buzz towards the movie.

Collection of Data

There are a lot of macroeconomic and microeconomic factors that affect the release date of a movie. Some of the factors that can be measured and factored into the analysis are explained in the diagram below. Most of the data can be collected from public sources like IMDB, Rovi etc.
Studio Title
The database will contain historical data of all the movies at a studio and genre level including cast, support and box office performance.
Competition
Data of competitor movies (those released in the same week) needs to be analyzed carefully, since other releases in the same genre will affect the movie's performance at the box office.
Social Media Analysis
This process involves analyzing social media conversations along with the themes, sentiment and demographic features. This extracted data will help promote creatives that have the potential to create maximum impact and will also help identify the right target audience.
Major Events
Cultural events, sports events like the FIFA World Cup, and political events such as elections and protests also play an important role when it comes to the release of a movie. It may not be ideal to release a movie during such events, as theatre occupancy rates are generally lower.

Analysis:

Illustrative Scorecard
With all factors considered, the ideal release date for ‘America’s Sweethearts’ was chosen to give the movie a higher probability of success.
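A scorecard of this kind can be sketched as a weighted sum over the factor groups described under Collection of Data. The snippet below is only an illustration: the weights, the 0-10 scores and the names `WEIGHTS`, `candidate_weeks` and `scorecard` are assumptions made for the example, not figures from the actual analysis.

```python
# Hypothetical weights for the four factor groups (they sum to 1.0).
WEIGHTS = {
    "studio_history": 0.25,
    "competition": 0.35,
    "social_buzz": 0.25,
    "major_events": 0.15,
}

# Hypothetical 0-10 scores per candidate week (higher is more favorable).
candidate_weeks = {
    "week_2": {"studio_history": 6, "competition": 3, "social_buzz": 7, "major_events": 8},
    "week_3": {"studio_history": 6, "competition": 7, "social_buzz": 7, "major_events": 8},
}


def scorecard(scores, weights=WEIGHTS):
    """Return the weighted sum of factor scores for one candidate week."""
    return sum(weights[factor] * scores[factor] for factor in weights)


totals = {week: scorecard(scores) for week, scores in candidate_weeks.items()}
best_week = max(totals, key=totals.get)
print(totals, best_week)
```

In practice the weights would come from historical analysis rather than being set by hand.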
A database with the release calendar provided information on competitors' releases, genre, budget etc. It is important to create a set of rules that satisfy the success criteria; these can be generated from historical data using CART (decision tree) rule generation.
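To make the rule-generation idea concrete, here is a minimal sketch of the core CART step: choosing the threshold on one feature that minimizes weighted Gini impurity. The feature `same_genre_rivals`, the label `hit` and the six historical rows are hypothetical; a real model would use many features and a full tree-building library.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def best_split(rows, feature, label="hit"):
    """Return (threshold, weighted_gini) for the best split on one numeric feature."""
    values = sorted({row[feature] for row in rows})
    best = None
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [row[label] for row in rows if row[feature] <= threshold]
        right = [row[label] for row in rows if row[feature] > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical history: number of same-genre rivals in the release week, and
# whether the movie met the success criteria (1) or not (0).
history = [
    {"same_genre_rivals": 0, "hit": 1},
    {"same_genre_rivals": 0, "hit": 1},
    {"same_genre_rivals": 1, "hit": 1},
    {"same_genre_rivals": 2, "hit": 0},
    {"same_genre_rivals": 2, "hit": 0},
    {"same_genre_rivals": 3, "hit": 0},
]

threshold, impurity = best_split(history, "same_genre_rivals")
print(f"Rule: same_genre_rivals <= {threshold} -> likely success (impurity {impurity})")
```

On this toy data the split separates the two classes perfectly, which real data rarely does; the resulting rule is what would feed the scorecard.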
Inferences:
  • The data was collected from social media by using keywords relevant to the movie; thereafter, using text mining, top themes were extracted from the tweet data. Since the movie revolved around a love story, the most popular theme for the movie was the genre, which in this case was romance. The other significant genre which created a buzz in social media was comedy. Using this analysis, the most apt genre for the movie was decided, which was then used to market the movie.
  • Using the same social media data, sentiment analysis was carried out. This gave the production house an idea of the public sentiment about the movie. The analysis showed ‘Joy’ as the top sentiment, followed by ‘Love’.
  • We also saw the excitement about the movie release at different levels, like age, gender and location. This helped us identify the age group, gender and location where the movie gained maximum publicity, which in turn helped with the marketing.
    For example, women in the age group of 21-24 dominated the social conversations, while men in the age group of 26 to 32 had higher social engagement.
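The theme and sentiment steps above can be approximated with a simple keyword lexicon, as in the sketch below. The lexicons, the sample posts and the function names are assumptions made for illustration; production text mining would rely on far richer models and real social media data.

```python
import re
from collections import Counter

# Hypothetical keyword lexicons for themes and emotions.
THEME_KEYWORDS = {
    "romance": {"love", "romance", "romantic"},
    "comedy": {"funny", "hilarious", "comedy"},
}
EMOTION_KEYWORDS = {
    "joy": {"excited", "happy", "fun"},
    "love": {"love", "adore"},
}

# Hypothetical posts standing in for collected tweets.
posts = [
    "So excited for this romantic movie!",
    "The trailer looks hilarious, can't wait",
    "A love story with a fun cast",
    "Happy to see a new romance this summer",
]


def tokenize(text):
    """Lowercase a post and return its set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def tag_counts(texts, lexicon):
    """Count how many posts mention at least one keyword for each tag."""
    counts = Counter()
    for text in texts:
        words = tokenize(text)
        for tag, keywords in lexicon.items():
            if words & keywords:
                counts[tag] += 1
    return counts


themes = tag_counts(posts, THEME_KEYWORDS)
emotions = tag_counts(posts, EMOTION_KEYWORDS)
print(themes.most_common(), emotions.most_common())
```

Grouping the same counts by age, gender or location would only require tagging each post with those attributes before counting.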

Business Impact:

The inferences from social media data helped select the final release date for the movie. The analysis indicated that releasing the movie in the third week, rather than the second, would have given it a 25% higher chance of success.
While winning an Oscar might be the ultimate taste of success, winning at the box office is as sweet a measurement of success. Though we cannot promise an Oscar, with analytics we can make sure of the latter!
Note: The engagement numbers and impressions mentioned are for representative purposes only.


*With inputs from Sathish Prabahar

Monday, 13 February 2017

Steering Past the Big Data Black Hole

A recent Gartner survey found that 73% of companies have invested or will invest in Big Data in the next 24 months. But 60% of them will fail to go beyond the pilot stage. These companies will be unable to demonstrate business value.
At the same time, of those who have already invested, 33% have reached a stage where they have started to gain a competitive advantage from their Big Data. There is a huge chasm between deployment and demonstrating ROI and most companies are diving deep into it.
Much of the success of a Big Data strategy lies in the Data Architecture.
It's no longer adequate to collect data just for internal compliance. Data requirements are changing from purely procedural data (from ERP systems, for example) to data for profit, the kind that can lead to significant business insights.
This requires that things be done differently. To begin with, a Big Data Goal needs to be identified.
What does the organization hope to achieve from Big Data?
According to John D*, a senior data scientist at a Fortune 100 technology company, “companies are often tempted to ask questions just based on the data that’s available. Instead, they need to understand and frame their business strategy questions first—and then gather the data and perform the analysis that answers it.”
The company he represents uses large-scale datasets to understand its customers better, so as to delight them with its products and services.
‘Enhanced Customer Experience’ is the prime reason companies turn to Big Data Analytics. A study done by Gartner found it to be the Number 1 reason, followed by ‘Process Efficiency’.
Both of these require blending internal data with external sources. For enhanced customer experience, for example, companies should be looking at geo-location data, transaction history, CRM data, credit scores, customer service logs and HTML data. For process efficiency, say fraud detection, companies should be using transaction history, CRM data, credit scores and browsing history.
Once you start blending data sources, new challenges arise. First is the obvious challenge of increased amounts of data that now need to be stored and cleaned. Second, a lot of the data is now unstructured, and you need to be able to convert it into structured data in order to analyze it and derive insights. And because of the velocity at which data is generated, you need to be able to do the conversion in near real time. Finally, we are no longer talking about just textual or numerical data but also videos.
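As a small illustration of turning unstructured text into structured data, the sketch below parses free-text customer service log lines into records. The log format, the field names and the `mentions_refund` flag are all hypothetical; real logs vary widely and usually need more robust parsing.

```python
import re

# Assumed log layout: "<date> customer=<id> <free text>".
LOG_PATTERN = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) customer=(?P<customer_id>\d+) (?P<message>.*)"
)


def to_record(line):
    """Turn one log line into a structured dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["customer_id"] = int(record["customer_id"])
    # Derive a simple structured flag from the free-text part.
    record["mentions_refund"] = "refund" in record["message"].lower()
    return record


raw_lines = [
    "2017-02-01 customer=42 Asked for a refund on order 7",
    "garbled line without structure",
]
records = [r for r in (to_record(line) for line in raw_lines) if r is not None]
print(records)
```

Lines that do not match are dropped here; in a real pipeline they would more likely be routed to a review queue.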

Charting the IoT Opportunity

As the Internet of Things (IoT) gains momentum, it’s apparent that it will force change in nearly every industry, much like the Internet did. The trend will also cause a fundamental shift in consumer behavior and expectations, as did the Internet. And just like the Internet, the IoT is going to put a lot of companies out of business.
Despite these similarities, however, the IoT is really nothing like the Internet. It’s far more complex and challenging.

Lack of Standardization

Unlike the Internet, where the increased need for speed and memory was addressed as a by-product of the devices themselves, the sensors and devices connecting to the IoT network have, for the most part, inadequate processing or memory. Furthermore, no standard exists for communication and interoperability between these millions of devices. Samsung, Intel, Dell and other hardware manufacturers have set up a consortium to address this issue. Another equally powerful consortium formed by Haier, Panasonic, Qualcomm and others aims to do the exact same thing. This has raised concerns that each of these groups will engage in a battle to push its own standard, resulting in no single solution.

New Communication Frontier

The Internet was designed for machine-to-human interactions. The IoT, on the other hand, is intended for machine-to-machine communications, which is very different in nature. The network must be able to support diverse equipment and sensors that are trying to connect simultaneously, and also manage the flow of large quantities of incredibly diverse data…all at very low costs. To meet these requirements, a completely new ecosystem, independent of the Internet, must evolve.

Data Privacy

The IoT also raises serious challenges for data security and privacy. Consumers, with justification, will call for stricter privacy standards and demand a greater role in determining what data they share. These aren’t the only security issues likely to arise. In order for a complete IoT ecosystem to emerge, multiple players must use data from connected devices, but who owns the data? Is it the initial device that emits it, the service provider that transports that information, or the company that uses it to provide the consumer better service offerings?

Geographic Challenges

For multinational organizations with data coming from various regions around the globe, things get even more complicated. Different countries have different data privacy laws. China and many parts of the EU, for example, will not let companies take data about their citizens out of their borders. This will result in the emergence of data lakes. To enable business decisions, companies must be able to access data within various geographies, run their analysis locally and disseminate the insights back to their headquarters…all in real-time and at low costs.
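One way to picture "analyze locally, share only insights" is to compute summary statistics inside each region and send just those summaries to headquarters. The sketch below assumes hypothetical regional readings and function names; it says nothing about what any specific privacy law permits.

```python
def local_summary(values):
    """Computed inside the region: only the count and the total leave the border."""
    return {"n": len(values), "total": sum(values)}


def combine(summaries):
    """Computed at headquarters from regional summaries, never from raw records."""
    n = sum(s["n"] for s in summaries)
    total = sum(s["total"] for s in summaries)
    return {"n": n, "mean": total / n}


# Hypothetical raw readings that must stay within each geography.
eu_readings = [10, 20, 30]
cn_readings = [40, 60]

global_view = combine([local_summary(eu_readings), local_summary(cn_readings)])
print(global_view)
```

Counts and totals combine exactly across regions, which is why they are used here instead of per-region averages.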

Tapping the IoT

In spite of all these challenges, the IoT is not something companies can afford to keep at arm’s length. Like the Internet, it will empower consumers with more data and insights than ever before, and they in turn will force companies to change the way they do business. From an analytics perspective, it’s very exciting. Companies will now have access to quality data that, if they combine it with other sources of information, can provide them with immense opportunities to stay relevant.
As an example, let’s look at the medical equipment industry. Typically these companies determine what equipment to sell based on parameters like number of beds and whether the facility is in a developing or developed market. However, these and other metrics are a poor substitute for evaluating need based on actual use. A small hospital in a developing country, for example, will diagnose and treat a much wider range of diseases than a similar facility in a more developed region. By equipping the machines with sensors, these manufacturers can obtain a better understanding of what is occurring within each facility and optimize selling decisions more effectively as a result.
This is just one example to underscore the tremendous potential that the IoT holds for businesses. In order to truly realize these and other opportunities, companies must understand the challenges outlined above and have a framework in place to address them. In the early days of the Internet, few could have predicted its transformative impact on all facets of our lives, both personal and professional. As the IoT heads into its next phase of maturity, we can expect to see a similar effect emerge.

Marketing Mix Modelling: Challenges and Best Practices

The optimal allocation of funds across different channels of marketing is crucial for all organizations, since investment decisions need to be made depending on the contribution each channel makes to the overall sales. Marketing Mix Modelling (MMM) helps quantify the contribution of various factors to sales and recommends fund allocation across multiple channels in order to achieve better ROI, efficiency and effectiveness. MMM is an analytical approach which is widely adopted across industries today to measure and optimize marketing budgets. While MMM has proved to be an effective technique to allocate funds more analytically, its implementation is key to achieving optimum results.
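To illustrate what "quantifying the contribution" means, the sketch below decomposes predicted sales into a baseline plus per-channel contributions, assuming a linear model whose coefficients have already been estimated. The baseline, coefficients, channel names and spend figures are hypothetical.

```python
# Hypothetical output of an already-fitted linear model: sales = baseline + sum(coef * spend).
BASELINE = 500.0
COEFFICIENTS = {"tv": 2.0, "search": 3.0, "social": 1.0}

# Hypothetical spend per channel for one period.
spend = {"tv": 100.0, "search": 50.0, "social": 50.0}


def contributions(coefficients, spend_by_channel):
    """Sales attributed to each channel under the linear model."""
    return {ch: coefficients[ch] * spend_by_channel[ch] for ch in coefficients}


contrib = contributions(COEFFICIENTS, spend)
predicted_sales = BASELINE + sum(contrib.values())
share_of_sales = {ch: value / predicted_sales for ch, value in contrib.items()}
print(contrib, predicted_sales, share_of_sales)
```

Real MMM models typically add effects such as carry-over and saturation, which this linear sketch deliberately leaves out.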
Key Challenges:
  1. The data needs to be understood thoroughly and delinked from mixed effects of any overlapping campaigns
  2. Validation of coefficients with borderline significance is important to maintain stability and consistency of new data before implementation
  3. Irregular market segments with thin and discrete history are a serious challenge for modelling and prediction. Such markets are dealt with through ‘Proxy Modelling’ using higher levels of data and predictions, which are levelled by their proportional representation in the portfolio
  4. During implementation, the prediction and optimized allocation is made for all market segments by default without considering their real-time demands. If needed, depending on the marketing plans and priorities, budgeting and allocation has to be regulated as per the prevailing business or forecasting scenarios
  5. Thin market segments with irregular history may not appropriately fit for building ‘S curves’ to reflect the sensitive cost-revenue relationship; such market segments can be predicted by grouping them based on business considerations
Best Practices
  1. For superior insights, the objectives of MMM and what it plans to achieve should be clearly set by:
    1. Identifying drivers of revenue and quantifying impact
    2. Optimizing spend across different marketing channels for maximum return
    3. Time-series forecasting for future plan of action
  2. Every touchpoint in the customer journey should be defined, tracked and measured for proper accounting of cost and revenue components by marketing levels such as geography, channel etc.
  3. Revenue regressed on cost or raw variables (clicks, impressions) by channel should be accounted for and available at the same granular level (either through derivation or already set up by the company). Data should be set at the same level, especially the cost variables, since they are available at higher levels and have to be broken down to the lowest granular level on which the model is built
  4. It’s important to check key variables for both statistical and business significance
  5. Building an ‘S curve’ (sigmoid shape) to plot the growth rate of revenue as a function of cost in percentile scaling will help determine the ‘Spend Limits.’ Fitting of ‘S Curves’ to data should be done by tuning the shape, scale parameters of a chosen distributional form with respect to the empirical distribution of cost and revenue
  6. ‘Optimal Point’ should be discovered where revenue growth rate is maximized for a given cost
  7. Cost allocation by the channels that maximize the overall revenue should be optimized
  8. Test and control markets should be compared and then the feedback can be used to refine the model performance
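The ‘S curve’ and ‘Optimal Point’ practices can be sketched with a logistic response function, as below. This assumes the curve parameters have already been fitted; the logistic form, the parameter values, the 95% saturation cut-off and all names are assumptions for illustration, not the specific distributional form the practice may use.

```python
import math


def s_curve(cost, ceiling, steepness, midpoint):
    """Logistic response: revenue as a function of cost."""
    return ceiling / (1 + math.exp(-steepness * (cost - midpoint)))


def marginal_revenue(cost, ceiling, steepness, midpoint, step=1e-3):
    """Numerical slope of the S curve using a central difference."""
    high = s_curve(cost + step, ceiling, steepness, midpoint)
    low = s_curve(cost - step, ceiling, steepness, midpoint)
    return (high - low) / (2 * step)


# Hypothetical fitted parameters, with cost on a 0-100 percentile scale.
PARAMS = {"ceiling": 100.0, "steepness": 0.2, "midpoint": 50.0}
grid = range(0, 101)

# Optimal point: where the revenue growth rate is highest.
optimal_point = max(grid, key=lambda cost: marginal_revenue(cost, **PARAMS))

# Saturation point: first cost at which revenue reaches 95% of the ceiling.
saturation_point = next(
    cost for cost in grid if s_curve(cost, **PARAMS) >= 0.95 * PARAMS["ceiling"]
)
print(optimal_point, saturation_point)
```

Spend beyond the saturation point buys very little extra revenue, which is what makes it a natural upper spend limit.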
Common Mistakes:
  1. To prevent incorrect results, disproportionate values and volatile distribution of data should be checked, trimmed and transformed
  2. Missing data should be dealt with before modelling, else it could lead to inefficient results
  3. Choose the correct transformation for the data in order to ensure the linearity and stability of the variables
  4. To avoid wrong attribution to marketing promotions, time-series data should be converted into a cross-sectional form before building the models by accounting and adjusting for seasonality and auto correlations in the data. If needed, models should be built on de-seasonalized and stationary data
  5. Data must be aggregated and summarized at requisite time intervals to correct the data imbalance like missing revenue to a cost point or vice-versa
  6. Spend limits are acceptable up to the saturation point in an ‘S curve.’ Promotional costs should be planned in a range between the discovered minimum and saturation points to avoid losses. Similarly, a minimum spend threshold should be maintained for stable markets
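The de-seasonalizing step mentioned in the list can be sketched with simple seasonal indices, where each observation is divided by the average level of its position in the cycle. The weekly revenue figures and the period of 2 are hypothetical, and this ratio-to-mean method is only one of several ways to remove seasonality.

```python
from statistics import mean


def seasonal_indices(series, period):
    """Average level of each position in the cycle, relative to the overall mean."""
    overall = mean(series)
    return [mean(series[i::period]) / overall for i in range(period)]


def deseasonalize(series, period):
    """Divide each observation by the seasonal index of its position in the cycle."""
    indices = seasonal_indices(series, period)
    return [value / indices[i % period] for i, value in enumerate(series)]


# Hypothetical revenue that alternates between a low and a high period.
weekly_revenue = [100, 200, 100, 200, 100, 200]
adjusted = deseasonalize(weekly_revenue, period=2)
print(adjusted)
```

Once the repeating pattern is removed, whatever variation remains is easier to attribute to marketing activity rather than to the calendar.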
Since the stakes are high for brand building, following the best practices while implementing the model and taking care of the challenges that come along the way can provide high ROI and improve marketing decisions extensively. An MMM model can provide a consistent and more accurate set of metrics, which will help marketers influence the overall consumer journey.