Uplift model for a Voter Persuasion Case


Introduction and Business Understanding

We use cluster analysis to better understand the voters in the "" file; interpreting the clusters produced by k-means clustering helps in understanding the data. We also give the reasoning behind the data preparation and discuss the pros and cons of working with a large number of covariates. Using the data from the "" file, we then build an uplift model. The variable descriptions are listed within the columns, and a validation column is already included in the data set. The primary goal of this research is to determine which voters should be contacted in the persuasion campaign and to describe how the data sets were merged.

Data Understanding

We implemented an uplift model for voter persuasion using the data sets in the "" file. Each variable description is listed in the columns, and the validation column is already in the set. Determining precisely which voters should be contacted involves several steps.

Two types of data go into the uplift model: demographic data and design of experiment (DOE) data. First, we fit a model that predicts the outcome (whether somebody moved from not voting to voting) from the covariates and the treatment, which in our case is targeting with a flyer. We then use the same model to score each person again with the treatment level reversed: where the treatment was 1 we set it to 0, and vice versa. For each individual, uplift is the difference between the two predicted probabilities of success, that is, the probability of a positive response if the person gets a flyer minus the probability if the person does not.

The larger the uplift, the more likely a person is to be persuaded by the flyer, because the probability of a positive response is much higher if that person receives the flyer than if they do not.
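The scoring step described above can be sketched in a few lines of Python. This is only an illustration: the logistic coefficients, the single "age" covariate, and the function names are assumptions made up for the example, not values estimated from the voter data or produced by JMP.

```python
import math

# Hypothetical coefficients for illustration only; they are NOT estimated
# from the voter data. A real model would be fit on the DOE records.
COEF = {"intercept": -0.8, "age": 0.01, "flyer": 0.1, "age_x_flyer": 0.004}

def predict_proba(age, flyer):
    """Probability of a positive response from a toy logistic model."""
    z = (COEF["intercept"] + COEF["age"] * age
         + COEF["flyer"] * flyer + COEF["age_x_flyer"] * age * flyer)
    return 1.0 / (1.0 + math.exp(-z))

def uplift(age):
    """Score the same person with flyer=1 and flyer=0 and take the difference."""
    return predict_proba(age, flyer=1) - predict_proba(age, flyer=0)

# Two made-up voters; each gets an uplift score.
voters = [{"voter_id": 1, "age": 25}, {"voter_id": 2, "age": 60}]
for v in voters:
    v["uplift"] = uplift(v["age"])
    print(v["voter_id"], round(v["uplift"], 3))
```

The interaction term is what lets uplift differ from person to person; without it, every voter in this toy model would get almost the same score.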

We checked the data set for outliers and missing values and changed some variables from continuous to nominal, because they need to be coded as 0 and 1; 26 rows were excluded. The k-means cluster analysis of potential voters persuaded during an election campaign uses 24 variables (Appendix A). The cluster values tell us a lot about the people in each group. For instance, on "femaleoric" cluster 3 is very different from the other clusters, which is something to pay attention to when building the models.
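For readers unfamiliar with k-means, the following is a minimal sketch of the algorithm in plain Python. It is not the JMP procedure used for Appendix A; the two columns, the data points, and the starting centroids are hypothetical.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means on tuples of floats, starting from the given centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for members, old in zip(clusters, centroids):
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments are stable, so stop early
        centroids = new_centroids
    return centroids, clusters

# Hypothetical (age, share) rows forming two obvious groups.
points = [(25.0, 0.40), (30.0, 0.50), (27.0, 0.45),
          (65.0, 0.90), (70.0, 0.95), (68.0, 0.85)]
centroids, clusters = kmeans(points, [(20.0, 0.5), (80.0, 0.5)])
print(centroids)
```

In practice the variables would be standardized first, since a column such as age would otherwise dominate the distance.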

The response is a binary variable, "Moved_ad", and the treatment is recorded in the "flyer" column, which was assigned at random in the experiment. The design of experiment (DOE) data let us estimate the effect of the treatment on the outcome "Moved_ad" and determine which groups of voters should be targeted to receive the treatment (a flyer). The value order for "Moved_ad" and "flyer" was set so that the analysis is computed for the value 1 rather than 0. Appendix B shows the overall effect: 34.38% of those who did not get a flyer responded, compared with 40.21% of those who got a flyer. The flyer therefore had an effect, an increase of about 5.8 percentage points (40.21% minus 34.38%) in the DOE data. The analysis also indicates which variables matter for both the treatment and the response.
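The overall flyer effect is simple arithmetic on the two response rates reported above. A short sketch, with variable names chosen for the example:

```python
# Response rates reported in the text (Appendix B), in percent.
rate_no_flyer = 34.38
rate_flyer = 40.21

# Absolute effect in percentage points, and relative lift over the control rate.
effect_points = rate_flyer - rate_no_flyer
relative_lift = effect_points / rate_no_flyer

print(f"Absolute effect: {effect_points:.2f} percentage points")
print(f"Relative lift: {relative_lift:.1%}")
```

The absolute difference is 5.83 percentage points, which corresponds to a relative increase of roughly 17% over the group that received no flyer.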

But we want to know more than the overall increase from using the flyer. We want to know who in the larger population, beyond the thousands of voters selected for the experiment, should be sent the flyer. Sending it to every voter would be costly, so we want to spend money only on those who can be persuaded. That is what an uplift model is for.


The uplift model distinguishes four groups of people. The first group will respond yes regardless of treatment ("sure things"). The second group will respond no regardless of the treatment ("lost causes"). The third group responds yes only if treated ("persuadables"). The fourth group responds no because of the treatment, that is, they would have responded yes if left alone ("do not disturb," also known as "sleeping dogs"). Only the persuadables are worth targeting.
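The four groups can be written as a small lookup on a person's two potential outcomes. The function and argument names below are illustrative. Both outcomes are never observed for the same person, which is why the uplift model works with estimated probabilities instead.

```python
def segment(responds_without_flyer, responds_with_flyer):
    """Name the group implied by a person's two potential outcomes."""
    if responds_without_flyer and responds_with_flyer:
        return "sure thing"
    if not responds_without_flyer and not responds_with_flyer:
        return "lost cause"
    if responds_with_flyer:
        return "persuadable"   # yes only when treated
    return "do not disturb"    # yes only when left alone

for without in (True, False):
    for with_flyer in (True, False):
        print(without, with_flyer, "->", segment(without, with_flyer))
```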

For the uplift model, we joined the original data set with the design of experiment (DOE) table on voter id using "Join Tables," and the model was fit in JMP using the "Consumer Research" platform. The model was run using the 24 variables from the original data set. "Moved_ad" was the response variable, and "flyer" was the treatment column from the randomized experiment. The resulting uplift model is shown in Appendix C. The decision tree has 12 splits with an R-square of 0.234 for the validation set, meaning that the independent variables explain about 23% of the variation in the validation set.
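Outside JMP, the join on voter id amounts to an inner join of two tables on a shared key. The sketch below uses made-up rows, and the "voter_id" and "age" column names are assumptions; only "flyer" and "Moved_ad" come from the text.

```python
# Hypothetical rows standing in for the original (demographic) table.
original = [
    {"voter_id": 101, "age": 34},
    {"voter_id": 102, "age": 58},
    {"voter_id": 103, "age": 41},
]
# Hypothetical rows standing in for the DOE table.
doe = [
    {"voter_id": 101, "flyer": 1, "Moved_ad": 1},
    {"voter_id": 103, "flyer": 0, "Moved_ad": 0},
]

def inner_join(left, right, key):
    """Keep rows whose key appears in both tables and merge their columns."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]

joined = inner_join(original, doe, "voter_id")
for row in joined:
    print(row)
```

Voter 102 has no DOE record in this toy data, so an inner join drops that row; whether unmatched rows should be kept is a choice to make explicitly when merging.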

The uplift graph (Appendix D) shows the difference in probabilities when the population is sorted by uplift, with the highest point around 0.35. Where the curve becomes negative, it is a sign that those people should not be contacted. With limited resources, we would stop the targeting campaign at about 0.12 on the x-axis (the sorted portion of the population), although the cutoff depends on the criteria and the budget of the campaign. Tabulation was used to produce summary statistics, including quantiles. The mean difference was 0.067, with a standard deviation of 0.068. Uplift models are used in election campaigns to allocate resources to the voters most likely to respond positively to the treatment. In the data, a person who was persuaded to vote for a specific candidate has a 1 in the response column, and a person who received a flyer has a 1 in the treatment column.
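The targeting rule read off the uplift graph (sort by uplift, contact only the top portion, and never contact anyone with non-positive uplift) can be sketched as follows. The scores and voter labels are hypothetical; only the 0.12 cutoff comes from the text.

```python
def select_targets(uplift_by_voter, max_fraction):
    """Rank voters by estimated uplift and keep the top share with positive uplift."""
    ranked = sorted(uplift_by_voter.items(), key=lambda kv: kv[1], reverse=True)
    budget = int(len(ranked) * max_fraction)  # number of voters we can afford
    return [voter for voter, score in ranked[:budget] if score > 0]

# Hypothetical uplift scores for ten voters.
scores = {"v1": 0.05, "v2": -0.02, "v3": 0.35, "v4": 0.10, "v5": 0.00,
          "v6": 0.07, "v7": -0.10, "v8": 0.02, "v9": 0.12, "v10": 0.01}

print(select_targets(scores, 0.12))  # budget-limited campaign
print(select_targets(scores, 1.0))   # unlimited budget: everyone with positive uplift
```

Even with an unlimited budget the negative-uplift voters are left out, which matches the reading of the graph: contacting them would lower the response.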

The uplift was estimated as the difference between the two prediction formulas and saved as a new column. With this column in place, we can determine what percentage of the population may be persuaded. Approximately 73% of the voters were convinced and were labeled "sure thing." These individuals received a flyer and voted for the candidate.

According to the decision tree in the uplift model, individuals in the leaves defined by the splits party_r=0, NH_white<74, age<69, party_r=1, ed_4col<44, f1<1, vg>=1 and hh_ni>=3 should not receive the treatment (flyer). In these groups the treatment has a negative effect, which makes those individuals "do not disturb" voters, or sleeping dogs. Individuals with larger differences are considered persuadable: in the quantiles (Appendix E), those above the 90% quantile, with a difference of 0.102 or more, are considered persuadable.
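The same kind of summary (mean, standard deviation, and a 90% cutoff for flagging persuadable voters) can be reproduced with Python's standard library. The uplift values below are hypothetical and will not match Appendix E; the default quantile method of `statistics.quantiles` may also differ slightly from the one JMP uses.

```python
import statistics

# Hypothetical uplift scores for 20 voters.
uplifts = [-0.04, -0.01, 0.00, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07,
           0.08, 0.09, 0.10, 0.11, 0.12, 0.14, 0.16, 0.18, 0.22, 0.30]

mean = statistics.mean(uplifts)
sd = statistics.stdev(uplifts)

# quantiles(n=10) returns the 9 decile cut points; the last one is the 90% quantile.
deciles = statistics.quantiles(uplifts, n=10)
p90 = deciles[-1]

# Flag voters above the 90% quantile as persuadable.
persuadable = [u for u in uplifts if u > p90]

print(f"mean={mean:.3f} sd={sd:.3f} p90={p90:.3f}")
print(persuadable)
```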


The uplift model is used in election campaigns to allocate resources to the individuals most likely to be persuaded by the treatment (flyer). The main advantage of the uplift model is a better use of revenue and resources. The main drawback is that biases can produce incorrect data and therefore misleading results.

Research Provided by Andrey Fateev


Appendix A. Cluster Analysis.

Appendix B. Flyer effect.

Appendix C. Uplift Model.

Appendix D. Uplift Graph.

Appendix E. Summary Statistics.