Introduction
This is a Women’s Clothing E-Commerce dataset revolving around the reviews written by customers.
Its nine supporting features offer a rich environment for analyzing the text along multiple dimensions.
In this research, we analyze customer reviews of women's clothing e-commerce using
statistical analysis and sentiment classification. We first analyze the non-text review features
found in the dataset (e.g., which age groups are likely to recommend the clothing line, the
probability that a highly rated clothing line is recommended, the class of dress purchased, etc.)
in an attempt to uncover any connection between them and the customer's
recommendation of the product. Then, we implement a random forest model that classifies
whether a review text recommends the purchased product or not.
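The classification step described above could be sketched roughly as follows. This is a minimal illustration and not the model used in this research: it assumes scikit-learn is available, assumes a TF-IDF bag-of-words representation of the text, and uses a handful of made-up review strings standing in for the Review Text and Recommended IND columns.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Made-up examples standing in for the "Review Text" column.
reviews = [
    "Love this dress, fits perfectly and very flattering",
    "Beautiful fabric and great quality, highly recommend",
    "So comfortable, I wear it all the time",
    "Runs small and the material feels cheap",
    "Poor quality, returned it after one wash",
    "Unflattering cut, very disappointed",
]
# Matching made-up "Recommended IND" labels: 1 = recommended, 0 = not.
recommended = [1, 1, 1, 0, 0, 0]

# TF-IDF turns each review into a weighted word vector; the random
# forest is then trained on those vectors to predict the label.
model = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english"),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(reviews, recommended)

# Predict the recommendation label for unseen (also made-up) reviews.
new_reviews = [
    "Great quality and very comfortable",
    "Cheap material, disappointed",
]
predictions = model.predict(new_reviews)
print(list(zip(new_reviews, predictions)))
```

On the full dataset, one would hold out a test split and report accuracy or an F1 score rather than inspect individual predictions.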
Description
This dataset includes 23486 rows and 10 feature variables. Each row
corresponds to a customer review, and includes the variables:
● Clothing ID: Integer Categorical variable that refers to the specific piece
being reviewed.
● Age: Positive Integer variable of the reviewer's age.
● Title: String variable for the title of the review.
● Review Text: String variable for the review body.
● Rating: Positive Ordinal Integer variable for the product score granted by
the customer, from 1 (worst) to 5 (best).
● Recommended IND: Binary variable stating whether the customer
recommends the product, where 1 is recommended and 0 is not
recommended.
● Positive Feedback Count: Positive Integer documenting the number of
other customers who found this review positive.
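● Division Name: Categorical name of the product's high-level division.
● Department Name: Categorical name of the product department.
● Class Name: Categorical name of the product class.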
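To make the schema concrete, the sketch below reads rows covering the columns from Clothing ID through Division Name and converts each field to the type described. It is illustrative only: the two sample rows are invented, and the `Review` dataclass and `parse_row` helper are hypothetical names, not part of the dataset or of any library.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Review:
    clothing_id: int
    age: int
    title: str
    review_text: str
    rating: int                   # 1 (worst) to 5 (best)
    recommended: bool             # Recommended IND: 1 -> True, 0 -> False
    positive_feedback_count: int
    division_name: str


def parse_row(row: dict) -> Review:
    """Convert one CSV row (all strings) into a typed Review."""
    rating = int(row["Rating"])
    if not 1 <= rating <= 5:
        raise ValueError(f"Rating out of range: {rating}")
    return Review(
        clothing_id=int(row["Clothing ID"]),
        age=int(row["Age"]),
        title=row["Title"],
        review_text=row["Review Text"],
        rating=rating,
        recommended=row["Recommended IND"] == "1",
        positive_feedback_count=int(row["Positive Feedback Count"]),
        division_name=row["Division Name"],
    )


# Invented sample rows; the values are not taken from the dataset.
SAMPLE_CSV = """Clothing ID,Age,Title,Review Text,Rating,Recommended IND,Positive Feedback Count,Division Name
101,34,Great fit,Fits well and looks nice,5,1,2,General
202,52,Too small,Runs small in the shoulders,2,0,0,Intimates
"""

reviews = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
for review in reviews:
    print(review)
```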
Graph Evaluation
(i) Customers aged 24-66 are the most active in writing reviews.
(ii) After age 50, the number of reviews declines steadily with age.
(iii) In the plot above, the number of fully satisfied customers (5 stars) is
nearly equal to the combined number of customers who gave 1 to 4 stars.
(iv) Most customers are satisfied.
(v) The box plots show no significant difference across age groups.
(vi) In other words, all age groups are satisfied to a similar extent.
(vii) In the first plot, the intimates division has a very high probability of being
recommended. The general divisions are the least recommended.
(viii) Similarly, in the second plot, Bottoms are the most recommended,
followed by Tops. Trend items are highly unlikely to be recommended.
(ix) In the third plot, Lounge and Knits are the top choices and highly likely to be
recommended. Jeans, Legwear, Outerwear, Shorts and Layering are among the
least recommended.
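The observations above rest on two simple aggregates: the share of reviews at each rating, and the fraction of recommended reviews within each category. A minimal sketch of both follows, using invented rows rather than the actual data; the helper names `rating_shares` and `recommendation_rate` are hypothetical.

```python
from collections import Counter, defaultdict

# Invented (division, rating, recommended) rows; not taken from the dataset.
rows = [
    ("General", 5, 1),
    ("General", 3, 0),
    ("General", 4, 1),
    ("General", 2, 0),
    ("Intimates", 5, 1),
    ("Intimates", 5, 1),
    ("Intimates", 4, 1),
    ("Intimates", 1, 0),
]


def rating_shares(rows):
    """Return the fraction of reviews at each rating value."""
    counts = Counter(rating for _, rating, _ in rows)
    total = sum(counts.values())
    return {rating: counts[rating] / total for rating in sorted(counts)}


def recommendation_rate(rows):
    """Return the fraction of recommended reviews within each group."""
    totals = defaultdict(int)
    recommended = defaultdict(int)
    for group, _, rec in rows:
        totals[group] += 1
        recommended[group] += rec
    return {group: recommended[group] / totals[group] for group in totals}


print(rating_shares(rows))
print(recommendation_rate(rows))
```

The same grouping works for any categorical column (division, department or class) by changing which field is used as the group key.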
Research Provided by Andrey Fateev