ML in the CX Industry: CSAT & DSAT
I worked at a service-based company as a lead data scientist, where one of the complex problems I designed and worked on was related to customer experience (CX). In the CX industry there are many metrics that a company wants to maximize: CSAT%, NPS, etc. These metrics tell how happy a company's existing customers are with its products or services. They also give deeper insight into the performance of the company, for example: budget allocation for retaining a customer vs. acquiring a new one, which product segment to promote, which key performance indicators to improve, etc. In this article, I will share my experience and understanding of the data and its underlying issues. Knowing your data, its issues, and their implications is the first basic step towards making your ML model more deployable.
At its core, any ML model is an optimization problem: it searches for the parameters that map its inputs to the correct label or output. To reach those parameters or weights, it minimizes a loss with respect to the parameters in the model's equation. With this in mind, it is easier to reach a good optimum when the data has diverse patterns but not very high variation. But when the data and its outcome are both driven by human sentiment, the patterns are bound to be highly diverse with very high variance. Building a machine learning model on such data and keeping its performance intact is complex. Just a heads-up: I won't share much about modeling approaches here. I will write a separate article on that, as the topic has a lot more depth.
Companies in the CX industry focus on many important metrics, but one of them is customer satisfaction and dissatisfaction, often termed CSAT and DSAT. This metric describes how satisfied a given customer is with your service or product on an interval scale. Usually the scale runs from 1 to 5 or from 1 to 10, and customer satisfaction increases as you move to the right on the scale. This article focuses on this use case and builds up from here.
Let's say you ordered food from a food aggregator, and the food was delivered cold. What do you do? Some people, to avoid the hassle, will put it in the microwave and eat it; the rest will raise a ticket with the food aggregator's customer service. They will chat or have a phone conversation with a representative. In the backend, the system maintains logs of the conversation. The representative provides a solution, and at the end the system sends you a form where you give feedback and a rating for the service. So the data we get, apart from customer demographic data, is the conversation logs and the rating plus feedback.
These ratings are very important for a company, because the cost of acquiring a new customer is typically higher than the cost of retaining an existing one. Usually the customers who tend to churn or leave are the ones who are dissatisfied. There are other types of churners too, such as price-driven churners. So a company would rather retain than acquire.
Out of all customer feedback and ratings, approximately 70% have issues (the exact share depends on the industry the CX operation serves). The issue is that many times customers, just to avoid the form or to move on:
- Give random ratings without feedback
- Give random ratings with feedback
By “random” I mean choosing any rating option at random, which might not match the feedback (the feedback expresses one customer sentiment and the rating says the contrary), or just typing random words or phrases (usually very short) along with a random rating. The feedback and rating are also driven by the customer's current emotional state and work state: are they currently free or busy with something? Many variables, and their permutations and combinations, play a part. Such cases are noise. But identifying them is hard and usually not a straightforward process. There are multiple stages; sometimes it requires building cascading models, i.e. building multiple classifiers for different purposes and arriving at the outcome holistically, or it can be done with rule-based automation. In brief, these stages can be:
- Identifying the overall sentiment / emotion of the customer. If at any point the customer showed negative sentiment, was it related to a specific service/product or was it just general? So this involves sentiment identification + aspect mining.
- How participative the customer was during the interaction. In general you can measure it with the customer's response rate + silence time. There are other metrics involved as well.
- Whether the customer was rushing at the close of the conversation.
- Taking feedback length into consideration.
- Matching the sentiment of the feedback with the given rating.
These are a few cascading automation pointers you can use to identify random cases, or you can use them to build an intent identification model and use its outcome to identify random cases. Usually we correct labels wherever there is scope, or we remove such cases from the analysis. This was just the tip of the complexity.
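Two of the pointers above (feedback length, and feedback sentiment vs. rating) can be sketched as simple rules. This is only a minimal illustration: the word lists, the function names, and the `min_words` threshold are all assumptions made for the example, and in practice the crude word count would be replaced by a proper sentiment model.

```python
import re

# Tiny illustrative lexicons (assumption); a real system would use a trained sentiment model.
POSITIVE = {"good", "great", "helpful", "quick", "resolved", "thanks", "excellent"}
NEGATIVE = {"bad", "cold", "late", "rude", "slow", "unresolved", "worst", "poor"}


def feedback_sentiment(text):
    """Return +1, -1 or 0 from a crude positive-minus-negative word count."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return (score > 0) - (score < 0)


def random_rating_flags(rating, feedback, scale_max=5, min_words=3):
    """Return the names of the rules that fire for one rating + feedback pair."""
    flags = []
    n_words = len(feedback.split())
    if n_words == 0:
        flags.append("no_feedback")
    elif n_words < min_words:
        flags.append("very_short_feedback")

    sentiment = feedback_sentiment(feedback)
    midpoint = (1 + scale_max) / 2
    # Sentiment of the text contradicts the side of the scale the rating is on.
    if sentiment > 0 and rating < midpoint:
        flags.append("positive_text_low_rating")
    if sentiment < 0 and rating > midpoint:
        flags.append("negative_text_high_rating")
    return flags


print(random_rating_flags(5, "food was cold and agent was rude"))
```

A case that raises one or more flags is only a candidate for label correction or removal; the remaining pointers (participation, rushing at the close) need the conversation log itself.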
Deriving CSAT and DSAT from the customer chat transcript alone is highly complex, because the text involves patterns with high diversity, high variation, and also contradictions. Contradiction here means that the same or similar back-and-forth text can occur for both CSAT and DSAT customers. That is quite logical, because two identical lines can carry different sentiment and intent: the same phrase can be said plainly or with sarcasm. This points to high overlap in the data. Factors that explain these contradictions to some extent, such as sarcasm and intent, need to be identified and used as features.
To reduce this high variation, one can reduce the dimensionality (do not use PCA; it is a linear method and does not work well with non-linear data, and textual data is in general highly non-linear). Reducing the dimensionality not only helps control unexplained high variation, it also eases the curse of dimensionality and sparse representation.
But CSAT and DSAT are not explained by text alone. Other features need to be derived from the conversation log, and these will in turn explain the unexplained variation to a large extent. The idea is to derive features that explain the variation within the data quantitatively. An example of unexplained variation: there can be multiple reasons for a customer getting irritated during the conversation, such as long hold time, long agent silence time, a solution that does not meet the customer's expectations, etc. These features generally belong to the classes below:
- Behavioral: agent empathy, communication, customer intent, customer sentiment
- Time: customer silence time, agent silence time, duration of ticket, time to reach solution
- Performance: ping-pong effect, whether the agent provided the right solution
- Problem area, etc.: customer's issue category
- Demographic
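As an illustration of the time class, here is a minimal sketch that derives silence and duration features from a conversation log. The log layout (a time-sorted list of `(speaker, ISO timestamp)` pairs), the function name, and the feature names are assumptions made for this example; real chat logs will look different.

```python
from datetime import datetime


def silence_features(turns):
    """Compute simple time features from a time-sorted list of (speaker, iso_timestamp).

    A gap is attributed to the party who finally replies, so "agent silence"
    is the time the customer waited for the agent, and vice versa.
    """
    waits = {"customer": [], "agent": []}
    parsed = [(speaker, datetime.fromisoformat(ts)) for speaker, ts in turns]

    for (prev_speaker, prev_time), (speaker, time) in zip(parsed, parsed[1:]):
        if speaker != prev_speaker:  # only count gaps where the speaker changes
            waits[speaker].append((time - prev_time).total_seconds())

    duration = (parsed[-1][1] - parsed[0][1]).total_seconds() if parsed else 0.0
    return {
        "customer_silence_total_s": sum(waits["customer"]),
        "agent_silence_total_s": sum(waits["agent"]),
        "agent_silence_max_s": max(waits["agent"], default=0.0),
        "ticket_duration_s": duration,
    }


chat = [
    ("customer", "2024-01-01T10:00:00"),
    ("agent", "2024-01-01T10:00:30"),
    ("customer", "2024-01-01T10:01:00"),
    ("agent", "2024-01-01T10:03:00"),
]
print(silence_features(chat))
```

Features like these sit next to the text features in the same row, so the model can separate two identical transcripts where one customer waited two minutes for every reply and the other did not.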
Even after engineering features that explain the data variation quantitatively, the issue of class overlap remains. One way to minimize it is to identify different clusters and, for N clusters, build N models. Another way is to identify the overlapped regions, segregate them from the rest of the data, and build two different models. There are many other approaches. I will write a separate article on this in the future.
Class overlap occurs when instances of more than one class share a common region in the data space. These instances have similar feature values although they belong to different classes.
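One simple way to locate an overlapped region is a nearest-neighbour check: flag an instance when most of its k nearest neighbours carry a different label. The sketch below is one possible heuristic, not the method used in the project; the function name, the choice of `k`, and the toy data are assumptions.

```python
import math


def overlap_flags(X, y, k=3):
    """Flag each instance whose k nearest neighbours mostly belong to another class."""
    flags = []
    for i, xi in enumerate(X):
        # Distance from instance i to every other instance, paired with that instance's label.
        dists = sorted((math.dist(xi, xj), y[j]) for j, xj in enumerate(X) if j != i)
        neighbours = [label for _, label in dists[:k]]
        other = sum(label != y[i] for label in neighbours)
        flags.append(other > len(neighbours) / 2)
    return flags


# Toy data: two well-separated groups plus one DSAT point sitting inside the CSAT group.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (0.5, 0.5)]
y = ["csat"] * 4 + ["dsat"] * 4 + ["dsat"]
print(overlap_flags(X, y))
```

The flagged rows can then be segregated and modeled separately from the clean region, as described above. The pairwise loop is quadratic, so on real data an indexed nearest-neighbour search would be needed.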
In this industry the data is usually very highly imbalanced: a CSAT:DSAT proportion of around 90:10 for a company with decent positive sentiment in the market and 95:5 for a company with high positive sentiment. This skewness affects the quality of the model. There are many ways to handle it: cost/penalty-based methods, sampling methods, weighting methods, etc.
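As a small example of the weighting approach, the sketch below computes inverse-frequency class weights with the formula `n_samples / (n_classes * class_count)`, the same heuristic scikit-learn uses for `class_weight="balanced"`. The function name is an assumption for this example.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count of that class)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# A 90:10 split, as in the example above.
labels = ["csat"] * 90 + ["dsat"] * 10
print(balanced_class_weights(labels))
```

With a 90:10 split, each DSAT row weighs 5.0 and each CSAT row about 0.56, so a misclassified DSAT case costs the model roughly nine times more during training.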
Another issue is that the data statistics change very frequently, i.e. data drift occurs very often. In my experience, I have seen drift recur on a roughly monthly basis. Out of many, there are two major contributors: a) customer emotion/intent, which is never constant and depends on many factors, e.g. the type of market (India, for example, is a price-driven market), price of the product, net worth of the customer, education, the region/area where the customer lives, type of product, etc.; b) CX agent performance standards and metrics, which the company changes frequently to improve customer service. Hence monitoring drift and taking the related measures is imperative. The measures depend on the type of drift. In this industry, drift mostly follows a cyclical pattern.
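One common statistic for monitoring drift in a single numeric feature is the Population Stability Index (PSI), which compares the binned distribution of a reference period with that of a recent period. The article does not name a specific drift measure, so PSI is an assumption chosen for illustration, as are the function name, the equal-width binning, and the `eps` floor for empty bins.

```python
import math


def population_stability_index(expected, actual, n_bins=10, eps=1e-6):
    """PSI = sum over bins of (actual_share - expected_share) * ln(actual_share / expected_share)."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all reference values are equal

    def bin_shares(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)  # clamp out-of-range values to the edge bins
            counts[idx] += 1
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = bin_shares(expected), bin_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


last_month = list(range(100))             # e.g. agent silence time in seconds
this_month = [v + 50 for v in last_month]  # the same feature, shifted upward
print(population_stability_index(last_month, last_month))
print(population_stability_index(last_month, this_month))
```

A PSI of zero means the two periods are binned identically, and larger values mean a larger shift. A commonly quoted rule of thumb treats values above roughly 0.25 as a significant shift, but the threshold should be tuned per feature. Running the check monthly per feature would match the cadence described above.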
Lastly, the data contains a segment of customers termed swayers: their rating can be read as either DSAT or CSAT. This usually happens with the middle rating, i.e. if the rating scale is 1 to 5, those who give a rating of 3 are swayers. If you are treating the problem as a multinomial model, keeping them is okay (I would still advise omitting such instances from the training set and building one separate classifier for these neutral cases), but if you are treating it as a binary model, omit such instances.
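For the binary case, dropping swayers comes down to a small label-mapping step. In the sketch below, the cut-offs (3 is neutral, 4 and above is CSAT, below is DSAT on a 1 to 5 scale) and the function names are assumptions for illustration; the right cut-offs depend on the scale and the business definition.

```python
def to_binary_label(rating, neutral=(3,), csat_min=4):
    """Map a rating to "CSAT"/"DSAT", or None for neutral (swayer) ratings."""
    if rating in neutral:
        return None
    return "CSAT" if rating >= csat_min else "DSAT"


def build_training_labels(ratings, **kwargs):
    """Return (row_index, label) pairs, omitting swayers from the training set."""
    labelled = [(i, to_binary_label(r, **kwargs)) for i, r in enumerate(ratings)]
    return [(i, label) for i, label in labelled if label is not None]


print(build_training_labels([5, 3, 1, 4, 2]))
```

Keeping the row index makes it easy to route the omitted neutral rows to a separate classifier later.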
Since retaining a customer matters more than acquiring one, it is imperative to correctly identify an actually dissatisfied customer as dissatisfied. Choosing the right metric is therefore also very important, because the company incurs a cost for both false positives and false negatives. But in this industry and problem statement, the company incurs a higher cost for a false negative, i.e. a dissatisfied customer predicted as satisfied (this depends on the objective and the nature of the industry/organization). Your metrics can also serve as a drift indicator.
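When false negatives cost more, one option is to track recall on the DSAT class together with an F-beta score with beta greater than 1, which weights recall more heavily than precision. The sketch below treats DSAT as the positive class and uses beta = 2; both choices, and the function name, are assumptions for illustration rather than the metric used in the project.

```python
def dsat_metrics(y_true, y_pred, positive="DSAT", beta=2.0):
    """Precision, recall and F-beta for the positive (DSAT) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)  # missed DSAT customers

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    b2 = beta ** 2
    denom = b2 * precision + recall
    f_beta = (1 + b2) * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f_beta": f_beta, "false_negatives": fn}


y_true = ["DSAT", "DSAT", "CSAT", "CSAT", "CSAT"]
y_pred = ["DSAT", "CSAT", "DSAT", "CSAT", "CSAT"]
print(dsat_metrics(y_true, y_pred))
```

Tracking the same numbers per month also gives a cheap drift signal: a steady fall in DSAT recall with unchanged code usually means the data has moved.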
It might sound easy, but I have only laid out a road map of the tip-of-the-iceberg issues in this article; the rest is for you to explore. Lastly, there are different approaches to modeling it:
- Convert it into a binary classification model (many-to-one: N rows represented by 1 target label), usually solved using sequence models
- Convert it into a binary classification model (one-to-one: 1 row represented by 1 target label)
- Multinomial classification model (many-to-one: N rows represented by 1 target label)
- Multinomial classification model (one-to-one: 1 row represented by 1 target label)
- Treat it as anomaly detection, where DSAT is the anomaly
- Use an unsupervised methodology
I built a cascading sliding-window model with the target variable converted to binary. It was an ensemble of both many-to-one and one-to-one models. The architecture consisted of different models for different purposes.
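To make the sliding-window idea concrete, here is a minimal sketch of how a conversation can be cut into overlapping windows of turns and how per-window scores can be reduced to one conversation-level score (many-to-one). It is not the actual architecture described above: the function names, the window size, the max aggregation, and the toy scorer are all assumptions for illustration.

```python
def sliding_windows(turns, size=3, step=1):
    """Split a list of turns into overlapping windows of `size` turns."""
    if len(turns) <= size:
        return [list(turns)]
    return [list(turns[i:i + size]) for i in range(0, len(turns) - size + 1, step)]


def conversation_score(turns, score_window, size=3, step=1):
    """Many-to-one: score every window, then keep the highest (worst-case) score."""
    return max(score_window(w) for w in sliding_windows(turns, size, step))


def toy_scorer(window):
    """Stand-in for a trained per-window model: share of turns mentioning a refund."""
    return sum("refund" in turn for turn in window) / len(window)


chat = ["hi", "where is my refund", "refund please", "ok"]
print(sliding_windows(chat))
print(conversation_score(chat, toy_scorer))
```

Taking the maximum means a single bad stretch of the conversation is enough to raise the DSAT score, which fits a setting where missed DSAT cases are the expensive error. With a `step` larger than 1, trailing turns that do not fill a full window are dropped.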
This article is not meant to lay out a road map only for those working on the same or a similar project; it is meant to explain, holistically, that different data come with different issues and call for different approaches. Identifying the data issues, data quality issues, and nature of the data is imperative before going deep into solving the given objective. The quality of the solution depends on the quality of the data, and the quality of the data depends on the quality of data understanding. Just as you get a feel for a car before deciding which one to buy, get a very good, in-depth feel for the data and its associated issues before going deep into any analysis or modeling. Many ML models do not go into production because of the data's inherent issues and properties.
Hope this made sense. For any queries or feedback you can reach out at himanshubirla@outlook.in (kindly put the subject as DS Query - <your name>) or leave a comment. I have over 6 years of industry experience and offer my services as a freelance data scientist. You can connect with me on LinkedIn to discuss my services for any of your projects. I also offer mentorship to a limited number of people at a time, at no cost (in my free time); for this too you can connect with me on LinkedIn or email me at i.himanshubirla@outlook.com with the subject Need Mentorship - Name.