I’ve been working as a data scientist in the private sector (in a bank) for the past three years, including one year as the leader of a chapter of 15 data scientists, and it’s been quite a journey. I’d like to share some thoughts on what it’s like to be a data scientist, and in particular what it’s like to work as one in a bank.
Data science is a buzzword nowadays, and critics say that “data science” is just a fashionable rebranding of statistics. I’d rather say that it is statistics on steroids mixed with software engineering.
At its core, it’s about using machine learning (or AI, if you will) to improve and accelerate all kinds of processes, whether that means predicting protein folding for better cancer treatment, translating a foreign language without a human translator in sight, or providing a better user experience in your mobile banking app.
Well, AI sounds cool and stuff, but isn’t working in a bank boring as hell?
Funny you should ask. (I should note that I wrote these questions myself, which is why they are so insightful.)
I did not really understand what a bank was about until I started working in one. It seemed like a bank was just a bunch of bankers sitting in nice offices, advising you on how to invest or telling you what interest rate you’ll get on a loan.
In reality, a modern bank is more like a tech company. It develops mobile and web apps backed by a complex software ecosystem for everything that happens in the background (money transfers, card payments, investments, credit scoring, communication with clients via email, phone, chatbots, etc.). At my bank, we have almost 500 distinct computer systems.
Perhaps surprisingly, a bank is also about hardware and logistics: physical cards and card payment terminals, ATMs, and cash logistics optimization (cash is not only a security issue; leaving too much of it in ATMs incurs unnecessary costs, while leaving too little means angry clients who can’t withdraw money). Then there is the placement and equipment of branches, physical security, smart cameras…
What does this have to do with data science?
Whenever there is a process, there is a way to improve or optimize it with data or machine learning. What I really enjoy about this job is that if you are proactive, you can work on a wide range of projects, often with external vendors that can teach you a lot about a specific field.
In just three years, I’ve already had the opportunity to work on projects such as predicting future client spending behaviour, automatic evaluation of candidates’ CVs, text categorization of client feedback, estimation of expected vehicle insurance claims (we sell insurance, too), chatbots, marketing optimization, risk management, economic research, transaction labelling, pseudo-social networks, tracing of potential high-risk coronavirus contacts from card payment data, and so on.
As a data scientist, you need to understand the data you use. And a lot of real-life activity is mirrored in banking data, from people buying new homes to how manufacturing supply chains work. Being a data scientist in a large bank teaches you a lot about life in practical terms.
All right, maybe it’s not that boring, but what do data scientists actually do?
As the term suggests, we spend a lot of time working with data. Data scientists are usually skilled at writing complex analytical SQL queries (SQL is the standard language for querying relational databases) to gain basic statistical insights into entities such as clients, website visitors, and employees.
Data science courses usually hand you ready-made datasets to train machine learning models on, but in real life, data tend to be hard to find, contain errors, and need a lot of preprocessing. We spend a lot of time making sure that the models always have up-to-date, clean data to work with.
In the modern world, a lot of data comes in unstructured form, such as text written by humans. Being able to work with unstructured data is an important part of our job, and the techniques range from simple programming tools such as regular expressions to training artificial neural networks that transform text into a structured mathematical form (typically vectors of numbers).
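To give an idea of what such a query can look like, here is a minimal sketch run against an in-memory SQLite database from Python. The `transactions` table, its columns and the numbers are all made up for illustration; a real bank schema is far larger and the queries correspondingly longer.

```python
import sqlite3

# Hypothetical toy table of card payments.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (client_id INTEGER, amount REAL, category TEXT)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [
        (1, 120.0, "groceries"),
        (1, 35.5, "transport"),
        (1, 800.0, "rent"),
        (2, 60.0, "groceries"),
        (2, 15.0, "transport"),
    ],
)

# Per-client summary: number of payments, total and average spend,
# and the share of spending that goes to groceries.
QUERY = """
SELECT client_id,
       COUNT(*)              AS n_payments,
       SUM(amount)           AS total_spend,
       ROUND(AVG(amount), 2) AS avg_payment,
       ROUND(SUM(CASE WHEN category = 'groceries' THEN amount ELSE 0 END)
             / SUM(amount), 3) AS grocery_share
FROM transactions
GROUP BY client_id
ORDER BY total_spend DESC
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The `CASE WHEN` inside `SUM` is a common trick for computing several conditional aggregates in a single pass over the table.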
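As a small illustration of the kind of preprocessing involved, the sketch below cleans a few hypothetical client records: it drops a duplicate, accepts two date formats, and turns impossible values into explicit missing values. The field names, the formats and the rules (e.g. “negative income is an error”) are assumptions made up for this example, not any real bank’s conventions.

```python
from datetime import date, datetime
from typing import Optional

# Hypothetical raw records as they might arrive from different source systems.
raw = [
    {"client_id": "001", "birth_date": "1985-04-12", "income": "52000"},
    {"client_id": "001", "birth_date": "1985-04-12", "income": "52000"},  # duplicate row
    {"client_id": " 002", "birth_date": "12.04.1990", "income": "n/a"},   # other date format, missing income
    {"client_id": "003", "birth_date": "2090-01-01", "income": "-100"},   # impossible values
]

def parse_date(text: str) -> Optional[date]:
    """Try the two date formats assumed to appear in the source systems."""
    for fmt in ("%Y-%m-%d", "%d.%m.%Y"):
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None

def parse_income(text: str) -> Optional[float]:
    """Convert income to a number; unparseable or negative values become None."""
    try:
        value = float(text)
    except ValueError:
        return None
    return value if value >= 0 else None

def clean(records: list, today: date) -> list:
    seen = set()
    out = []
    for rec in records:
        client_id = rec["client_id"].strip()
        if client_id in seen:  # keep only the first record per client
            continue
        seen.add(client_id)
        birth = parse_date(rec["birth_date"])
        if birth is not None and birth > today:  # a birth date in the future is an error
            birth = None
        out.append({"client_id": client_id, "birth_date": birth, "income": parse_income(rec["income"])})
    return out

cleaned = clean(raw, today=date(2021, 1, 1))
for rec in cleaned:
    print(rec)
```

Marking bad values as missing, rather than silently dropping whole rows, lets later steps decide how to handle them.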
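On the simple end of that range, a couple of regular expressions can already pull structured fields out of free text. Below is a minimal sketch with Python’s `re` module; the card-number and currency formats it recognizes are assumptions for illustration, and real-world patterns need to be considerably more careful.

```python
import re

# A 16-digit card number, optionally split into groups of four by spaces or hyphens.
CARD_RE = re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")

# An amount followed by a currency code, e.g. "250 EUR" or "19.99 USD".
AMOUNT_RE = re.compile(r"(\d+(?:\.\d{1,2})?)\s?(EUR|USD|CZK)\b")

def structure_feedback(text: str) -> dict:
    """Turn a free-text client message into a few structured fields."""
    amounts = [(float(value), currency) for value, currency in AMOUNT_RE.findall(text)]
    masked = CARD_RE.sub("[CARD]", text)  # hide card numbers before storing the text
    return {"masked_text": masked, "amounts": amounts, "mentions_card": masked != text}

result = structure_feedback(
    "Card 1234 5678 9012 3456 was charged 49.90 EUR twice, please refund 49.90 EUR."
)
print(result)
```

Turning text into vectors with a neural network is a different and much heavier tool; regular expressions like these are typically used for well-defined patterns such as amounts, dates or identifiers.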
So you just prepare data for “AI” to understand?
No, we make the “AI” too. The difficult part is training and using a model in such a way that it solves a real-life problem, and this is where the huge overlap with statistics comes into the picture.
Sometimes what matters is a probability (e.g. of defaulting on a loan). Sometimes it’s a category (e.g. of a piece of text feedback). It could be a number (“how many times will you log in today?”) or the order of things (“should we solve this issue first and that one second?”).
Decision-making under constraints is a big deal (“what’s the best action given our limited resources?”) and calls for completely different mathematical techniques, such as optimization.
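One textbook example of decision-making under constraints is the 0/1 knapsack problem: choose the subset of actions with the highest total value without exceeding a budget. The sketch below solves it by dynamic programming for a made-up marketing campaign; the offer names, costs and expected profits are hypothetical, and real problems of this kind are often handed to a linear or integer programming solver instead.

```python
def best_campaign(offers: list[tuple[str, int, float]], budget: int) -> tuple[float, list[str]]:
    """0/1 knapsack by dynamic programming.

    offers: (name, cost, expected_profit) with integer costs.
    Returns the maximum total expected profit and the names of the chosen offers.
    """
    n = len(offers)
    # best[i][b] = max profit achievable with the first i offers and budget b
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i, (_, cost, profit) in enumerate(offers, start=1):
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]  # option 1: skip offer i
            if cost <= b:                # option 2: take offer i if it fits
                best[i][b] = max(best[i][b], best[i - 1][b - cost] + profit)
    # Walk back through the table to recover which offers were taken.
    chosen, b = [], budget
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            name, cost, _ = offers[i - 1]
            chosen.append(name)
            b -= cost
    return best[n][budget], chosen[::-1]

# Hypothetical offers: (name, cost in call-centre hours, expected profit)
offers = [
    ("mortgage_call", 5, 40.0),
    ("card_upgrade", 3, 25.0),
    ("savings_email", 1, 6.0),
    ("insurance_call", 4, 30.0),
]
total, chosen = best_campaign(offers, budget=8)
print(total, chosen)
```

The point is not the specific algorithm but the shift in question: instead of predicting something, we search for the best combination of actions that respects the constraint.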
Looking for similarities (“which clients are similar to one another?”) and anomalies (“is this card payment weird enough that we should block it and call the client?”) answers some of the fundamental questions a bank can have.
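Both questions can be illustrated with very simple tools. The sketch below measures client similarity as the cosine similarity of spending profiles, and flags a payment as anomalous if its z-score against the client’s past payments exceeds a threshold. The categories, amounts and the threshold of 3 are made-up assumptions; production fraud detection is far more sophisticated than a single z-score.

```python
import math
import statistics

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two spending profiles (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def is_anomalous(history: list[float], amount: float, threshold: float = 3.0) -> bool:
    """Flag a payment whose z-score against the client's past payments exceeds threshold."""
    mean = statistics.mean(history)
    std = statistics.stdev(history)  # sample standard deviation, needs at least 2 points
    if std == 0:
        return amount != mean
    return abs(amount - mean) / std > threshold

# Hypothetical monthly spend per category: [groceries, travel, electronics]
alice = [300.0, 50.0, 20.0]
bob = [280.0, 60.0, 25.0]
carol = [10.0, 400.0, 5.0]

# Hypothetical past card payments of one client
history = [12.0, 30.5, 8.2, 45.0, 22.9, 15.3, 27.8]

print(cosine_similarity(alice, bob), cosine_similarity(alice, carol))
print(is_anomalous(history, 25.0), is_anomalous(history, 1500.0))
```

Cosine similarity compares the shape of spending rather than its size, so a frugal client and a wealthy one with the same habits still count as similar.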
There are sophisticated machine learning algorithms for each of these tasks and dozens of statistical metrics to evaluate the results. No one is going to tell you which ones to use. To choose the right model, you have to develop a deep understanding of both the underlying mathematics and the process you are trying to improve, and the devil is in the details: it’s easy to screw up and build a model that looks good on paper but turns out to be useless in practice.
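A classic way to build a model that looks good on paper is to pick the wrong metric for imbalanced data. In the made-up example below, only 10 of 1,000 card payments are fraudulent, and a “model” that never flags anything still reaches 99% accuracy while catching zero fraud (recall of 0). The metrics are written out by hand for clarity; libraries such as scikit-learn provide equivalents like `accuracy_score` and `recall_score`.

```python
def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true: list[int], y_pred: list[int], positive: int = 1) -> float:
    """Share of actual positives (e.g. frauds) that the model caught."""
    true_pos = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    return true_pos / actual_pos if actual_pos else 0.0

# 1,000 hypothetical card payments, 10 of them fraudulent (label 1).
y_true = [1] * 10 + [0] * 990
# A useless "model" that never flags anything.
y_pred = [0] * 1000

acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)
print(acc, rec)
```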
So, that’s it? That’s what data scientists do?
Yeah, that’s pretty much it. There’s also a lot to talk about in terms of IT infrastructure (cloud and big data technology, data pipeline deployment, model repositories, API integration and the like), but that’s quite technical, and not every data scientist needs to know the details (it’s OK to let a few specialists take care of that).