The Data Scientists I've met

Sunday, Jul 15, 2018 | Tags: data science

As an organiser of PyDataMCR I meet a lot of data types. I keep meeting the same Data Scientists, and I keep giving the same advice. Over the last year and a half I’ve begun to hear stories that echo one another. Industry has unrealistic expectations for Machine Learning in their business. The Data Scientists have unrealistic expectations too. Data Science is 90% focused on data preparation and software engineering. As the Data Scientist you are taking on a business transformation role.

So far I’ve met the same five Data Scientists. In order of value to a business, they are as follows:

• The Aspirant

• The Analyst

• The Researcher

• The Internal Consultant

• The Dream

The Aspirant

An individual with a hobbyist or light academic interest in Machine Learning. There is no industry experience in adding value to a business.

These people will have done a self-directed course in Machine Learning. There will be little focus on software engineering and project management. University courses or master’s degrees don’t fare much better. Theory has its place, but can be learnt on the job. I have yet to meet an employer impressed by a Masters in Machine Learning.

The Aspirant may take part in Machine Learning challenges. These projects give you data on a silver platter. A large part of Data Science is discovering context and preparing the data.

What should The Aspirant do next?

Aspiration and ambition are essential in Data Science. We are building the impossible products of yesteryear. Keep this energy; if you stop learning new tools you fall behind. Self-learning is essential in software engineering and the same applies to data roles.

My recommendation to you is to find any industry experience you can to get a taste of the real world of Data Science. It will be hard to find a Data Science job, so try something Data Science adjacent. A Web Analyst role is a suitable entry point. Even with this title you may be the most advanced Data Scientist in the company.

The Analyst

This person is a Data Analyst at a large company. You are an expert on your business’s data sources and context. The best Data Scientist will still have to relearn what you know. Your insight is vital to most end-to-end data products at your company. You generate a weekly report, and people no longer pay much attention. You feel concerned that people ignore these reports.

I see a desire to grow from most Analysts. You may already know a querying language, and perhaps a programming language; if not, you may feel like it’s time to learn one. You want more from your role. Data Science and Machine Learning are exciting to you and your company, if only you could figure out where they fit.

A note on Analysts

Data Analytics and Data Visualisation are foundational in Data Science. Industry expects good data visualisation. Industry dreams of Machine Learning. My advice to you is start calling yourself a Data Scientist and one day you’ll believe it. I recently met a boss who told a subordinate he was not allowed to call himself a Data Scientist. I met a team of analysts with an overnight title change to “Data Scientist” for all, despite no skill change. My point here is companies are being fluffy with the term, so please feel free to do so as well. Come up with your own definition and justification and stick to it!

What should The Analyst do next?

Learn a programming language (I recommend Python) and SQL. If you have done this then the next step is some small-scale projects. You may be the closest thing your company has to a Data Scientist now, so let’s leverage that.

Some example projects:

• Report Automation - If you generate something every week, that’s a drain on your time. Get that time back using a library like PyDoc (or alternative).

• Forecasting - Libraries like FBProphet make advanced forecasting accessible beyond the basics. Introduce concepts like seasonality to your company. Set a presentation date and do something cool.

• Sentiment Analysis - Libraries like NLTK let you glean positivity scores from text. Analyse your social media perception or call centre feedback. Could you plot public perception over time? Did it correlate with the money you made?
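To make the first of these ideas concrete, here is a minimal sketch of report automation using only the Python standard library. Everything in it is an assumption for illustration: the column names (week, region, sales), the sample figures, and the helper names load_weekly_totals and build_report are all made up, and a real report would read from your own data source rather than an inline string.

```python
import csv
import io

# Hypothetical weekly export. In practice this would come from a
# database query or a CSV file your team already produces.
SAMPLE_CSV = """week,region,sales
2018-W26,North,1200
2018-W26,South,800
2018-W27,North,1500
2018-W27,South,700
"""


def load_weekly_totals(csv_text):
    """Sum the sales column per week from CSV text."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        week = row["week"]
        totals[week] = totals.get(week, 0.0) + float(row["sales"])
    return totals


def build_report(totals):
    """Render a plain-text summary with week-over-week change."""
    lines = ["Weekly sales report", "-------------------"]
    previous = None
    for week in sorted(totals):
        total = totals[week]
        if previous is None or previous == 0:
            change = "n/a"  # no earlier week to compare against
        else:
            change = f"{(total - previous) / previous:+.1%}"
        lines.append(f"{week}: {total:,.0f} (change: {change})")
        previous = total
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_report(load_weekly_totals(SAMPLE_CSV)))
```

Once something like this runs on a schedule, the weekly report costs you nothing, and the week-over-week line is a natural place to start a conversation about seasonality.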

The Researcher

You have a PhD, your team has PhDs, and you prefer to hire people with PhDs. Your core product is data based, or has high potential from Machine Learning. Your academic prowess shines through on long-term projects. Projects culminate in the publishing of a white paper. They may be valuable, but not actionable in the short term and any interest fades to apathy.

You lack the tools to integrate your ideas into your product. These teams often “throw work over the wall” to a separate team of software engineers who build a product.

What should The Researcher do next?

You are an academic; learning new things is your best skill. Learn Software Engineering and take ownership of your product. Other teams will be keen to teach if you are keen to learn. This approach transforms you from a researcher into a Data Scientist. You now own your end-to-end product.

The Internal Consultant

The competition has a Data Science team. Your company decides to not fall behind. Your company tried to hire a rock star capable of all aspects of Data Science. Your company most likely failed. Machine Learning was the main skill tested in your interview. If you were the first hired Data Scientist your main job is now digital transformation. It was never the company’s intention that you would be self-managing, but it’s your best asset. Your success is dependent on your agency. You will get blocked a lot. You will have to have many side projects for the slow days.

You know the value of good software engineering. Processes and tools used by IT look appetising and if you’re lucky you’ll use them too. Despite the similarity, IT will deny you resources and business priority.

You will thrive on your ability to recognise the UX problem of Data Science. If your work isn’t actionable and accessible the business will forget. A focus on low-hanging fruit will show the business what Data Science is capable of. You may get a chance to deliver a single large project of immense value. This kind of work sets you up for consulting next.

What should The Internal Consultant do next?

Ask if you’re providing value. If you are, prove it.

The Dream

I am calling this one “The Dream” because of the unrealistic expectations from both sides. Aspirational candidates love Machine Learning almost as much as the company does, but neither of you understands how difficult the journey of Data Science is going to be.

I consider the following skills to be essential to a Data Science team (I say team and not individual). I do not expect any single Data Scientist to have all these skills, but if you do, well done!

The Dream Data Scientist is a:

• Data Analyst - You analyse and visualise data. You communicate it to the larger business. You have a good handle on statistics so you have confidence in your results.

• Data Engineer - You desire quality data. You can create good data from unmaintained sources once given context. You discover data sources and dictionaries and document them. You build data pipelines and generate long-term feature stores. You improve the business’s access to data.

• Project Manager - The business expects you to manage your own projects. You gather requirements and take the lead. The burden of thought sits on you. You must discover your own stakeholders and manage their expectations. You manage your workflow alongside delivering the product.

• Software Engineer - You are an expert in many aspects of software development. Your code is high calibre. Your work is the backbone of a business and so must be tested. The business expects you to create data dashboards. You manage your own deployment pipelines. You publish and document APIs for business consumption and beyond. This alone is a lot to expect of one person.

• Machine Learning Engineer - The business expects you to understand modern Machine Learning. This is the most important aspect of your skillset in the interview. You won’t be using it much once work begins.

Our current goal is to grow towards “The Dream” Data Scientist. I posit this is an unsustainable symptom of the current industry expectations. I’ve seen this problem before with the role of “Web Master” and more recently with “Full Stack Developer”. Trying to hire all these roles at once, and in a single individual, is setting us up for failure. As an individual, specialise in a few of the above, not all. If you are hiring Data Scientists, hire a mix who share the above skills between them and work well together.

We are always happy to have a chat over coffee.