What is Data Science?


Data science is playing an increasingly important role in the life of companies in the 21st century.

In fact, data science isn’t about building complicated models, making awesome visualizations, or writing code. Data science is about using data to have the greatest possible impact on your company.

This impact can take several forms: an insight, a data product, or a product recommendation. Getting there may require tools such as data visualizations or complex models, and it usually involves writing code as well. Basically, as a data scientist, your job is to use data to solve real business problems, regardless of the tools used.

There are currently many misconceptions circulating about data science, especially on YouTube. This may be because there is a huge gap between what is a popular topic of conversation today and what the industry actually needs. That is why we want to make a few things clear.

Before data science, the term data mining was popularized in a 1996 article on data mining and knowledge discovery in databases. There, the term referred to the overall process of discovering useful information in data. In 2001, William S. Cleveland wanted to take data mining to a higher level.


He did this by combining computer science and data mining. He made statistics much more technical, which he believed would expand the possibilities of data mining and be a force for innovation. Statistics could now take advantage of computing power. This combination is called data science. It all happened around the advent of Web 2.0, when websites stopped being just digital flyers and became platforms for sharing the experiences of millions of users.




These were sites like MySpace in 2003, Facebook in 2004, and YouTube in 2005. We can interact with these sites in many ways, such as posting, commenting, uploading, and sharing, and each interaction leaves a digital footprint behind on the web. This has helped create and shape the ecosystem we know and love today. There is now so much data that traditional technologies can no longer handle it. This amount of data is called Big Data.

This has opened up a world of possibilities for finding insights using data. But it also meant that even the simplest questions required a sophisticated data infrastructure just to handle the data. We needed parallel computing technologies like MapReduce, Hadoop, and Spark, so the rise of Big Data around 2010 drove the rise of data science.

At that time, the Journal of Data Science described data science as almost everything that has to do with data: collecting, analyzing, modeling. However, the most important part is the applications, of all kinds, including machine learning. So around 2010, with the new abundance of data, it became possible to train machines with a data-driven approach rather than a knowledge-driven one.



The theoretical papers on support vector machines and recurrent neural networks became feasible to apply in practice. This is something that can change the way we live and experience the world. Deep learning is no longer just an academic concept confined to dissertations. It has become a tangible, useful class of machine learning that can have an impact on our daily lives.

So machine learning and AI (artificial intelligence) dominated the media and overshadowed all other aspects of data science, such as exploratory analysis, experimentation, and the skills traditionally referred to as business intelligence. As a result, the general public came to think of data science as researchers focused on machine learning and AI.



The industry, however, employs data scientists as analysts. So there is a mismatch. Most of these data scientists could probably work on more technical problems, but big companies like Google, Facebook, and Netflix have so much low-hanging fruit for improving their products that they do not need advanced machine learning or statistical knowledge to pick it. Being a good data scientist is not about how advanced your models are. It’s about how much impact your work can have. You are not a data cruncher. You are a problem solver. You are a strategist. Companies give you their most ambiguous and toughest problems to solve, and they expect you to lead the company in the right direction.

Okay, let’s end with real-life examples of data science in Silicon Valley. But first, let’s look at the chart below. This is a very useful chart that shows the hierarchy of needs in data science. It is pretty obvious, but we tend to forget it. “Collecting” sits at the bottom of the pyramid: you obviously need to collect some data before you can use it.

So collecting, storing, and transforming data, all of which are data engineering efforts, are very important, and big data has actually been covered quite well in the media because of how difficult it is to manage that much data. We talked about parallel computing, which covers Hadoop, Spark, and things like that. We know about these. The lesser-known part, however, is what lies in between.



Surprisingly, this middle part is one of the most important things for companies, because it tells the company what to do with the product. What do we mean by that? Analytics explains what insights the data reveals and what is happening to users. Then come the metrics, which are important for finding out what is really happening to the product.

These metrics tell you whether you are successful or not. And then, of course, A/B testing is also important. This is the experimentation that lets you find out which product version is best. So these things are really very important, but the media doesn’t address them. What the media discusses is the top part: AI and deep learning. From a company’s or an industry’s perspective, that is not really the part that matters most, or at least it is not the part that yields the most value for the least effort.
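To make the A/B testing idea a little more concrete, here is a minimal sketch of one common way to compare two product versions: a two-proportion z-test on conversion rates. The visitor and conversion counts are made up, and the function name `two_proportion_z_test` is just an illustrative choice, not something taken from a particular company's tooling.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both versions convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Made-up numbers: version A converts 200 of 5000 visitors, version B 260 of 5000.
z, p_value = two_proportion_z_test(200, 5000, 260, 5000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value suggests the difference between the versions is unlikely to be pure chance. Real experiments also need decisions about sample size and how long to run the test, which this sketch leaves out.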

That’s why AI and deep learning sit at the top of the hierarchy of needs, while analytics and A/B testing are actually more important to the industry, which is why companies employ a lot of data scientists who deal with that sort of thing. So what does a data scientist do? That depends on the company and its size. A start-up lacks financial resources, so it often has only one data scientist, because there is no money for more staff. That one data scientist has to do everything in the hierarchy. You may not be working on AI or deep learning, because that is not a priority at the company at the moment, but you may have to do everything else.



You may also need to set up the entire data infrastructure. You may need to write some software code to add logging, then perform the analysis yourself, define the metrics yourself, and run the A/B tests. At a start-up, all of this lands on you as the data scientist, because there is nobody else to do it. But let’s look at medium-sized companies.

They have many more financial resources. They can afford to separate data engineers from data scientists. So collection is usually handled by software engineers and data engineers, not by the data scientist.

As a data scientist, you need to be much more technical here. That is why these companies tend to hire only people with a PhD or master’s degree for the role: they want them to be able to solve more complex problems as well. Now let's talk about a big company. Because the company is much larger, it probably has a lot more money, meaning it can spend more on employees.



In this case, employees don’t have to deal with the things they would rather not do, and they can focus on what they are best at or like doing most.

Data science, very simply put, is about having a lot of data and trying to extract something smart and useful from it.

I know that is still quite abstract, so I'll give you a simple example.

You must have seen a smartwatch (or maybe you even have one). These little gadgets can measure how you sleep, how much you walk, your heart rate, and so on.

For this example, let's look at sleep quality.

If you check every day how you slept that night, that gives you one data point per day. Let’s say your sleep quality was excellent today: you slept for 8 hours with little movement. Great. That is one data point. The next day you sleep a little worse: you only slept for 7 hours and tossed and turned a lot. That is another data point.

If you collect this for, say, a month, trends will begin to emerge from the individual data points. For example, you may sleep better and longer on weekends than on weekdays. Or your sleep quality may be higher when you go to bed earlier. Or you may occasionally wake up at 2 in the morning. And so on...

If you collect this for a year, you can do even more complex analyses. You can work out when you should go to bed and get up. You can see the stressful periods of the year when you worked a lot and slept little. Moreover, you can even “predict” these stressful periods in advance and prepare for them…

Now it's getting closer to data science.



But let's go even deeper.

If you have enough data, you can examine not only trends but also correlations.

You can see, for example, how much you moved (walking, sports, etc., which smartwatches can also measure) and how that affects your sleep. Say you find that on the days when you took 5,000 steps, you always slept very well. That's interesting! This is more than an analysis: it is information that gives you a tangible and useful action plan: walk 5,000 steps every day!

But we can go even further…

If your watch manufacturer pulls together the data from all of its watches (that is, from all users, not just from you), it can extract information that you, as a single user, could not even imagine.

Can 3,000 steps a day really prevent depression in most people? Are people really healthier in some countries than in others? Does the weather really influence major social phenomena? And there are still a lot of interesting questions that these companies already have data on and can research…

Note: Let’s set aside the legal and ethical implications for now, although that is also a rather exciting topic.



As you can see, the more data you have in a project and the more detailed it is, the more complex, interesting, and useful the analyses you can make.

And of course, all of this works not only with smartwatches in the personal lives of individuals, but with plenty of other tools in plenty of other areas as well.

First and foremost, of course, business, and within that the online world, is what data science has conquered the most. But it has also been present in the social sciences for a very long time. And it is starting to take hold in a lot of other places, e.g.
  • manufacturing
  • industry
  • agriculture
  • politics
and many more.



I will also describe five specific online and business examples below, but before that I would like to say a few words about the subfields that make up data science.

If you want to be a data scientist, you need to know three main areas very well.

The first is statistics. Yes, that word that everyone hates and that makes everyone's stomach turn. But read on and let me reassure you: statistics is basically an interesting subject, it just has bad marketing. (Which is mostly thanks to our “great” university education anyway. But believe me, statistics is interesting.)

To put it simply, statistics provides the mathematical tools that tell you how strong the relationship is between the steps taken and sleep quality in the smartwatch example above…



Because unfortunately life doesn’t work in such a way that I take 3,000 steps and my sleep quality jumps to excellent. Rather, the more steps I take, the greater the chance that I will sleep better. That is a big difference, because from here on the question is exactly how much greater that chance is. And this is where mathematical concepts come to the fore, such as mean, median, standard deviation, correlation analysis, probability, functions, and much, much more.
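As a small illustration of these concepts, the sketch below computes the mean, median, standard deviation, and a Pearson correlation for the smartwatch example. The step counts and sleep scores are invented for the purpose of the example, and the helper `pearson` is written out by hand only to show what the correlation coefficient actually measures.

```python
import statistics

# Made-up data for ten days: daily steps and a sleep-quality score (0-100).
steps = [2000, 3500, 5000, 4200, 6100, 1500, 7000, 5200, 3000, 4800]
sleep = [55, 62, 78, 70, 85, 50, 88, 80, 60, 74]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

print("mean sleep score:  ", statistics.mean(sleep))
print("median sleep score:", statistics.median(sleep))
print("standard deviation:", round(statistics.stdev(sleep), 1))
print("steps vs. sleep r: ", round(pearson(steps, sleep), 2))
```

A coefficient close to 1 means that more steps tend to go together with better sleep in this sample. It says nothing by itself about whether the steps cause the better sleep.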

The second is coding. Why do you need to be able to code?

The reason is simple. If you want to work with your data more seriously, you can only do so by coding. Of course, there are Excel, Google Analytics, and similar point-and-click, no-coding tools. But these tools share a few disadvantages.
One is that they are not flexible. (Just one example: it is very difficult to join different data tables effectively in them.)
Another is that they offer hardly any advanced machine learning or predictive analytics solutions.
The third is specific to Excel: it does not get along well with datasets of millions of rows. Anyone who has worked with Excel knows that above a certain amount of data, it simply crashes on even simple calculations.

Because of these, the two best-known data science languages come in handy: Python and SQL.

Each is good for different things. In SQL, for example, you can easily and quickly join tables with tens of millions of rows and perform calculations on them. And in Python there are plenty of machine learning and advanced analytics libraries that can be used for a lot of things: prediction, text analysis, image recognition, setting up automations, building self-learning bots, and more.
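To show what joining tables in SQL looks like, here is a minimal sketch that runs a SQL join from Python using the standard library's `sqlite3` module with an in-memory database. The `users` and `orders` tables and their contents are made up. A production setup would typically use a larger database system, but the query has the same shape.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (user_id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL);
    INSERT INTO users  VALUES (1, 'HU'), (2, 'US'), (3, 'HU');
    INSERT INTO orders VALUES (10, 1, 20.0), (11, 1, 35.0), (12, 2, 50.0), (13, 3, 15.0);
""")

# Join the two tables and aggregate revenue per country in a single query.
rows = conn.execute("""
    SELECT u.country, COUNT(*) AS order_count, SUM(o.amount) AS revenue
    FROM orders AS o
    JOIN users  AS u ON u.user_id = o.user_id
    GROUP BY u.country
    ORDER BY u.country
""").fetchall()

for country, order_count, revenue in rows:
    print(country, order_count, revenue)
conn.close()
```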



But for both Python and SQL, and for data science in general, you need to know how to code.

The third is business thinking. I don't think I need to explain that at length. You need a basic “business mindset” to know which data science projects are useful and worth working on, and which are not. This strategic thinking shows up not only at the project level, but in plenty of other places throughout a data scientist's work. I will write about this in more detail in the examples.

I promised to write a few specific business examples as well.



I have brought five of them. I will move from the simpler to the more complex.

The first example is a classic data project in a classic online store.

Let’s say we have an e-commerce company and we want to put together reports for ourselves. (This is what a lot of companies do anyway. Whether they do it well is another matter.)

In such a project, the goal is always to give the leaders and decision makers of the e-commerce business a clearer view before they make a decision. It is the job of the data analyst or data scientist to create the analyses, statements, and reports for this.

The data analyst looks at what has happened in the last week, month, or year. What are the trends? What has changed? What are the typical customer journeys? What can be expected in the future based on past data? And the leaders decide based on that.

Suppose we see people buying more and more red socks and fewer and fewer yellow T-shirts. Obviously, we then try to align our offering with these changes.

This is the simplest example of using data in the life of a company.



Note: It sounds simple, but of course the devil is in the details here too, and the implementation brings its own difficulties.

The second example is a higher-level and more complex data science project.

Let's stay with the same e-commerce company, only now we focus specifically on its advertising spend, and within that on paid Google Ads, for the sake of example.



Suppose a question comes from management: what should our Google Ads budget be for the next quarter? This is not so easy to pin down. If the budget is too high, that is not good, because we overspend and profits start to decline. If the budget is too low, that is not good either, because then we do not spend enough on advertising and the number of sales decreases, and with it the revenue, which also has a negative effect on profit.

Here, as data scientists, we need to figure out the optimal budget, the one that brings in the most profit.

At many companies, a senior marketing manager can work this out perfectly well based on best practices and industry experience, even in Excel. But the field of data science offers an even more precise solution: various predictive analytics and machine learning algorithms.

What we do is essentially give the computer detailed spending, revenue, website traffic, and all sorts of other data for the past few years. We “train” a machine learning algorithm on it. Then, using the mathematical model obtained this way, we make a pinpoint prediction and an optimal spending recommendation for the future.

The good thing about this is not only that it is more accurate than the human-made version, but also that it scales better. If, for example, we complicate things and add 6, 8, or 10 new marketing channels to spend on, one person may no longer be able to keep track of it all. For a machine learning algorithm, however, these are just a few more variables in the formula.
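The idea of fitting a model to past data and then using it to pick a budget can be sketched in a few lines. Everything below is an assumption made for illustration: the spend and revenue figures are invented, and the square-root model (diminishing returns on ad spend) is simply one plausible shape, not the method any particular company uses. A real project would use many more variables and a proper machine learning library.

```python
import math

# Made-up history: monthly ad spend and the revenue attributed to it.
spend   = [1000, 2000, 3000, 4000, 6000, 8000]
revenue = [6500, 9100, 11000, 12700, 15400, 17900]

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

# Assumed model: revenue grows with the square root of spend (diminishing returns).
a, b = fit_line([math.sqrt(s) for s in spend], revenue)

def predicted_profit(budget):
    """Predicted revenue minus the budget itself."""
    return a + b * math.sqrt(budget) - budget

# Try candidate budgets and keep the one with the highest predicted profit.
candidates = range(500, 20001, 500)
best_budget = max(candidates, key=predicted_profit)
print("suggested budget:", best_budget)
```

Adding more marketing channels would mean adding more input variables to the model, which is exactly the scalability point made above.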



Obviously, I’m not saying here that every company should base their advertising spending plans on data science… But there’s a company size and complexity where that already pays off a lot.

My third example is a data project at a food retail company.

Food is, for the most part, a perishable consumer good. I heard this firsthand from an acquaintance of mine who works as a data scientist for a well-known grocery chain: it is a huge challenge for FMCG companies to tell exactly how much of a given product they need to order for the next month or months.

Here again, the dilemma is similar to the previous example: if too many products are ordered, they deteriorate in the store. If there are too few, then the shelves are empty and customers are angry. (In fact, they might even go over to the store next door.)

Both cases mean a loss, so here too the right balance must be found.

The solution to this problem can again be a nicely tuned predictive analytics model that “predicts” (in reality, builds and applies mathematical models) how much consumption is expected in a given future period, based on past data and various parameters. The order can then be tailored to this.
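As a deliberately simple illustration of forecasting demand from past data, the sketch below uses a moving average of recent weekly sales and derives an order quantity from it. The sales figures, the stock level, and the function name `moving_average_forecast` are all made up. Real demand models are far richer (seasonality, promotions, weather), so treat this only as the smallest possible instance of the idea.

```python
import math

def moving_average_forecast(history, window=4):
    """Forecast the next period as the mean of the last `window` periods."""
    recent = history[-window:]
    return sum(recent) / len(recent)

# Made-up weekly unit sales for one product.
weekly_sales = [120, 135, 128, 140, 150, 145, 155, 160]

forecast = moving_average_forecast(weekly_sales, window=4)
stock_on_hand = 40
# Order enough to cover the forecast (rounded up), never a negative quantity.
order_quantity = max(0, math.ceil(forecast) - stock_on_hand)
print("forecast:", forecast, "order:", order_quantity)
```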

There is also a concrete, published case study of this in practice: Walmart in the US has been replenishing its stock on a data-driven basis for quite some time… Their algorithms have been polished a great deal over the years. In addition to their own sales data, they also take in a lot of extra data from external sources.

They were perhaps the first in their industry to compare weather data with the sales of their various products. And they immediately saw that if a big storm was coming, more umbrellas and flashlights would have to be stocked, because that is what customers would buy. Of course, this is not so surprising in itself if you know bad weather is coming.

Other relationships are less trivial. But even the trivial ones are a problem at scale: for a chain that sells millions of products in thousands of stores around the world, there is no expert who can deliver these forecasts continuously. A well-tuned data science solution, however, can do so without difficulty.



Note: And it’s no secret that these things have given Walmart’s business results quite a boost as well.

My fourth example, which I am sure you know, is the YouTube recommendation system.

Whenever you watch a video, YouTube offers the next one in the top right corner. This is also fully automatic and data-driven. What they use here is called collaborative filtering.
They look at which videos you have watched so far.
Then they look at what else other users who watched the same videos as you are watching.
And then you get, as your next recommendation, a video that people like you watched after the one you are watching right now.
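The three steps above can be sketched as a toy user-based collaborative filter. The watch histories are invented and the scoring rule (count how many videos you share with each other user) is a simplification chosen for readability. It is not YouTube's actual system, which is far more complex.

```python
from collections import Counter

# Made-up watch histories: user -> set of video ids.
watched = {
    "you":   {"v1", "v2"},
    "anna":  {"v1", "v2", "v3"},
    "ben":   {"v1", "v3", "v4"},
    "chloe": {"v2", "v3"},
    "dave":  {"v5"},
}

def recommend(user, histories, top_n=2):
    """Rank unseen videos by how much their viewers overlap with `user`."""
    seen = histories[user]
    scores = Counter()
    for other, videos in histories.items():
        if other == user:
            continue
        overlap = len(seen & videos)  # number of videos both users watched
        if overlap == 0:
            continue  # users with nothing in common do not influence the result
        for video in videos - seen:
            scores[video] += overlap
    return [video for video, _ in scores.most_common(top_n)]

print(recommend("you", watched))
```

Videos watched only by users who share nothing with you (such as `v5` here) never show up, which is the "people like you" part of the idea.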

It sounds simple. The implementation, however, is much more complex.



But it’s also a great example of how data science can help a business better retain its viewers and listeners… I’ll add that almost the same principle is used by most of the media platforms you know: Spotify, Netflix, some news portals, and many others.

My fifth example is robotics. There are companies making robots that are now astonishingly skillful and intelligent, and much of what these robots can do is due to machine learning and data science. Image recognition and voice recognition algorithms, self-learning modules, and many more advanced data science concepts bring these machines to life and keep improving them.

And of course, robot makers are not the only ones experimenting with such forward-looking technologies. Self-driving cars, machines capable of mimicking the human voice, and much more are being developed on a similar basis.



These were just five quick examples, but as you can imagine, there are roughly ten thousand more.

However, there is only so much room in this introductory article.

If you are interested in the topic and want to go to a higher level, I recommend my new online course in Hungarian, Introduction to Data Science. It is a video with 1 hour and 38 minutes of concentrated data science knowledge:


I present a specific data science case study, step by step - from data collection, through data analysis, to decision making.
I’ll clarify some of the often misunderstood basic concepts, such as Machine Learning, Deep Learning, Big Data, and the lately very trendy Artificial Intelligence.
I also show 14 typical data science projects, in which I also explain in more detail how different types of machine learning projects are structured.


I hope this article has managed to show what data science is. It is difficult to paint the whole picture on such a large scale, but perhaps the point has been made: data science is an exciting and practical topic which, contrary to popular belief, is not rocket science…

Of course, it’s a complex topic, and if you want to be a data scientist, you have to learn a lot. But you certainly don’t need a PhD, a special brain, or anything else that would be out of reach.

Data science is the process of applying scientific computation and appropriate statistical methods to interpret billions and trillions of bytes of data.

It is a discipline that is on everyone’s lips these days, and one that has grown exponentially in recent years due to the huge amount of data coming from multiple sources.




Later in this article, we will examine how data science has affected our lives and how we too could land a data science position by acquiring the special skills it requires.

There is a huge debate about the exact definition of data science. In fact, no formal definition can be attached to the ecosystem, and different fields perceive data science differently.

For example, a software engineer who mostly builds data visualizations with a tool may hold a Data Science role, while someone who works with health professionals on sensitive patient data to predict cancer from cells is also called a Data Scientist.




From a layman’s point of view, because the applications are so diverse, people in different fields define it differently, but all the definitions point to the same thing: extracting information from data by certain methods.

It is a blend of math and statistics, machine learning, domain knowledge, computer science and software development.

Mathematics and statistics are key, as everything from exploratory data analysis to model building involves numbers, vectors, probabilities, and so on.


Machine learning, which includes deep learning and is itself a branch of artificial intelligence, is the model-building part of data science. In addition, basic software development and IT knowledge is needed to apply it in practice.



Finally, business or domain knowledge can go a long way in determining the accuracy of a result, as different businesses use different data to make predictions, and using the right data is extremely important for the validity of the output.

Hidden patterns in the data are revealed with the help of data science. These hidden patterns or insights can lead to pioneering results in many areas over time and improve people’s lives. The image above shows the six stages of a Data Science workflow that produces forecasts and models that can be used in production. It is described in more detail in the next section.

Data science work can be broken down into the following steps.


Understanding the problem - It is important that the problem statement is clear before you immerse yourself in the actual implementation. Knowing what you need to find out is essential for getting the right data and the right solution.
Obtaining the right data - Once you understand the problem, you must obtain the right data to work with.
Exploratory Data Analysis - It is said that ninety percent of a data scientist’s work is data wrangling. The term refers to the cleaning and pre-processing of data before it is fed into the model. The steps include checking for duplicate data, outliers, NULL values, and any other anomalies that do not conform to the data conventions of the business.
Data Visualization - Once the data has been cleaned and pre-processed, we need to visualize it to find out which features or columns we want to use for our model.
Categorical Encoding - This step applies when the input features are categorical and need to be converted to numeric values (0, 1, 2, etc.) before they can be used in our model, as the machine cannot work with categories directly.
Model Selection - Selecting the right model for a given problem statement is essential, because not every model fits every data set well.
Using the right metric - Based on your business domain, you need to select the metric that measures the quality of your model.
Communication - Business people and shareholders often do not understand the technical details of Data Science, so it is essential that the results are communicated simply to the business, which can then take action to reduce the anticipated risks.
Deployment - Once the model is built and the business is satisfied with the results, the model can be put into production and used in the product.
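Two of the steps above, data cleaning and categorical encoding, lend themselves to a short sketch. The records, the `max_price` outlier threshold, and the function names are all hypothetical. In practice this kind of work is usually done with a data library, and the rule for what counts as an outlier depends on the business.

```python
# Made-up raw records; None marks a missing value.
raw = [
    {"id": 1, "color": "red",    "price": 10.0},
    {"id": 2, "color": "yellow", "price": None},    # missing price
    {"id": 1, "color": "red",    "price": 10.0},    # duplicate of id 1
    {"id": 3, "color": "yellow", "price": 12.0},
    {"id": 4, "color": "blue",   "price": 9999.0},  # implausible outlier
]

def clean(records, max_price=1000.0):
    """Drop duplicate ids, rows with missing prices, and prices above `max_price`."""
    seen_ids, result = set(), []
    for row in records:
        if row["id"] in seen_ids or row["price"] is None or row["price"] > max_price:
            continue
        seen_ids.add(row["id"])
        result.append(dict(row))  # copy so the raw data stays untouched
    return result

def encode_categories(records, column):
    """Replace each category label with an integer code (0, 1, 2, ...)."""
    labels = sorted({row[column] for row in records})
    codes = {label: i for i, label in enumerate(labels)}
    for row in records:
        row[column] = codes[row[column]]
    return codes

rows = clean(raw)
color_codes = encode_categories(rows, "color")
print(rows)
print(color_codes)
```

Note that plain integer codes imply an order between categories. Whether that is acceptable depends on the model, which is one reason other encodings exist.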


Data science is quickly taking over our daily lives. From waking up in the morning to going to bed, there is not a single moment when data science does not affect us. Let’s look at some applications of data science that have made our lives easier lately.
Example 1:

YouTube is a favorite source of entertainment, knowledge, and news in our daily lives. We prefer watching videos to scrolling through long articles. But how did we become so addicted to YouTube? What made YouTube so unique and special?

Well, the answer is simple. YouTube uses our data to recommend the videos we are likely to want to see next. A recommendation system uses an algorithm to track our search and viewing patterns and, based on them, shows us videos that are related to what we are watching, so we stay on the platform and keep browsing other videos.


So it basically saves us the time and energy of manually searching for videos that we might like or find useful.
Example 2:

As on YouTube, recommendation systems are used on streaming and e-commerce sites such as Netflix and Amazon.

Netflix shows you TV shows or movies that are related to what you’re watching, saving you the time of finding similar titles.

Amazon, in turn, recommends products based on our shopping patterns: it displays products that other customers bought together with the product we are viewing, or that we might buy based on our purchasing habits.
Example 3:

One of the biggest breakthroughs in data science is virtual assistants such as Amazon’s Alexa and Apple’s Siri. We often find it tedious to browse the phone to make a call, or feel too lazy to set an alarm or reminder.

Virtual assistants do all of this for us just by listening to commands. We tell Alexa or Siri what we want, the system converts our voice into text using natural language processing techniques (we’ll see that later), and it extracts insight from this text to solve our problems.


In layman’s terms, this intelligent system uses speech-to-text technology to save time and solve our problems.
Example 4:

Data science has made life easier for athletes and other people involved in sports. The vast amount of data available today can be used to analyze an athlete’s health and mental state and to prepare for a game accordingly.

The data can also be used to devise strategies and outsmart the opponent before the match starts.
Example 5:

Data science has also made life easier in the healthcare sector. Doctors and researchers can use Deep Learning to analyze cells and help prevent a disease from developing.

Based on predictions from the data, an appropriate medication can also be prescribed to the patient.
The best data science companies

Data science is considered the most sought-after job of the 21st century, and professionals from a variety of backgrounds are beginning their journey as data scientists.

Today, almost every company tries to incorporate Data Science products to simplify processes, speed up operations, and deliver accurate results in good time. The list of such companies is huge, and it would be unfair to single out the best, as different companies use data for different reasons.


Along with the US market, the Indian market is expanding, which can only benefit professionals in the future. Here are some of the most popular companies where Data Science is used extensively:

JP Morgan, Deloitte, Bitwise, Salesforce, LinkedIn, Flipkart, WNS, McKinsey & Company, IBM, Ola Cabs, Mu Sigma, Stripe, Amazon, Big Basket, Netflix, Wipro, Enterprise Bot, Accenture, Myntra, Manthan, TCS, Cisco, Cartesian Analytics, HCL, EDGE Networks, Walmart Labs, Cognizant, [24]7.ai, Target Corporation, TEG Analytics, Citrix, Sigmoid, Facebook, Twitter, Google Inc., Gobble, Reliance, Square, niki.ai, Dropbox, Airbnb, Khan Academy, Uber, Pinterest, Fractal Analytics.

Sites where you can find many Data Science job openings include LinkedIn, Indeed, Simply Hired, and Angel List.

Data Science deals with data, and every field uses data in some way or another. Therefore, you do not have to belong to a particular discipline to be a data scientist.



At the same time, you need to develop a curious mindset and a desire to gain insight from data.


Data science can help ease time and budget constraints and foster business growth.
Machines can take over many manual tasks and often perform them better than humans.
It helps prevent loan defaults and detect fraud, and has many other uses in the financial field.
You can gain insight from raw, unstructured textual data.
Forecasting future earnings can prevent financial losses for many large companies.
Programming, data visualization, communication, data collection, statistics, data management, machine learning, software development and mathematics are necessary skills for anyone who wants to enter the data science space.


The use of data science in academia and in real life differs considerably. In academia, Data Science is applied to many interesting projects, such as image recognition, face recognition, and so on.

In everyday life, on the other hand, Data Science is used for fraud prevention, fingerprint detection, product recommendations, and so on.

Much of the work done today is manual, requiring a lot of time and resources, which often strains the budget allocated to a project. Large companies look for solutions to optimize such tasks and ease budget and resource constraints.



Data science provides an opportunity to automate tedious processes and achieve excellent results that may not have been possible with manual work.

A survey by Forbes shows that Data Science is the future and is here to stay. The days of manual work are coming to an end, and Data Science is expected to automate many such tasks. Therefore, if you want to remain relevant in the industry, it is important that you learn about its various aspects and so increase your chances of staying employed.

This has been a guide to what data science is. We discussed the different subfields of data science, its life cycle, benefits, scope, and so on.