What is data science

 


Big data is here to stay, and we assume that its impact on society will be permanent. Just as happened with writing, the mass media, and so many other human inventions of immense cultural impact, the growth in the production and computational analysis of large volumes of data is transforming every one of our activities. Some professions are in crisis, others benefit, and new ones are being created.

Big data is an imprecise term, used to refer to the data that our society creates and processes in digital form, with increasing speed, volume, and variety.

Accordingly, “data scientist” is also a profession, or an activity, that is not yet clearly defined. The term, which covers those who apply programming techniques to analyze data on a daily basis, did not exist before 2008. Only four years later, the Harvard Business Review made waves by declaring that those who work as data scientists can boast of the “sexiest job of the 21st century” [^1]. Exaggerated headlines aside, what is certain is that the discipline offers an increasingly mature body of knowledge aimed at extracting insight from data. The techniques and principles that the data science community has developed can be put to use in many areas, among them the social sciences.

Advancing the frontiers of data science, creating the algorithms and computing techniques that open up new possibilities for analysis, is a complex task carried out by specialists with deep knowledge of mathematics. And yet "using" data science, applying its principles to solve complex problems, is much easier. To get started we just need the patience to learn some fundamental programming and statistics concepts, and to use them to understand and communicate with data. That is what this book is about.

1.1 What does it mean to do data science?

We have already said that data science is about using programming techniques to analyze data. But it is not just that. Applied data science requires developing skills in four areas:

  • Programming. By our accepted definition, every data scientist uses programming to explain to computers what they need from them. In doing so, they employ "computational thinking": the ability to reduce a complex task to a series of steps that can be solved with code interpreted by a computer. To be clear, not all problems can be solved by computational means, but many can, at least in part. The data scientist uses some programming techniques (or many, depending on the degree of specialization) to solve problems that would be impractical to address otherwise.

  • Statistics. Inescapable! Also powerful, sometimes counterintuitive and, when we are lucky, revealing. Statistics is many things but, despite its bad reputation, never boring. It is just a matter of making friends with it. We will need it to extract knowledge from the data. It is amazing how much can be accomplished with just a few rudiments (mean, median, standard deviation, and quartiles), and from then on it is a matter of digging deeper step by step.

  • Communication. A data scientist combines “hard” skills with others that require empathy: those related to interdisciplinary communication and collaboration. This means finding ways to explain complex processes, to translate the insights of a statistical model into terms that make sense to a broad audience, and to create visualizations that allow third parties to “read” the data and draw their own conclusions. Part of doing data science is knowing how to discuss the data used and the results obtained with very diverse interlocutors: the general public, public officials, colleagues, specialists from other disciplines, and so on.

  • Domain knowledge. Domain knowledge is the accumulated experience in a particular field of human activity: agriculture, public relations, quantum physics, child rearing. It is an essential complement to analytical skills. Domain knowledge not only helps us discern whether the answers obtained through sophisticated statistical analysis make sense; it is also needed to know which questions we should be asking.
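
As a small illustration of the statistical rudiments mentioned in the list above, here is a minimal sketch in R, the language used later in this book. The vector ages and its values are made up for the example; mean(), median(), sd(), and quantile() are standard functions available in any R installation.

```r
# Hypothetical sample: ages of nine survey respondents (made-up values)
ages <- c(23, 35, 31, 47, 52, 29, 41, 38, 64)

mean(ages)      # arithmetic mean
median(ages)    # middle value once the data are sorted
sd(ages)        # sample standard deviation
quantile(ages)  # minimum, quartiles, and maximum
```

Four short calls already give a first picture of how the values are centered and how spread out they are.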

The four skills come into play in every project that involves data science, to a greater or lesser extent depending on the stage of the analysis. Speaking of stages, Hadley Wickham, one of the leading figures in the field, defines them as follows: import, tidy, transform, visualize, model, and communicate.

And all of this is carried out through programming, of course.

Throughout the chapters of this book we will learn programming techniques that let us work through each step of the process, and in doing so we will exercise the four skills involved in data science.

R is a programming language specialized in data analysis and visualization. It is an open source product, which means that anyone can use and modify it without paying license fees or purchase costs of any kind.

Experts from around the world actively collaborate on the project, not only developing the language itself (known as “base R”), but also extending it with new capabilities that end users can add in the form of installable “packages”.

The quality of the language itself, of the installable packages that add countless functions to it (from artificial intelligence algorithms to interactive maps), and of the user community that shares information in forums and blogs has made R one of the most popular programming languages in the world. In the field of data analysis, it is the tool of choice in many universities, technology companies, and data journalism newsrooms.

2.1 Our first project in R

Below we will work through a step-by-step exercise to illustrate the power of an analysis tool like R. Nobody should worry if some of the operations seem arbitrary or make no sense yet. That is normal! Nobody learns a language in 10 minutes, be it R or Esperanto. The idea is to get early exposure to an interesting use case with real data, one that serves as motivation for practicing, later on, basic exercises that are very necessary but sometimes not so exciting.

2.1.1 To investigate: What is the difference in infant mortality between the south and the north of the Autonomous City of Buenos Aires?

Buenos Aires is a city that for decades has shown a marked divide between its relatively less developed southern neighborhoods and those in the north, where the socioeconomic level and quality of life are higher.

One of the most regrettable aspects of the north-south disparity, and without a doubt the one that has generated the most controversy and cross-accusations, is the difference in the infant mortality rate according to the region of the city.

How big is that difference? How is it distributed geographically?

We will use R to answer those questions and visualize the results of our analysis, using official figures published by the city as a source.

2.1.2 Create a project in RStudio

The first step is to run RStudio, which we should already have available on our system.

Once the graphical interface is open, we create a new project by clicking on File -> New Project... -> New Directory -> New Project. In the window that appears, we choose a name for the project (for example, "Practicing R") and finish by clicking on Create Project.

Using projects lets us pick up another day where we left off at the end of a session. It is only a matter of reopening the desired project the next time we start RStudio, by clicking on File -> Recent Projects -> "my project name".

For now, let's keep working. We are going to create a "script". As the name suggests, a script is like a screenplay: a series of steps that we write down for our computer to execute in sequence. We click on File -> New File -> R Script. A window with a text editor opens immediately. Now the action begins!

2.1.3 Writing a script

Let's take this opportunity to give a name to the areas that we see in RStudio:

We are going to write our code (the instructions that R understands) in the edit panel. The results will appear in the console (when they are text) or in the output panel (when we produce graphics).

For example, in the edit panel we can write an instruction that shows the result of a mathematical operation, such as sqrt(144).

sqrt() is a function. In the world of programming, functions are ready-to-use sequences of code that perform useful tasks, for example showing something on the screen. In our case, we complete the function with something else: a parameter, which is the name given to the values that a function expects from the user in order to know what to do. The function sqrt() expects a number whose square root it will calculate, and that is what we gave it: we passed 144, a number, as a parameter. Parameters are always enclosed in parentheses, following the function name.
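
A minimal sketch of the call described above. The first line is the instruction discussed in the text; the second is an extra illustration, not taken from the original, showing that the same function accepts any other number as its parameter.

```r
# Call the function sqrt() with 144 as its parameter
sqrt(144)   # the console shows: [1] 12

# The same function works with any other number
sqrt(2)
```

The [1] that R prints before the result simply marks the position of the first value in the output.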

Now we are going to learn the most important key combination in RStudio: Ctrl+Enter. Pressing Ctrl+Enter after writing an instruction makes RStudio execute it immediately and then wait for the next instruction, if any.

We can also pick a line that we want to execute by placing the text cursor (which looks like a blinking vertical bar in the edit panel) on it. If we then press Ctrl+Enter, the line will be executed and the cursor will move by itself to the next line, ready to repeat the process.

Line-by-line execution is very useful for what is called “interactive analysis”. We execute a command, observe the result, and based on that decide the next action: change parameters and try again, accept the results and use them in a subsequent task, and so on.

Of the three instructions, two (the first and the last) showed an output on the screen, and the middle one did not. This is because some functions return something as a result (a number, a text, a graphic, or other kinds of output that we will see later) while others do their work silently without printing anything. In this case, the silent one was the assignment: mensaje <- "Hola mundo" is an instruction that asks R to create a variable called mensaje (or to find it if it already exists) and to assign it the text "Hola mundo" as its value. How do we know that the instruction was carried out, even though it produced no output? In general, it is a matter of trust: if an instruction is silent and generates no error message, we assume it accomplished its task. In this case, we have also verified it. The final line, mensaje, asks R to find the variable and display its content on the screen (a very practical feature of the language: to see the content of a variable, just type its name and execute the line). In doing so, we confirm that the variable contains precisely what we typed.
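
The instructions discussed above can be sketched as follows. The second and third lines come from the text; the first is an assumption (the earlier sqrt() example), chosen only because the text says that the first instruction printed a result.

```r
sqrt(144)                # prints its result: [1] 12
mensaje <- "Hola mundo"  # assignment: runs silently, shows nothing
mensaje                  # prints the stored value: [1] "Hola mundo"
```

Running the lines one at a time makes it easy to see which ones produce output in the console and which do not.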

In passing, it is worth mentioning that creating and manipulating variables is a key concept in programming. Working with variables lets us store values for later use, and makes our code easier to read and share with others, especially when we use self-explanatory variable names. As an example of the latter, let's compare
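
A hypothetical pair of snippets illustrates the point about self-explanatory names. Both compute the same value; the variable names and the room dimensions are assumptions made for this sketch and are not taken from the original text.

```r
# Version 1: terse name; the reader has to guess what x stands for
x <- 8 * 6
x

# Version 2: self-explanatory names (hypothetical room dimensions, in meters)
room_width_m <- 8
room_depth_m <- 6
room_area_m2 <- room_width_m * room_depth_m
room_area_m2
```

The second version is longer, but anyone reading it can tell what is being calculated and in which units.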
