Data Science for Everyone

Posted by tetley at 2020-04-12

Additional information

Language: Simplified Chinese

Pages: 222

Format: 32mo

Original title: Data Science

Original publisher: MIT Press

Attribute classification: storefront

CD included: No

Out of print: No

Book Introduction

Book characteristics

Data Science for Everyone, by John D. Kelleher and Brendan Tierney [Ireland], translated by Zhang Shiwu and Huang Yuanxun. Covers the basic elements of data science and gives readers with no prior background an intuitive, systematic understanding of the field.

About the book

The goal of data science is to improve decision-making by basing decisions on insights extracted from large data sets. As a field of activity, data science comprises a set of principles, problem definitions, algorithms, and processes for extracting useful but non-obvious patterns from large data sets. It is closely related to data mining and machine learning, but it is broader in scope. Today, data science drives decision-making in almost every part of modern society. It may affect many aspects of daily life: deciding which online advertisements you see, recommending movies, books, and friends, filtering spam, choosing the offers you receive when you renew a mobile phone contract, determining the cost of medical insurance, planning the sequencing and timing of traffic lights in your area, designing drugs, and planning police deployment. The demand for data science has grown with the emergence of big data and social media, faster computing power, a sharp drop in the cost of computer memory, and the development of more powerful methods for data analysis and modeling, of which deep learning is a typical example. Together, these factors make it easier than ever for organizations to collect, store, and process data. At the same time, these technical innovations and the widespread application of data science make the ethical challenges around data use and personal privacy more urgent than ever. This book aims to provide an introduction to data science that covers the basic elements of the field and gives the reader a principled understanding of it.
Chapter 1 introduces the field of data science, briefly reviews its history, and discusses why data science matters today and which factors are driving its adoption. The chapter closes by reviewing and debunking some myths about data science. Chapter 2 introduces basic concepts relating to data and describes the standard stages of a data science project: business understanding, data understanding, data preparation, modeling, evaluation, and deployment. Chapter 3 focuses on data infrastructure and on the challenges posed by big data and by integrating data from multiple sources. One typical challenge is that the data in databases and data warehouses usually resides on different servers from the ones used for data analysis. With large data sets, moving data between the database or data warehouse servers and the servers used for data analysis and machine learning can therefore take a long time. Chapter 3 begins by describing a typical data science infrastructure in an organization, then presents some emerging solutions to the problem of moving large data sets within that infrastructure: machine learning algorithms built into the database, the use of Hadoop for data storage and processing, and hybrid database systems that seamlessly combine traditional database software with Hadoop-like solutions. The chapter closes by highlighting some of the challenges of integrating data from across an organization into a unified representation suitable for machine learning. Chapter 4 introduces the field of machine learning and explains some of the most popular machine learning algorithms and models, including neural networks, deep learning, and decision tree models.
Chapter 5 connects machine learning expertise with real-world problems by examining a series of standard business problems and describing how machine learning can solve them. Chapter 6 reviews the ethical implications of data science, recent developments in data regulation, and some new computational approaches to protecting personal privacy within the data science process. Finally, Chapter 7 describes some areas where data science will have a significant impact in the near future and sets out some principles that are important in determining whether a data science project succeeds.

Back cover text

What is data science? How has it evolved? What is the standard process for a data science project? What challenges does data infrastructure pose? How are data science and machine learning related? How can data be regulated and personal privacy protected in the data science process? What principles matter most for the success of a data science project? What impact will data science have in the future?

Today, data science drives decision-making in almost every area of modern society and affects every aspect of daily life. This book explains the basic ideas and concepts needed to understand data science: what it is, how it works, and what it can (and cannot) do.

Brendan Tierney is a lecturer at the School of Computer Science, Dublin Institute of Technology, an Oracle ACE Director, and the author of several books on data mining with Oracle technology.
