AD8302 - Fundamentals of Data Science

COURSE OBJECTIVES

 To gain knowledge of the basic concepts of data analysis
 To acquire skills in data preparation and preprocessing steps
 To develop the mathematical skills used in statistics
 To learn the tools and packages in Python for data science
 To gain an understanding of classification and regression models
 To acquire knowledge of data interpretation and visualization techniques

UNIT I INTRODUCTION - 9 Hours

Need for data science – benefits and uses – facets of data – data science process – setting the research goal – retrieving data – cleansing, integrating, and transforming data – exploratory data analysis – build the models – presenting and building applications

UNIT II DESCRIBING DATA I - 9 Hours

Frequency distributions – outliers – relative frequency distributions – cumulative frequency distributions – frequency distributions for nominal data – interpreting distributions – graphs – averages – mode – median – mean – averages for qualitative and ranked data – describing variability – range – variance – standard deviation – degrees of freedom – interquartile range – variability for qualitative and ranked data
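As an illustration of the Unit II topics, the following is a minimal sketch that computes frequency distributions, the three averages, and the common measures of variability using only the Python standard library. The sample `scores` is hypothetical and is not taken from the prescribed textbooks.

```python
import statistics
from collections import Counter
from itertools import accumulate

# Hypothetical sample of eight quiz scores (illustrative only).
scores = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(scores)

# Frequency, relative frequency, and cumulative frequency distributions.
freq = dict(sorted(Counter(scores).items()))
rel_freq = {value: count / n for value, count in freq.items()}
cum_freq = dict(zip(freq, accumulate(freq.values())))

# Averages: mean, median, mode.
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Variability.
value_range = max(scores) - min(scores)
pop_variance = statistics.pvariance(scores)    # divides by n
sample_variance = statistics.variance(scores)  # divides by n - 1 (degrees of freedom)
sample_sd = statistics.stdev(scores)           # square root of the sample variance
q1, _, q3 = statistics.quantiles(scores, n=4)  # quartile cut points
iqr = q3 - q1                                  # interquartile range

print("frequency:", freq)
print("relative frequency:", rel_freq)
print("cumulative frequency:", cum_freq)
print("mean, median, mode:", mean, median, mode)
print("range, sample variance, sample sd, IQR:",
      value_range, sample_variance, sample_sd, iqr)
```

The population variance divides the sum of squared deviations by n, while the sample variance divides by n − 1, which is where degrees of freedom enter. Quartile conventions differ between textbooks and libraries, so the interquartile range printed here may not match a hand calculation that uses another rule.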

UNIT III PYTHON FOR DATA HANDLING - 9 Hours

Basics of NumPy arrays – aggregations – computations on arrays – comparisons, masks, Boolean logic – fancy indexing – structured arrays – data manipulation with pandas – data indexing and selection – operating on data – missing data – hierarchical indexing – combining datasets – aggregation and grouping – pivot tables

UNIT IV DESCRIBING DATA II - 9 Hours

Normal distributions – z scores – normal curve problems – finding proportions – finding scores – more about z scores – correlation – scatter plots – correlation coefficient for quantitative data – computational formula for correlation coefficient – regression – regression line – least squares regression line – standard error of estimate – interpretation of r² – multiple regression equations – regression toward the mean
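As an illustration of the Unit IV topics, the following is a minimal sketch that computes z scores, the Pearson correlation coefficient, the least squares regression line, r², and the standard error of estimate from sums of squares. The paired data `x` and `y` are hypothetical, and the standard error uses the common n − 2 form, which is an assumption about the convention rather than something stated in the syllabus.

```python
import math

# Hypothetical paired observations (illustrative only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squares and sum of cross-products of deviations.
ss_x = sum((xi - mean_x) ** 2 for xi in x)
ss_y = sum((yi - mean_y) ** 2 for yi in y)
sp_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# z scores for x, using the population standard deviation.
sd_x = math.sqrt(ss_x / n)
z_x = [(xi - mean_x) / sd_x for xi in x]

# Pearson correlation coefficient and proportion of explained variance.
r = sp_xy / math.sqrt(ss_x * ss_y)
r_squared = r ** 2

# Least squares regression line: predicted y = slope * x + intercept.
slope = sp_xy / ss_x
intercept = mean_y - slope * mean_x

# Standard error of estimate (assumed n - 2 form).
std_error = math.sqrt(ss_y * (1 - r_squared) / (n - 2))


def predict(value):
    """Return the predicted y for a given x on the fitted line."""
    return slope * value + intercept


print("z scores:", [round(z, 3) for z in z_x])
print("r:", round(r, 4), "r squared:", round(r_squared, 4))
print("slope:", round(slope, 4), "intercept:", round(intercept, 4))
print("standard error of estimate:", round(std_error, 4))
print("prediction at x = 6:", round(predict(6), 4))
```

Here r² is the proportion of the variance in y that is predictable from x, and the standard error of estimate summarizes how far the observed y values typically fall from the regression line.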

UNIT V PYTHON FOR DATA VISUALIZATION - 9 Hours

Visualization with Matplotlib – line plots – scatter plots – visualizing errors – density and contour plots – histograms, binnings, and density – three-dimensional plotting – geographic data – data analysis using statsmodels and seaborn – graph plotting using Plotly – interactive data visualization using Bokeh

TOTAL PERIODS: 45

OUTCOMES:

At the end of the course, the students should be able to:

 Apply the skills of data inspection and cleansing.
 Determine relationships and dependencies in data using statistics.
 Handle data using the primary Python tools for data science.
 Represent useful information using mathematical skills.
 Apply their knowledge to describe and visualize data using tools.

TEXT BOOKS

  1. Davy Cielen, Arno D. B. Meysman, and Mohamed Ali, “Introducing Data Science”, Manning Publications, 2016. (First two chapters for Unit I)
  2. Robert S. Witte and John S. Witte, “Statistics”, Eleventh Edition, Wiley Publications, 2017. (Chapters 1–7 for Units II and IV)
  3. Jake VanderPlas, “Python Data Science Handbook”, O’Reilly, 2016. (Parts of chapters 2–4 for Units III and V)

REFERENCES

  1. Allen B. Downey, “Think Stats: Exploratory Data Analysis in Python”, Green Tea Press,
