Courses

COMP 430/533 • Introduction to Database Systems

This course is an introduction to relational and other (NoSQL) database systems, SQL programming, and database design.

The main goals of this course are for the student to:

1. Understand the benefits of using a database
2. Become familiar with database systems and terminology
3. Create well-designed databases and understand trade-offs
4. Develop proficiency in effectively managing data in a database

While the course focuses on developing skills as a database designer, it also discusses database implementation details so that students understand the underlying system functionality and how it affects the decisions a database designer makes.

Assignments in the course consist of theoretical and programming assignments, a semester-long team project, and a number of in-class assessments.

COMP 543 • Graduate Tools & Models – Data Science

This course is an introduction to modern data science. Data science is the study of how to extract actionable, non-trivial knowledge from data. The course focuses on software tools used by practitioners of modern data science, the mathematical and statistical models employed in conjunction with those tools, and the applications of these tools and systems to different problems and domains.

On the tools side, we will cover the basics of relational database systems, as well as modern systems for manipulating large datasets, such as Apache Spark and Google’s TensorFlow. On the models side, the course will cover standard supervised and unsupervised models for data analysis and pattern discovery. In particular, this class explores the use of these tools and models in the analysis of “big” data, that is, datasets that are too large to be analyzed on a typical personal computer.

At the end of this course, students will understand the development and use of modern machine learning tools and will be able to implement machine learning algorithms using these tools. They will have basic skills in querying relational databases and will understand and be able to implement and use common data science models, including gradient descent, K-nearest neighbors, deep learning, and more. They will also be familiar with the theoretical basis and underlying research that motivated the systems and models discussed in class.

Assignments in this course consist of theoretical and programming assignments, written analyses of seminal research papers in the field, and hands-on labs.

A public version of the material for this course is available here: http://dstoolsandmodels.rice.edu

DSCI 302 • Data Science Tools and Models

This course is a core component of Rice University’s proposed Data Science minor. It is intended for students who are not Computer Science majors and provides a carefully paced introduction to tools and models in modern data science. Data science is the study of how to extract actionable, non-trivial knowledge from data. The course focuses on software tools used by practitioners of modern data science, the mathematical and statistical models employed in conjunction with those tools, and the applications of these tools and systems to different problems and domains. In particular, we will cover relational database systems and Apache Spark, a distributed computing framework used in data science.

At the end of this course, students will understand the development and use of modern machine learning tools and will be able to implement machine learning algorithms using these tools. They will have basic skills in querying relational databases and will understand and be able to implement and use common data science models, including gradient descent and K-nearest neighbors. They will also be familiar with the theoretical basis that motivated the systems and models discussed in class.