Online Modules and Workshops

Develop your data analytics skill set at your own pace through a variety of free, self-guided online modules and technical workshops – open to all Penn undergraduate and graduate students.

Available Modules

Intro to SQL Bootcamp

Analytics is growing rapidly, with implications across all industries. SQL is the back-end language used for direct data manipulation in many major websites, databases, and computer systems around the world. This course is intended to give you a good understanding of basic SQL and database concepts, along with opportunities to practice running different kinds of queries on databases. No prior knowledge of SQL is expected or required.
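As a rough illustration of the kind of query practice described above, the sketch below runs two basic SQL queries (a filter and an aggregation) using Python's built-in sqlite3 module and an in-memory database. The table name, column names, and rows are hypothetical, invented only for this example, and are not taken from the course materials.

```python
import sqlite3


def run_demo():
    """Build a tiny in-memory database and run two basic queries."""
    # ":memory:" keeps the database in RAM; nothing is written to disk.
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Hypothetical table, invented for illustration only.
    cur.execute(
        "CREATE TABLE enrollments (student TEXT, module TEXT, hours REAL)"
    )
    cur.executemany(
        "INSERT INTO enrollments VALUES (?, ?, ?)",
        [
            ("student_a", "SQL", 4.0),
            ("student_b", "SQL", 6.0),
            ("student_a", "Python", 3.0),
        ],
    )

    # Filtering: which students are enrolled in the SQL module?
    cur.execute(
        "SELECT student FROM enrollments WHERE module = ? ORDER BY student",
        ("SQL",),
    )
    sql_students = [row[0] for row in cur.fetchall()]

    # Aggregation: total hours logged per module.
    cur.execute(
        "SELECT module, SUM(hours) FROM enrollments "
        "GROUP BY module ORDER BY module"
    )
    hours_by_module = cur.fetchall()

    conn.close()
    return sql_students, hours_by_module


if __name__ == "__main__":
    students, hours = run_demo()
    print(students)
    print(hours)
```

The `?` placeholders pass values separately from the SQL text, which is the usual way to avoid building queries by string concatenation.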

Intro to Python Bootcamp

Whether you have programming experience or are getting started for the first time, getting involved in the Python community will put you on the fast track to honing your skills as a programmer. In this class, you’ll learn all about Python – including how to get started, what advantages and disadvantages Python offers as a programming language, the essentials of programming, and what tools are available to build applications in Python. You’ll also learn best practices, code reuse, and how to write good, clean programs.

Intro to R Bootcamp

R is open-source software that offers lots of power with relatively little code, a large user community, robust add-on packages for specialized analyses, and parallelization to make full use of multiple cores and cluster environments. It can also be plugged into a complex workflow or an analytics engine. This course is designed to give you a basic familiarity with, and understanding of, this high-level programming language. By the end of this course, you’ll be able to understand the syntax and idioms used to write R code and explore the add-on packages that can expand R’s repertoire and enhance your productivity.

Intro to Big Data

The Intro to Big Data workshop is designed for students who want to learn about big data and how it can be analyzed with big data platforms. This course provides an overview of Hadoop, a software framework used for distributed storage and processing of large datasets. Students will learn how to deploy, configure, and monitor a Hadoop cluster; import and export data; and query and explore big data using Hive and interactive SQL. Students will also be introduced to Spark, a powerful in-memory processing engine used for sophisticated analytics, working in Python through PySpark (the Spark Python API). Prior knowledge of SQL or Python is helpful but not expected or required.

Intro to Microsoft Azure and Data Science in the Cloud

Microsoft Azure is a cloud computing platform for building, testing, deploying, and managing applications and services. It offers Azure Notebooks, a free hosted service for developing and running Jupyter notebooks in the cloud. The Data Science Virtual Machine is a customized image built specifically for doing data science in the cloud. In this workshop, you’ll learn how to use Microsoft Azure cloud services to set up your own data science virtual environment in a browser.

Build a Text-Based Chatbot with Amazon Lex

In this workshop, you will learn how to build a text-based conversational chatbot using Amazon Lex, which is built on the same deep learning technologies and natural language understanding (NLU) that power Amazon Alexa. You will run Python code in the cloud to execute business logic using AWS Lambda, and then integrate your chatbot with Twilio so that you can access your bot via text messaging.

Data Analysis Project: Hotel Booking Software Platform

Clientivity is a hotel booking software platform that empowers users to create, manage, and earn commission from personal, group, and corporate travel. In a recent Analytics Accelerator, the company was interested in improving the partner experience by building pipelines to increase sales volume and actively engage with partners for a longer period of time.

A student-led team under the guidance of Professor Serguei Netessine reviewed the company’s large dataset, which included funnel statistics, partner and end-user demographics, and hotel pricing trends. After reviewing the data, the team recommended segmenting partners to identify high-performing groups and optimize for sales efficiency. The team also recommended guiding partners toward better performance with timely information and friendly competitions, and keeping partners active by catering to their desires, convenience, and sense of community.

This course includes all final project deliverables, including presentation materials, code, the complete dataset, and findings.

Online Modules Enrollment Form

  • To enroll, you must provide a valid UPenn email address.
  • Example: lsmith, mfoster, smootk. Please note: This is NOT your numeric Penn ID, which is an eight-digit number.