Duke Machine Learning Summer School

June 6–10, 2022

The Duke+Data Science program is pleased to announce the Duke Machine Learning Summer School 2022, offered in June as a live five-day class that provides lectures on the fundamentals of machine learning.

The curriculum in the MLSS is targeted to individuals interested in learning about machine learning, with a focus on recent deep learning methodology. The MLSS will introduce the mathematics and statistics at the foundation of modern machine learning, and provide context for the methods that have formed the foundations of rapid growth in artificial intelligence (AI). Additionally, the MLSS will provide hands-on training in the latest machine learning software, using the widely used (and free) PyTorch framework.

Attendees can choose between two participation options:

In-person attendance on Duke University’s campus in Durham, North Carolina, in the Duke Engineering Wilkinson Building
Virtual attendance via Zoom

This is the 11th Duke Machine Learning School presented since 2017. This series has reached hundreds of participants from academia and industry, including international audiences at the SingHealth/Duke NUS Medical School and the Duke Kunshan University campus. Last year’s Machine Learning Summer School attracted 170 participants from around the world, representing 43 universities, institutes, and corporations.

Who Should Attend

The MLSS is particularly well-suited to members of academia and industry, including students and trainees, who seek a thorough introduction to the methods of machine learning, including interpretation and commentary by respected leaders in the field.

The MLSS is meant to provide value to students at multiple levels of mathematical sophistication, including those with limited mathematical background. On each day, the initial emphasis will be on presenting the concepts as intuitively as possible, with minimal math and technical detail. As the concepts are developed further, more math will be introduced, but only the minimum necessary to explain the concepts. Case studies will then show how the technology is used in practical computer vision applications, and these discussions should be accessible to most students (concepts emphasized over detailed math). Strength in mathematics and statistics is a significant plus and will make all MLSS material more accessible; however, it is not required to benefit from much of the program. Finally, the class will also introduce participants to the coding software used to make such technology work in practice.

To register for the MLSS, please visit https://events.duke.edu/mlss2022

To request consideration for a scholarship, please visit https://duke.qualtrics.com/jfe/form/SV_3DHqTxScoyamsJw

Curriculum

The broad areas of emphasis for the five-day class will include:

Monday, June 6, 2022 (9:00 AM – 5:00 PM)

Basic concepts in machine learning (ML)
Introduction to model building, logistic regression, and the multi-layered perceptron (MLP)
Scaling to “big data” with stochastic gradient descent
Backpropagation as an efficient computation method
Concepts of ML programming, including the use of notebooks, virtual machine resources, and programming language overview
Hands-on coding session covering introductory concepts of ML

Tuesday, June 7, 2022 (9:00 AM – 4:30 PM)

Image analysis with convolutional neural networks (CNNs)
Deep convolutional neural networks
Image classification and transfer learning
Hands-on coding session covering image classification with CNN models

Wednesday, June 8, 2022 (9:00 AM – 4:00 PM)

Advanced image analysis with CNNs for object detection and segmentation
Hands-on coding session covering object detection and segmentation with CNN models
Case study around advanced image analysis

Thursday, June 9, 2022 (9:00 AM – 4:00 PM)

Natural language processing (NLP) with neural networks
Hands-on coding session covering NLP models
Case study around NLP

Friday, June 10, 2022 (9:00 AM – 3:30 PM)

Generative adversarial networks (GANs)
Image synthesis with generative models
Conditional image generation and translation
Case study in ethical issues around machine learning

Teaching assistants will be present throughout the program to support the attendees and will be easily available for assistance and consultation.

Program Format

The 5-day class will provide lectures on the mathematics and statistics at the heart of machine learning, plus hands-on training on implementing machine learning tools with the PyTorch software platform, and case studies of the methods applied to specific application areas.

Each day of the MLSS will be arranged as follows (Eastern Time):

9:00 AM – 10:15 AM: Lecture 1, a mathematically light introduction to the focus of the day
10:45 AM – 12:00 PM: Lecture 2, a mathematically rigorous discussion of the focus of the day
Afternoons, beginning at 1:00 PM: Coding sessions and case studies

At the end of the MLSS, each student will have a deeper understanding of the fundamental concepts of machine learning and applied computer vision, including context for the rapidly evolving field of artificial intelligence. Students with sufficient mathematical background will also learn the underlying methodology of machine learning. Each student should be able to use PyTorch to implement the latest machine learning methods for analysis of images, video, and natural language (text).

Program Details: Location, Registration and Cost

Students (with a valid ID, at Duke or other universities) will pay a course fee of $150; the fee for non-students is $400, payable through the registration site. All fees are non-refundable. Once we reach maximum registration, we will maintain a waitlist, and will contact those on the waitlist as spots become available. We also have a small number of scholarships available for those who would be otherwise unable to join.

There will be no difference in cost for participants who attend in person vs. participants who attend virtually. This will allow maximum flexibility and personal choice for attendance options.

Each participant will receive a personal link for the virtual webinars, which will be held live and provide opportunities for questions and engagement with each lecturer. We strongly encourage live participation, but every participant will also have access to the video recordings to use for their personal reference.

Relevance and Context

Machine learning is a field characterized by development of algorithms that are implemented in software and run on a machine (e.g., computer, mobile device, etc.). Each such algorithm is characterized by a set of parameters, and particular parameter settings yield associated algorithm characteristics. The algorithms have the capacity to learn, based on observed data. By “learn” it is meant that the algorithm can infer (or learn) which algorithm parameter settings are best matched to the data of interest. After algorithm parameters are so learned, the associated model ideally captures the underlying characteristics of the data. The algorithm, with learned parameters, may subsequently be applied to new data, with the goal of making predictions or learning insights. Machine learning methodology is primarily concerned with designing appropriate models/algorithms for datasets and problems of interest, plus the capacity to learn the model parameters given data (with challenges manifested when that data is of a massive scale).
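
As a rough illustration of what "learning parameters from data" can look like in practice, the sketch below fits a one-dimensional logistic regression model with stochastic gradient descent and then applies it to new inputs. It is not taken from the MLSS course materials: the function names (`train_logistic`, `predict`), the toy dataset, and the learning rate and epoch count are all assumptions chosen for illustration, and it uses only the Python standard library rather than PyTorch.

```python
import math
import random


def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(data, epochs=200, lr=0.1, seed=0):
    """Learn the two parameters (w, b) of a 1-D logistic model.

    `data` is a list of (x, y) pairs with y equal to 0 or 1.
    The epoch count, learning rate, and seed are illustrative choices.
    """
    rng = random.Random(seed)
    samples = list(data)
    w, b = 0.0, 0.0  # parameter settings before any learning
    for _ in range(epochs):
        rng.shuffle(samples)  # visit the examples in a random order
        for x, y in samples:
            p = sigmoid(w * x + b)  # current predicted probability
            error = p - y           # gradient of the log-loss w.r.t. the score
            w -= lr * error * x     # one stochastic gradient step per example
            b -= lr * error
    return w, b


def predict(w, b, x):
    """Apply the learned parameters to a new input."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0


# Hypothetical toy data: small x values are labeled 0, large ones 1.
toy_data = [(0.0, 0), (0.5, 0), (1.0, 0), (1.5, 0),
            (2.5, 1), (3.0, 1), (3.5, 1), (4.0, 1)]

w, b = train_logistic(toy_data)
print(predict(w, b, 0.2), predict(w, b, 3.8))
```

Here the "algorithm parameters" are just `w` and `b`. Each update nudges them toward settings that better match the observed data, and the learned values are then reused to make predictions on inputs that were not in the training set.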

In the context of prediction, one may be interested in developing algorithms that are capable of automatically classifying and interpreting imaging data; for instance, in a healthcare setting, to improve clinical care. In this case, the healthcare data may be radiological images (e.g., x-rays, ultrasound videos, computed tomography volumes, etc.) and/or a history of patient care. In healthcare, the goal is to use machine learning to make improved diagnoses, interpretation (e.g., location of abnormalities within an image) and recommendations for care. Similar concepts are of interest in business, where one may be interested in tailoring advertising and products to individuals or improving image search. In education, machine learning may be used to tailor educational material to the level and interests of each student. Machine learning is increasingly making an impact in almost all areas of personal and professional life.

Recently, with increasing access to massive imaging datasets (e.g., ImageNet) and significant advances in computing resources, the quality of machine learning performance (e.g., prediction accuracy) has improved markedly. Further, over the last seven years, significant advances have been made in a subfield of machine learning called “deep learning” that have completely changed the landscape and boundaries of computer vision. In multiple scenarios, computer vision models have surpassed human ability in image classification and object detection tasks, and they can generate images of such quality that most humans cannot distinguish them from real images.

This class will focus on the areas of machine learning that have made the biggest advances in utility over the last several years, including deep learning. The class will concentrate on methods that allow machine-learning algorithms to train effectively on massive datasets, i.e., “big data”.

The 2022 MLSS is presented by the Duke+Data Science (+DS) program, which is one of the partner programs supporting the mission of the Duke Center for Computational Thinking (CCT), and with support from Duke AI Health’s Health Data Science (HDS) Program.

If you have any questions, please send an e-mail to plus-datascience@duke.edu