Deep Learning and Differentiable Programming: Exercise session

The dataset used in this exercise is a very small version of the counterfactual learning dataset we used in the following paper (this is not the real counterfactual learning dataset, which will be published later!):

Fabien Baradel, Natalia Neverova, Julien Mille, Greg Mori, Christian Wolf. COPHY: Counterfactual Learning of Physical Dynamics. pre-print arXiv:1909.12000, 2019. [ArXiv]

The dataset has been created by my student Fabien Baradel during his PhD at INSA-Lyon.

Exercise 1: detect balls in images

We will take as input images of the following kind:

The objective is to detect the spheres of different colors in the image. There are 9 differently colored spheres:

COLORS = ['red', 'green', 'blue', 'yellow', 'lime', 'purple', 'orange', 'cyan', 'magenta']	

Only 3 spheres are present in each individual image. We provide ground truth data for each image in the form of two matrices: one indicating which of the 9 colored balls are present, and one giving their bounding box coordinates.

Task 1: Write a neural model which detects the presence of each of the 9 possible balls in each image.
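
A minimal sketch of what such a model could look like, here in PyTorch; the layer sizes, the 64x64 input resolution and the dummy tensors are illustrative assumptions, not part of the provided code:

import torch
import torch.nn as nn

NUM_COLORS = 9  # one output per possible ball color

class BallPresenceNet(nn.Module):
    """Small CNN predicting, for each of the 9 colors, whether a ball is present."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                   # global pooling, works for any input size
        )
        self.classifier = nn.Linear(64, NUM_COLORS)    # one logit per color

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))   # raw logits

# Multi-label setup: each color is an independent yes/no decision.
model = BallPresenceNet()
criterion = nn.BCEWithLogitsLoss()                     # sigmoid + binary cross-entropy

# Dummy batch to check shapes; replace with batches from the real data loader.
images = torch.randn(8, 3, 64, 64)                     # assumed image size
presence = torch.randint(0, 2, (8, NUM_COLORS)).float()
loss = criterion(model(images), presence)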

Task 2: Augment the neural model such that it also detects the bounding box coordinates of the 3 balls which are present.
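
One possible way to extend the sketch above: add a second regression head on the shared trunk, e.g. a linear layer with 9 * 4 outputs, and only penalize the box coordinates of the balls that are actually present. The (batch, 9, 4) target layout below is an assumption for illustration, not the provided ground truth format:

import torch
import torch.nn.functional as F

# Assumed layout: pred_boxes and gt_boxes are (batch, 9, 4), presence is (batch, 9).
# Only the boxes of the (at most 3) balls that are present should contribute to the loss.
def masked_box_loss(pred_boxes, gt_boxes, presence):
    per_coord = F.smooth_l1_loss(pred_boxes, gt_boxes, reduction='none')
    mask = presence.unsqueeze(-1)                      # broadcast over the 4 coordinates
    return (per_coord * mask).sum() / mask.sum().clamp(min=1.0)

# Shape check with dummy tensors; in practice use the network outputs and the ground truth.
pred = torch.rand(8, 9, 4)
gt = torch.rand(8, 9, 4)
presence = torch.randint(0, 2, (8, 9)).float()
box_loss = masked_box_loss(pred, gt, presence)         # add this term to the presence loss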

The set consists of 21000 images and ground truth matrices [TGZ].
Don't forget to split it into a training and a validation set.
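
One common way to do this split, assuming the data is wrapped in a PyTorch Dataset (the 90/10 ratio and the dummy dataset below are placeholders):

import torch
from torch.utils.data import TensorDataset, DataLoader, random_split

# Dummy stand-in for the real dataset of 21000 images; use the provided loader instead.
full_set = TensorDataset(torch.zeros(21000, 1))

n_train = int(0.9 * len(full_set))                     # 90/10 split, the ratio is arbitrary
train_set, val_set = random_split(
    full_set, [n_train, len(full_set) - n_train],
    generator=torch.Generator().manual_seed(0))        # fixed seed for a reproducible split

train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
val_loader = DataLoader(val_set, batch_size=32)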

A Zip archive of the accompanying Python code is also provided: [ZIP].

Exercise 2 (Advanced!): future forecasting

Objective: Write a neural model which takes sequences of ball bounding box coordinates as input: given the initial part of a sequence, predict the ball positions at the last time instant.
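
A minimal baseline sketch, assuming each sequence is stored as a tensor of shape (batch, 20, 3, 4), i.e. 20 time steps, 3 balls and 4 bounding box coordinates per ball; the GRU, the hidden size and this tensor layout are assumptions, not the provided format:

import torch
import torch.nn as nn

class TrajectoryForecaster(nn.Module):
    """Reads the first 19 time steps with a GRU and regresses the boxes at the last step."""
    def __init__(self, n_balls=3, n_coords=4, hidden=128):
        super().__init__()
        self.gru = nn.GRU(n_balls * n_coords, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_balls * n_coords)

    def forward(self, seq):                            # seq: (batch, time, balls, coords)
        b, t, n, c = seq.shape
        _, h = self.gru(seq.reshape(b, t, n * c))      # h: (1, batch, hidden)
        return self.head(h[-1]).view(b, n, c)          # predicted boxes at the last instant

model = TrajectoryForecaster()
criterion = nn.MSELoss()

# Dummy batch: feed the first 19 steps, predict the boxes at step 20.
seq = torch.rand(8, 20, 3, 4)
loss = criterion(model(seq[:, :-1]), seq[:, -1])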

This is actually an ill-posed problem, since the data has been created with several different physical properties, namely ball masses, friction coefficients and restitution coefficients. These coefficients are not observable from a single time instant (but they can be partially inferred by a network which takes the temporal dimension into account). Therefore, do not expect the error to be close to zero.

The set consists of 7000 sequences, each composed of 20 time steps [TGZ].
Don't forget to split it into a training and a validation set.

As before, only 3 spheres are present in each individual image. We provide ground truth data for each sequence in the following format:

Zip archive of the data loader for this sequence dataset: [ZIP].