Tomas Mikolov et al. (at Google) created word2vec, a word embedding toolkit that can train vector space models faster than previous approaches. He writes on his Facebook homepage that:
My long term research goal is to develop intelligent machines capable of learning and communication with people using natural language.

The session started with an ''introductory'' presentation of NLP.
Text processing is the core business of internet companies today (Facebook, Google, Twitter, Baidu, Yahoo, etc.). Digging a little deeper, we moved on to word vectors and vector space models.
Vector space models (VSMs) represent (embed) words in a continuous vector space where semantically similar words are mapped to nearby points (''are embedded nearby each other'').
...
All (VSM) methods depend in some way or another on the Distributional Hypothesis, which states that words that are used and occur in the same contexts tend to purport similar meanings...
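As a minimal sketch of the idea (assuming the gensim library and a made-up toy corpus; real models are trained on billions of tokens), training a small word2vec model and asking for nearest neighbours in the learned space could look like this:

```python
# Minimal sketch: train a tiny word2vec model with gensim (4.x API) and
# inspect nearest neighbours in the learned vector space.
# The toy corpus is made up for illustration only.
from gensim.models import Word2Vec

corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
    ["cats", "and", "dogs", "are", "pets"],
]

model = Word2Vec(
    sentences=corpus,
    vector_size=50,  # dimensionality of the embedding space
    window=2,        # context window size
    min_count=1,     # keep every word in this toy example
    sg=1,            # 1 = skip-gram, 0 = CBOW
)

# Words that occur in similar contexts should end up close to each other.
print(model.wv.most_similar("cat", topn=3))
```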
The classifier FastText is often on par with deep learning classifiers in terms of accuracy, and many orders of magnitude faster for training and evaluation [1]. E.g. in the Facebook universe it is relevant to figure out which texts have to do with sports, etc.
Facebook deals with an enormous amount of text data on a daily basis in the form of status updates, comments etc. And it is all the more important for Facebook to utilize this text data to serve its users better. And using this text data generated by billions of users to compute word representations was a very time expensive task until Facebook developed their own open source library, FastText, for Word Representations and Text Classification.Youtube video: How to Classify Text with FastText.
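A minimal, hedged sketch of supervised text classification with the fastText Python bindings (the training file, labels and example texts below are made up for illustration):

```python
# Minimal sketch: train a fastText text classifier and run a prediction.
# The training data is a tiny, made-up file with one "__label__<class> text"
# example per line; real classifiers need far more data.
import fasttext

with open("train.txt", "w") as f:
    f.write("__label__sports The match went into overtime after a late goal\n")
    f.write("__label__sports A hat-trick in the second half decided the game\n")
    f.write("__label__tech The new phone ships with a faster processor\n")
    f.write("__label__tech An open source library for word representations\n")

model = fasttext.train_supervised(
    input="train.txt",
    epoch=25,
    lr=1.0,
    wordNgrams=2,  # use word bigrams as additional features
)

# Predict the most likely label (and its probability) for a new text.
print(model.predict("Great game last night!"))
```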
We believe that the quest to develop AI will ultimately also lead to a better understanding of our own minds and thought processes.

And ''Identifying (read out, decode) natural images from human brain activity'' is indeed possible.
Distilling intelligence into an algorithmic construct and comparing it to the human brain might yield insights into some of the deepest and the most enduring mysteries of the mind.
See: Cell: Neuroscience-Inspired Artificial Intelligence.
Models make it possible to identify, from a large set of completely novel natural images, which specific image was seen by an observer [1].

Also pretty cool, mapping semantic space in the brain: see the (awesome) ''Brain-viewer'', for ''Associating brain regions with categories; modulation of attention''.
Since its beginning more than 20 years ago, functional magnetic resonance imaging (fMRI) has become a popular tool for understanding the human brain, with some 40,000 published papers (2016) according to PubMed [1].

But, sadly, there is a lack of consistent vocabulary in psychology, which makes it quite hard to extract useful knowledge from all of these studies...
Open-source Python software for exploring, visualizing, and analyzing human neurophysiological data: MEG, EEG, sEEG, ECoG, and more.

It should be somewhat easier to understand our own minds...
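As a small illustration (assuming the quote above refers to the MNE-Python package, whose tagline it matches, and that its bundled sample dataset is available), loading and inspecting a MEG/EEG recording could look roughly like this:

```python
# Minimal sketch: load MNE-Python's sample MEG/EEG recording and look at
# its power spectrum. The sample dataset is downloaded on first use.
import mne

data_path = mne.datasets.sample.data_path()
raw_fname = f"{data_path}/MEG/sample/sample_audvis_raw.fif"

raw = mne.io.read_raw_fif(raw_fname, preload=True)
raw.filter(l_freq=1.0, h_freq=40.0)  # basic band-pass filter
print(raw.info)                      # channels, sampling rate, etc.
raw.plot_psd(fmax=50)                # power spectral density per channel
```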
Building on the notion of a particle physics detector as a camera and the collimated streams of high energy particles, or jets, it measures as an image, we investigate the potential of machine learning techniques based on deep learning architectures to identify highly boosted W bosons [1].

New physics is expected to show up as anomalies. (And there must be a lot of new physics out there, as the Universe is thought to consist of approximately 5 % atoms, 23 % dark matter and 72 % dark energy, where no one, in 2019, knows what dark matter and dark energy really are...)
Given a training set, this technique learns to generate new data with the same statistics as the training set. For example, a GAN trained on photographs can generate new photographs that look at least superficially authentic to human observers, having many realistic characteristics.

Here this technique can be used to:
Create a simulation of the particle-detector response to hadronic jets [2].

If we want to make sure that the network only detects physics where the laws of special relativity still hold, it is possible to add a ''LoLa'' layer to the network: ''Deep-learned Top Tagging with a Lorentz Layer''.
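As a rough, hedged sketch of the GAN idea (PyTorch, with a placeholder 25x25 "jet image" size, random stand-in training data and made-up network sizes; this is not the architecture of [2]):

```python
# Minimal GAN sketch: a generator learns to produce flattened "jet images"
# that a discriminator cannot tell apart from the (here: random stand-in) real ones.
import torch
import torch.nn as nn

IMG = 25 * 25  # flattened calorimeter image (placeholder size)
Z = 64         # latent noise dimension

G = nn.Sequential(nn.Linear(Z, 256), nn.ReLU(),
                  nn.Linear(256, IMG), nn.ReLU())  # energy deposits are >= 0
D = nn.Sequential(nn.Linear(IMG, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1))               # real-vs-fake logit

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real_jets = torch.rand(512, IMG)  # placeholder for simulated jet images

for step in range(1000):
    real = real_jets[torch.randint(0, 512, (64,))]
    fake = G(torch.randn(64, Z))

    # Discriminator step: distinguish real detector response from generated.
    loss_d = (bce(D(real), torch.ones(64, 1)) +
              bce(D(fake.detach()), torch.zeros(64, 1)))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator step: fool the discriminator.
    loss_g = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```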
If you know the enemy and know yourself, you need not fear the result of a hundred battles.

Sun Tzu, The Art of War, 500 BC.

In ''A Half-day Tutorial on Adversarial Machine Learning'', Biggio & Roli write:
Learning algorithms have to face intelligent and adaptive attackers, who can carefully manipulate data to purposely subvert the learning process. As these algorithms have not been originally designed under such premises, they have been shown to be vulnerable to well-crafted, sophisticated attacks, including test-time evasion and training-time poisoning attacks (also known as adversarial examples). Countering these threats is the subject of ''adversarial machine learning''.

See also: ''Ten Years After the Rise of Adversarial Machine Learning''.
| Attacker's capability | Attacker's goal: Misclassifications that compromise the system without compromising normal system operation (Integrity) | Attacker's goal: Misclassifications that compromise the normal functionalities available to legitimate users (Availability) | Attacker's goal: Query strategies to reveal info about the learning model or its users (Privacy/Confidentiality) |
|---|---|---|---|
| Test data | Evasion (a.k.a. adversarial examples): manipulating input data to evade a trained classifier at test time | - | Model extraction / stealing: aimed at stealing machine-learning models |
| Training data | Poisoning to allow subsequent intrusions, i.e. to create specific backdoor vulnerabilities, neural network trojans | Poisoning to maximize the classification error | - |
A convolutional neural network does not learn any meaningful characteristic for malware detection from the data and text sections of executable files, but rather tends to learn to discriminate between benign and malware samples based on the characteristics found in the file header [3].

Given this knowledge, one doesn't need to change much in order to fool the malware detector:
By only changing a few specific bytes at the end of each malware sample, while preserving its intrusive functionality... these binaries can then evade the targeted network with high probability... [4].
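A hedged sketch of the padding mechanics behind this kind of evasion (the file names are hypothetical, and the appended bytes here are random placeholders; in [4] they are chosen by a gradient-based attack):

```python
# Minimal sketch: bytes appended after the end of a binary are ignored when
# it runs, so its behaviour is preserved while the classifier's input changes.
import os

def append_padding(path_in, path_out, n_bytes=64):
    payload = os.urandom(n_bytes)  # placeholder for attack-optimized bytes
    with open(path_in, "rb") as f:
        data = f.read()
    with open(path_out, "wb") as f:
        f.write(data + payload)    # functionality-preserving modification

# Create a dummy "binary" just so the sketch runs end-to-end.
with open("sample.bin", "wb") as f:
    f.write(b"MZ" + b"\x00" * 128)

append_padding("sample.bin", "sample_padded.bin")
```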
In ''Intriguing properties of neural networks'' [5], Christian Szegedy, Ian Goodfellow et al. write that:

While neural nets' expressiveness is the reason they succeed, it also causes them to learn uninterpretable solutions that could have counter-intuitive properties... E.g.:
We can cause the network to misclassify an image by applying a certain hardly perceptible perturbation, which is found to maximize the network's prediction error.

Clearly not good.
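As a minimal sketch of the idea (using the fast gradient sign method from Goodfellow et al.'s later work rather than the L-BFGS attack of [5]; `model`, `image` and `label` are assumed to be any differentiable PyTorch classifier, a correctly classified input batch and its true class):

```python
# Minimal sketch: craft a "hardly perceptible" perturbation that pushes an
# image in the direction that maximizes the classifier's prediction error.
import torch
import torch.nn.functional as F

def fgsm(model, image, label, epsilon=0.01):
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)  # the network's prediction error
    loss.backward()
    # Tiny step in the direction that increases the loss the most.
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0, 1).detach()      # keep a valid pixel range
```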
Unlike the traditional phrase-based translation systems, which consist of many small sub-components that are tuned separately, neural machine translation attempts to build and train a single, large neural network that reads a sentence and outputs a correct translation [2], [3].

Vaswani, Shazeer et al. (Google Brain) came up with an ''attention model'' in 2017: ''Attention Is All You Need'' [4], [5]. Here, attention mechanisms allow modelling of global dependencies between input and output, which has improved translation quality (''after being trained for as little as twelve hours on eight P100 GPUs'') [6].
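A minimal sketch of the scaled dot-product attention at the heart of [4] (PyTorch; the toy tensors are made up), in which every output position attends directly to every input position, so dependencies are modelled globally rather than step by step:

```python
# Minimal sketch: scaled dot-product attention over a toy sequence.
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: (batch, sequence_length, d_k)
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / d_k ** 0.5  # query-key similarities
    weights = F.softmax(scores, dim=-1)            # attention distribution
    return weights @ V                             # weighted sum of values

Q = K = V = torch.randn(2, 10, 64)  # toy self-attention over 10 tokens
print(scaled_dot_product_attention(Q, K, V).shape)  # torch.Size([2, 10, 64])
```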
Starting from the sensory receptors, Fuster suggested that information flows upward in the sensory hierarchy, while the motor hierarchy can be viewed as flowing in the opposite direction.
Information is exchanged between the two hierarchies in an ongoing perception-action cycle, from low levels up to high levels of planning, thinking and anticipating the future.
Bernard J. Baars: ''Cognition, Brain, and Consciousness''.
If someone is told something they already know, the information they get is very small.

When the data source produces a low-probability value (i.e., when a low-probability event occurs), there is more ''information'' (''surprise'') than when it produces a high-probability value. Entropy is a measure of the unpredictability of the state, i.e. of its average information content, where the information content of an event grows with the reciprocal of its probability [3].
It will be pointless for them to be told something they already know. This information would have very low entropy.
If they were told about something they knew little about, they would get much new information. This information would be very valuable to them. They would learn something. This information would have high entropy [2].
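A minimal numerical sketch of this (plain Python; the probabilities are made up):

```python
# Minimal sketch: Shannon information content and entropy.
# A low-probability ("surprising") event carries more information than a
# likely one; entropy is the average information content of the source.
import math

def information(p):         # self-information in bits
    return -math.log2(p)

def entropy(distribution):  # average information content in bits
    return sum(p * information(p) for p in distribution if p > 0)

print(information(0.99))      # ~0.01 bits: almost no surprise
print(information(0.01))      # ~6.64 bits: a lot of "information"
print(entropy([0.5, 0.5]))    # 1.0 bit: maximally unpredictable coin flip
print(entropy([0.99, 0.01]))  # ~0.08 bits: highly predictable source
```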
Since it's hard to make machines think about the world, the new goal is to describe the world in ways that are easy for machines to think about.

But, but...
Many important aspects of the world can't be specified in an unambiguous and universally agreed-on fashion.

So it does not work... one ontology, expressed in one language and covering the whole web, was the dream.
...
Because meta-data describes a worldview, incompatibility is an inevitable by-product of vigorous argument.
...
It would be relatively easy, for example, to encode a description of genes in XML, but it would be impossible to get a universal standard for such a description, because biologists are still arguing about exactly how genes function... [1].
Mary puts on a coat every time she leaves the house.
Who is ''she''? Is she = Mary?

She puts on a coat every time Mary leaves the house.
She = Mary is now very unlikely.

An Apple engineer is eating an apple.
This isn't exactly easy...
Add a generative network between the last convolutional layer and the first fully connected layer.
The generative network augments positive samples by generating weight masks randomly applied to the features, where each mask represents a specific type of appearance variation.
Through adversarial learning, the network then identifies the mask that maintains the most robust features of target appearance in the temporal domain.
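A loose, hedged sketch of that idea (PyTorch; the shapes, the scoring function and the selection rule are placeholders, not the cited tracker's actual architecture): random masks drop parts of the target's feature map, and the masked sample the classifier finds hardest is kept as an augmented positive.

```python
# Minimal sketch: adversarial-style feature-mask augmentation for tracking.
import torch

def hardest_masked_feature(features, classifier, n_masks=8, drop=0.3):
    # features: (C, H, W) feature map of a positive (target) sample.
    # classifier: assumed to return a single target-confidence score.
    candidates, scores = [], []
    for _ in range(n_masks):
        mask = (torch.rand(1, *features.shape[1:]) > drop).float()  # spatial dropout mask
        masked = features * mask              # broadcast the mask over channels
        candidates.append(masked)
        scores.append(classifier(masked.unsqueeze(0)).reshape(()))
    # The mask that hurts the score the most hides the least robust evidence,
    # so the surviving features are the more robust ones to train on.
    return candidates[int(torch.stack(scores).argmin())]

# Toy usage with a made-up feature map and a placeholder scoring function.
feat = torch.randn(256, 5, 5)
hard_positive = hardest_masked_feature(feat, classifier=lambda x: x.mean())
print(hard_positive.shape)  # torch.Size([256, 5, 5])
```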
Visual object tracking is still difficult due to factors such as partial occlusion, deformation, motion blur, fast motion, illumination variation, background clutter and scale variations.

See also Martin Danelljan's GitHub page for visual tracking.
Most existing approaches provide inferior performance when encountered with large scale variations in complex image sequences...
In this paper, we tackle the challenging problem of scale estimation for visual tracking...