Impressions and Links
from the conference:

Nordic DataScience and MachineLearning Summit

Stockholm, October 18-19th, 2017.


In October, I had the great pleasure of taking part in the Nordic DataScience and MachineLearning Summit in Stockholm.



I tried to follow as many talks as possible. But, well, these notes are, of course, in no way, shape or form complete...
Rather, they were written on conference nights, as my way of keeping track of the events that I attended, and as a way of storing links and references for future use.

But enough disclaimers. Below, you'll find impressions and links from some of the conference talks and seminars, including links for further reading.
Great stuff! Indeed, I am already looking forward to 2018.

Workshops.

Wednesday, October 18th, 2017.

Workshop 1.

Room Tor. Basics of GPU computing for Data Scientists.
by Albin Sunesson and Peter Hultin, Laketide.



Cuda Illustration

GPUs.
GPUs (Nvidia and others) power millions of applications around the world, accelerating computationally-intensive tasks for consumers, professionals, scientists, and researchers. And CUDA is a parallel computing platform that allows software developers to use (CUDA-enabled) graphics processing units (GPUs) for general purpose processing, a.k.a. GPGPU (General-Purpose computing on Graphics Processing Units).

Recently, the Open Graphics Library (OpenGL) has also been used for purposes other than strictly graphics processing (i.e. handling large volumes of data very fast). Its popularity has to do with its availability on many platforms, and with good fallback options in the absence of hardware-accelerated drivers.

Still, here, in the workshop, the focus was on the CUDA platform.
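
To make the data-parallel idea behind CUDA a bit more concrete, here is a minimal sketch of my own (not material from the workshop). It emulates, in plain Python on the CPU, how a GPU kernel is written for a single element and then launched over a grid of blocks and threads. The names saxpy_kernel and launch are hypothetical helpers, not part of any GPU library, and nothing here actually runs on a GPU.

```python
# Plain-Python emulation of the CUDA indexing scheme; runs sequentially on the CPU.
# saxpy_kernel and launch are hypothetical names used only for illustration.

def saxpy_kernel(block_idx, thread_idx, block_dim, a, x, y, out):
    """Body executed by one 'thread': it handles exactly one array element."""
    i = block_idx * block_dim + thread_idx  # global element index
    if i < len(x):                          # guard: the last block may be partly idle
        out[i] = a * x[i] + y[i]


def launch(kernel, n, block_dim, *args):
    """Emulate a kernel launch with enough blocks to cover n elements."""
    grid_dim = (n + block_dim - 1) // block_dim  # ceiling division
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            # On a real GPU these iterations would run in parallel.
            kernel(block_idx, thread_idx, block_dim, *args)
    return grid_dim


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.0, 20.0, 30.0, 40.0, 50.0]
out = [0.0] * len(x)
blocks = launch(saxpy_kernel, len(x), 2, 2.0, x, y, out)
print(blocks, out)
```

The point of the sketch is only the shape of the computation: the kernel never loops over the data, it just works out which element it owns. That is what lets the hardware run thousands of such threads at once.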

In Robert Luciani's (Laketide) words:
If you're serious about big data and machine learning, you're already taking advantage of GPU, MIC, and FPGA powered analytics tools. This new breed of software can allow a single workstation to outperform a 100-node compute cluster in tasks like machine learning, graph analysis, financial modeling, and many other scenarios.
First up was a short presentation of JuliaBox and the Jupyter notebook.
This included a short presentation of the Julia language (a high-performance dynamic programming language for numerical computing, which includes mature, best-of-breed open source C and Fortran libraries for linear algebra, random number generation, signal processing, string processing etc.).

High-performance GPU computing in the Julia programming language was introduced (for an online introduction, see GPU Computing in Julia, as well as here. GitHub projects are also available).
For more on Parallel Computing and Julia, online, it might be possible to start here.

Altogether, a pretty exciting workshop!

Workshop 2.

Room Tor. Overcoming Machine Learning Challenges.
by Anya Rumyantseva, Hitachi Vantara.

The next workshop was about ''Overcoming Machine Learning challenges'', with discussions about Data Preparation and Feature Engineering.

Conclusion: One should expect to spend 80% of the time in a real life Machine Learning project doing stuff like Data Preparation and Feature Engineering, rather than working on (fancy) models...

Basically, one's model performance will be poor if things like Data Preparation and Feature Engineering aren't done properly, obviously...
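
As a small illustration of what such preparation typically involves, here is a sketch of my own (not taken from the workshop) covering three very common steps: filling in missing values, standardizing a numeric column, and one-hot encoding a categorical column. The function names and the toy data are made up for the example, and median imputation is just one of several reasonable choices.

```python
from statistics import mean, median, pstdev


def impute_median(values):
    """Replace missing entries (None) with the median of the known values."""
    known = [v for v in values if v is not None]
    fill = median(known)
    return [fill if v is None else v for v in values]


def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant column carries no information
    return [(v - mu) / sigma for v in values]


def one_hot(labels):
    """Turn a categorical column into one 0/1 column per category."""
    categories = sorted(set(labels))
    rows = [[1 if label == c else 0 for c in categories] for label in labels]
    return categories, rows


# Toy data, invented for the example.
ages = [20, None, 40, 30]
cities = ["Aarhus", "Stockholm", "Aarhus", "Oslo"]

ages_filled = impute_median(ages)
ages_scaled = standardize(ages_filled)
categories, city_columns = one_hot(cities)
print(ages_filled, categories, city_columns)
```

None of this is fancy, which is rather the point: steps like these have to be right before any model has a chance.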

This was followed by some thoughts on the art of choosing the right ML algorithm:
The following steps in the process were identified:

- Identify suitable class(es) of algorithms to move forward with.
- Train and validate multiple models.
- Consider tools for automated Machine Learning.
- Can Deep Learning give an even better result?
(Should you make calls to TensorFlow from your flow?)
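
The ''train and validate multiple models'' step can be sketched as follows. This is my own minimal example, not code shown in the workshop: two toy regressors (a mean baseline and a one-variable least-squares line) are compared by k-fold cross-validation on invented data. The helper names and the choice of mean squared error are assumptions made for the illustration.

```python
from statistics import mean


def fit_mean(xs, ys):
    """Baseline model: always predict the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Ordinary least squares for a single input variable."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


def cross_val_mse(fit, xs, ys, k=4):
    """Mean squared error over k folds; every point is held out exactly once."""
    n = len(xs)
    squared_errors = []
    for fold in range(k):
        held_out = set(range(fold, n, k))
        train_x = [xs[i] for i in range(n) if i not in held_out]
        train_y = [ys[i] for i in range(n) if i not in held_out]
        model = fit(train_x, train_y)
        squared_errors.extend((model(xs[i]) - ys[i]) ** 2 for i in held_out)
    return mean(squared_errors)


# Invented data: a straight line plus a small alternating disturbance.
xs = [float(i) for i in range(12)]
ys = [2.0 * x + 1.0 + (0.5 if i % 2 else -0.5) for i, x in enumerate(xs)]

candidates = [("mean baseline", fit_mean), ("linear fit", fit_line)]
scores = {name: cross_val_mse(fit, xs, ys) for name, fit in candidates}
best = min(scores, key=scores.get)
print(scores, best)
```

Keeping a trivial baseline in the comparison is a cheap sanity check: if a fancier model cannot beat it on held-out data, something is wrong upstream.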

Conference.

Wednesday, October 18th, 2017.

Talk.

Uniting domain experts and data scientists for teams with greater impact.
by Niklas Noren, Uppsala Monitoring Center.

Had a number of clever thoughts about projects where either datascientists or domain experts become supporting actors.
- But that is no good. Domain experts and datascientists should work together.

Certainly, we need domain experts to ensure that methods are relevant to real-world needs (which increases the chance that project challenges are addressed).

It is all too easy to drown in information, and low-hanging fruits.
We need domain experts to guide the datascientists to the good fruits, where the real information is.


Talk.

Sensors, Data and Prediction - The holy Trinity of IOT.
by Anya Rumyantseva, Hitachi Vantara.

Next up was Anya Rumyantseva from Hitachi Vantara with ideas about the dataflow from IOT towards Machine Learning processing.

IOT ML Moon

She walked us through an example with data from a rail system.

Millions of data points were collected each second.
These were then transferred to a Big Data platform, and eventually analyzed in order to predict operational events.
Not surprisingly, the happy ending was that the new system was in fact able to predict operational events, and therefore save millions (of dollars/euros).

Talk.

Punch above your weight; introducing data science to SMEs (Small and Medium sized businesses).
by Brynjolfur Borgar Jonsson, Data Lab Iceland.

Brynjolfur Borgar Jonsson had looked at trends at trendwatching.

One particularly interesting trend, according to Jonsson, is that customers feel: Of course...

Equally obvious, what the customers really want to know is:
HOW DO YOU MINIMIZE THE RISK AND MAXIMIZE THE BENEFITS WHEN ADOPTING DATA SCIENCE IN YOUR OPERATIONS?
So, what to do?
Again, it is probably not that surprising that a good recommendation is that we should look at different kinds of solutions. Starting small, and then gradually moving towards bigger and more sophisticated solutions:
AT FIRST THE SOLUTIONS ARE MINIMUM VIABLE PRODUCTS BUT AFTER SEVERAL ITERATIONS THEY SHOULD BE GENERATING VALUABLE AND ACTIONABLE INSIGHTS.
The first solution might not even be an ML solution. If a solution gives value to the customer, then that's fine.
We can always improve our solutions.
BUILD - MEASURE - LEARN. THEN DO IT AGAIN.
A great motivational talk. It is indeed all about getting started.

Talk.

Context is king, putting the science in datascience.
by Mikael Klingvall, Vattenfall AB.

According to Klingvall:
Every phenomenon is the result of some process; contextual analysis of that phenomenon tries to understand the complexity of that process (the determinants and their dynamics) to better predict the outcomes.
Trying to come up with good models to describe phenomena is clearly difficult, but why is this?

Based on his own Klingvall model:
Map of ML Challenges
- It is easy to see that Machine Learning is something that takes place in a world right next to the Jungles of AY-TEE. Models are apparently something you find in the Hidden Temple of accurate models.
No wonder, that it is not easy to get it all right.

A great talk!

Conference.

Thursday, October 19th, 2017.

Panel discussion.

Advisory board panel - from Data first to AI first.
by Anders Arpteg.

Talked about the big trends:
From Artificial Intelligence in the 1950s to 1980s.
Followed by Machine Learning in the 1980s until now.
And moving forward with Deep Learning, from the 2010s and onwards.
Deep Learning is beginning to flourish, as:
- Large datasets of labelled training data, like ImageNet, are becoming available to everyone.
- Computing power (GPUs) has again increased tremendously in recent years.
- AI libraries, like TensorFlow, are now available to everyone.
- And (AI) algorithms, such as deep reinforcement learning, have again improved.

According to Kevin Kelly it is also pretty easy to see where this is all going:
The business plans of the next 10.000 start-ups are easy to forecast...
Take X and add AI...
Indeed, many businesses, such as Google, are now transitioning from a Mobile First to an AI first business plan.

The problems are also pretty easy to see:
- Lack of labelled data.
- (Problems with) Model transparency and troubleshooting
(If you have a model with 2000 co-dependent variables how could that be explained to humans?).
- Massive software engineering overhead.
- Lack of knowledge/experienced talent.
- Privacy and safety issues.
etc.
Indeed, what are the full consequences for business models, organizations, and cultures, as we move towards an AI first world?

Sure, AI should be all about empowering humans. So, that more people can do more complicated tasks.
But there are of course huge consequences.
- Currently, in the US, the most common job is being a truck driver.
Hardly something we will have in an ''AI first'' world (with self-driving cars) !??

What does it even mean to live in an ''AI first'' world?
Certainly, as soon as we understand AI, we stop calling it AI, then we consider it un-intelligent...?

But as long as we don't understand it, we are a little bit afraid of it. And think that its ultimate optimization goal must be the same as ours, to survive... But that is of course not necessarily true...

A great panel discussion!

Talk.

Are you a Yoda, a young Luke Skywalker or a StormTrooper?
by Ingo Paas, Apotek Hjaertet.

Talked about key skills in Silicon Valley, and elsewhere:
Key skills: Passion, focus, engagement, risk taking, value creators, in-depth industry experience, have a track record of failure...!
And then of course, whether we were ''Troopers'' (The majority of people in a company), or ''Luke Skywalkers'' (not quite there yet), trying to learn from the ''Yodas'' (which there are plenty of in the Valley).

And there is of course always a lot out there to learn, with opportunities for many new businesses.

Today's (2017) catchphrases and popular words, things like:
Mobility, Virtual Reality, Augmented Reality, Bots, Blockchain, IOT, Cloud, Big Data, Machine Learning and AI
- Are, of course, very exciting, and the starting place for businesses for years to come. But even these things will, of course, eventually give way to new ideas. And being positioned for that is what ''the game'' is all about.
Making it a reality has to do with:
Thinking Big, starting small.

It also allows us to go after the interesting projects (instead of the boring ones).
I.e.
Most projects are boring, because we have done them before.
Actually, innovation and risk-taking are no longer optional in today's world.
This is something we all have to do!

Take away:
- Promote curiosity.
- Solve real problems.
- Unlearn and learn.

Talk.

Data empowerment through user-centric design.
by Werner Kruger, Klarna.

Talked about machine learning for all.

Logically, this has to do with adding real value to businesses through ML:
I.e.
- Identify opportunities to use ML.
- The ability to execute on those opportunities.
- Have a competitive advantage (higher quality ML solutions than your competitors).

Talk.

The paradigm shift of the enterprise R&D.
by Jesse Chao, Ericsson AB.

Started by giving a short introduction to the need for the coming 5G network:
- Broadband experience everywhere.
- Smart vehicles, transport and infrastructure.
- Critical control of remote devices. Robots.
- Interactions. Humans - IOT.
With speeds of up to 5-10 Gbit/s, 5G will be about 100 times as fast as the current 4G network.
With Stockholm and Tallinn among the first movers in 2020.

Controlling all of this data will obviously take a lot of Data Scientists...
And being a Data Scientist is, of course, the SEXIEST job you can have in the 21st century.
Right there in the intersection of mathematics, statistics, computer science, domain experience and business acumen...

5G devices

All great.

Especially, ML and AI.
Still, the speaker warned us about the Gartner Hype cycle:

Gartner Hype Cycle

So, sure, there are going to be disappointments along the way.
But technologies will eventually end up being useful, and practical...

Finding the right people to make it all happen will be the real challenge for corporations.
- And we should probably not expect Unicorns, people who can do everything.

Instead, the message here was that the game is all about building teams that can do the job...

Talk.

Open source vs. proprietary tools. Pros and cons, and solutions.
by Robert Moberg, Houston Analytics.

Started with a John Naisbitt quote:
''We have for the first time an economy based on a key resource (information) that is not only renewable, but self-generating. Running out of it is not a problem, but drowning in it is.''
A key question is of course how to work with these tidal waves of data out there.
Should we ''code away'' in our own tools, or should we use some standard tools that might not be perfectly suited for our needs, but are standardized?
Obviously, there are going to be pros and cons here.

Moberg presented his ideas, and members of the audience offered their opinions on how to proceed.
Indeed, it soon became obvious that the entire field has not reached the level of maturity that is seen in more ''standard'' fields. The frontline is rarely a place where everyone agrees on what to do (next)...

Talk.

Deep learning for music recommendation and generation.
by Sander Dieleman, Deepmind.

SanderDieleman


Music recommendation has become an increasingly relevant problem, as music is increasingly sold and stored in the cloud. Dieleman talked about the difference between recommendation systems using a bag-of-words approach for representation of audio signals and a deep neural network based system.

Apparently, it turns out that recent advances in deep learning translate very well to the music recommendation setting.
Still, predicting what kind of latent factors music audio signals might contain is clearly not that easy.

Some results were pretty good though.
E.g. looking at Beyonce's ''Speechless'', the system predicted that Rihanna's ''Haunted'' and Daniel Bedingfield's ''If You're Not The One'' would be close matches.

If you like one, then you will probably also like the others...?
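
Once each track has a vector of (predicted) latent factors, finding ''close matches'' comes down to a nearest-neighbour search. Here is a minimal sketch of my own of that last step only. The track names and the three-dimensional vectors are invented for illustration, and cosine similarity is my assumption about the distance measure, not something stated in the talk. The hard part, predicting the factors from audio with a deep network, is not shown.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equally long, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def most_similar(query, catalog, top_n=2):
    """Return the top_n (name, similarity) pairs closest to the query track."""
    others = [
        (name, cosine(catalog[query], vec))
        for name, vec in catalog.items()
        if name != query
    ]
    return sorted(others, key=lambda pair: pair[1], reverse=True)[:top_n]


# Hypothetical latent factor vectors; real systems use many more dimensions.
catalog = {
    "ballad_a": [0.9, 0.1, 0.0],
    "ballad_b": [0.8, 0.2, 0.1],
    "ballad_c": [0.7, 0.0, 0.3],
    "techno_d": [0.0, 0.9, 0.4],
}

matches = most_similar("ballad_a", catalog)
print(matches)
```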

More about ongoing work here.

Talk.

Deep learning without a Big Dataset. Segmenting Brain Tumors.
by Lars Sjoesund, Peltarion.

With 14 million new cancer cases per year, 8.8 million deaths, and 2/5 of all people diagnosed at some point in life, cancer is a pretty terrible disease.
Brain cancer especially so.

And finding (brain) tumors, in 3D MRI scans, is so far a very time consuming and costly manual procedure.
So, this talk was all about setting up artificial neural nets that can find tumors by looking at these scans.

Also, a pretty tricky thing, as a neural net that predicts ''no tumors'' is almost always right.
Tumors are rare, but when they are there, it is important that the system picks them out.
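
The ''almost always right'' problem is easy to demonstrate with a few lines of arithmetic. The sketch below is my own, with invented numbers (10 tumor voxels out of 1000). It compares plain accuracy with recall and with the Dice coefficient, an overlap measure commonly used for segmentation. Whether these exact metrics were used in the project is not something I noted down.

```python
def confusion(truth, pred):
    """Count true/false positives and negatives for binary labels (1 = tumor)."""
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)


def recall(tp, fp, fn, tn):
    """Share of the actual tumor voxels that were found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def dice(tp, fp, fn, tn):
    """Overlap between predicted and true tumor regions (1.0 = perfect)."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 1.0


# Invented example: 1% of the voxels belong to a tumor.
truth = [1] * 10 + [0] * 990
always_negative = [0] * 1000  # a "model" that never reports a tumor

counts = confusion(truth, always_negative)
print(accuracy(*counts), recall(*counts), dice(*counts))
```

The do-nothing model scores 99% accuracy while finding none of the tumor voxels, which is why accuracy alone is a poor guide for this kind of task.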

It is suggested that U-Nets (artificial neural nets) are well suited for the task
(For a presentation see U-Net: Convolutional Networks for Biomedical Image Segmentation).

A relatively cheap machine (with 4 TITAN X (12 GB) GPUs, 256 GB RAM, 40 CORES INTEL XEON E5-2630, costing 100.000 S.Kr.) is used for the project.
Setting it up takes less than 1000 lines of code, with approx. 20 for the model, using TensorFlow.
Results look pretty encouraging.

But, well, doctors will probably still be needed to oversee the process, at least for now.
What happens five years from now, after 2022, is anybody's guess though.

Talk.

Reinforcement learning. From ATARI to the real world applications.
by Andreas Merentitis, Zalando Research.

Generally, (Supervised) learning is difficult:
- When interacting with the environment we get feedback sparsely.
- When interacting with the environment we get delayed feedback.
Still, in the real world we want solutions that can:
- Operate in sparse reward environments.
- Deal with partially observable environments.
Among other interesting ideas, Merentitis (also) talked about introducing rewards, early on, that are less sparse and smoother (Idea: Maximize reward while learning). Sounded very promising.
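
One standard way of making a sparse reward denser is potential-based reward shaping. I am not sure this is exactly what was presented, so take the sketch below as my own illustration of the general idea. It uses a made-up corridor world with states 0 to 5, a goal at state 5, and a potential equal to minus the distance to the goal. All names and numbers are assumptions.

```python
GOAL = 5      # goal state in a made-up one-dimensional corridor
GAMMA = 1.0   # discount factor (undiscounted, to keep the arithmetic simple)


def sparse_reward(state, next_state):
    """Feedback only when the goal is reached."""
    return 1.0 if next_state == GOAL else 0.0


def potential(state):
    """Higher (less negative) the closer we are to the goal."""
    return -abs(GOAL - state)


def shaped_reward(state, next_state):
    """Sparse reward plus a potential-based shaping term."""
    bonus = GAMMA * potential(next_state) - potential(state)
    return sparse_reward(state, next_state) + bonus


# A trajectory that takes one step back on the way to the goal.
path = [0, 1, 2, 1, 2, 3, 4, 5]
steps = list(zip(path, path[1:]))

sparse = [sparse_reward(s, s2) for s, s2 in steps]
shaped = [shaped_reward(s, s2) for s, s2 in steps]
print(sparse)
print(shaped)
```

With the sparse reward the agent hears nothing until the very last step. With shaping, every step towards the goal is rewarded and the step back is penalised, so there is something to learn from right away.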

Talk.

Big Data and ML for production optimization. From planning to day to day.
by Alla Sapronova, Center for Big Data Analysis, Uni Research A/S.

Within renewable energy, things like cost optimization, failure prediction and short term forecast are very important concepts.
And models built on things like MLP, Decision trees/Random forest and Genetic Algorithms can obviously be helpful.
But (we should remember that) customers don't like:
- Complex models. Indeed, they like models to be simple and understandable.
- Black box models, that we can't look into.
Things get even worse, when the models aren't based on concrete physical things out there in the environment, or, worse still, try to predict chaos.

So, lessons learned:
1. Appreciate simple models.
2. Customers do not appreciate ''black boxes''.
3. Use physical-based models if possible.
4. Don't try to predict chaos.

Talk.

Listening to Data in order to drive growth.
by Ismail Douieb, Schibsted Group.

The customers are always right. And the more of them we have, the more right they are...

Smart companies certainly tend to fight hard for more customers and users.
Clever companies, such as Facebook, even ask us to add our friends, all in the hope of getting more customers.

This will all create a lot of ideas, where it is important that we prioritize:
- What is the impact of the idea?
- Ease (How easy is it to implement the idea).
- Confidence (How confident are we that the idea will work).
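
The three questions above can be turned into a simple ranking. The sketch below is my own, with invented ideas and invented scores on a 1-10 scale. Multiplying the three scores is one common convention (averaging is another). How the numbers were actually combined at Schibsted was not something I noted down.

```python
from dataclasses import dataclass


@dataclass
class Idea:
    name: str
    impact: int      # how much would it move the needle? (1-10)
    confidence: int  # how sure are we that it will work? (1-10)
    ease: int        # how easy is it to implement? (1-10)

    @property
    def score(self):
        # Assumption: combine by multiplication, so one weak score drags the idea down.
        return self.impact * self.confidence * self.ease


# Hypothetical backlog, for illustration only.
ideas = [
    Idea("invite-a-friend prompt", impact=8, confidence=6, ease=7),
    Idea("redesign onboarding", impact=9, confidence=4, ease=3),
    Idea("weekly digest email", impact=5, confidence=7, ease=9),
]

ranked = sorted(ideas, key=lambda idea: idea.score, reverse=True)
for idea in ranked:
    print(idea.score, idea.name)
```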

People.

All of the above, was, of course, only a small part of the conference.

Obviously, it is impossible to mention all of the great speakers and attendees that I actually saw at the conference. Still, here at the end, it should be mentioned that I got many great (data science) insights from Jaime Pastor (from Combient), and many other conference attendees.

Certainly, loads of great people at the conference.
A great event.

Thinking about becoming a datascientist?
Interestingly, Ferrologic now has a course that can make you a ''certified datascientist''...
Well, well ...

Pictures from Stockholm, Oct. 17-20, 2017.


Conference Venue.
7A Odenplan, Stockholm


© October 2017 Simon Laub - www.simonlaub.dk - www.simonlaub.net - simonlaub.com
Original page design - October 20th 2017. Simon Laub - Aarhus, Denmark, Europe.