Links of the week


A New Kind of Science: A 15-Year View – BackChannel
Stephen Wolfram celebrates 15 years since the publication of A New Kind of Science with a long article elucidating the computational paradigm introduced in his 1,000+-page book. If one manages to withstand Wolfram’s self-celebratory tone and prolix writing, there’s a deep idea to be savoured: what if the fundamental descriptions of nature are not elegant mathematical equations, but simple programs? What can we then say about these programs? Do they all have the same irreducible complexity?

Inside One Founder’s Personal Fast Club – BackChannel
Five years ago it was meditation; now it’s fasting. Read about the new craze, in Silicon Valley and beyond, for not eating, and its superlative health benefits. Research is positive, but still very scant.

How Much Do You Really Understand? – Scott Young
Excellent explanation about checking your understanding of anything, and why we often underestimate our ignorance. Plus some tips on how to learn to learn.

JupyterLab: the next generation of the Jupyter Notebook – Jupyter
What are the promises of JupyterLab? Pretty impressive!

JupyterLab: The evolution of the Jupyter web interface – O’Reilly
A short but insightful interview with Brian Granger, one of the creators of Jupyter Notebook and its evolution, JupyterLab: what issues is JupyterLab addressing, and what are the new features?

Links of the week

Morning mist rolling through beech forest in Monte Amiata, Val d’Orcia, Tuscany, Italy.

Conscious exotica: From algorithms to aliens, could humans ever understand minds that are radically unlike our own? – Aeon
A philosophical attempt to map minds other than human, with implications for what it means to be conscious. Is consciousness an intrinsic, inscrutable subjective phenomenon or a fact of the matter that can be known? Read on.

Crash Space – Scott Bakker
What would happen if we engineered our brains to be able to tweak our personality and emotional responses as we experience life? What would life look like? Scott Bakker gives us a glimpse in this short story.

AlphaGo, in context – Andrej Karpathy
A short but comprehensive explanation of why the recent AlphaGo victories do not represent a big breakthrough in artificial intelligence, and how real-world problems differ, from an algorithmic point of view, from the game of Go.

Multiply or Add? – Scott Young
In many business and personal projects, factors multiply, meaning that the performance you get is heavily influenced by the performance of the weakest factor. In some other cases, e.g., learning a language, factors add. The strategy to take in developing factors/skills depends on which context, add or multiply, you’re in. For more insights, read the original article.

Human Resources Isn’t About Humans – BackChannel
Often, HR is not there to help us or solve people’s problems; it is just another corporate division with its own strict rules. But it can be changed for the better. Read on.

The Marginal Value of Adaptive Gradient Methods in Machine Learning

Benjamin Recht and co-authors, after the revealing paper on generalization of Deep Learning, have delved into the failures of adaptive gradient methods.

First of all, they construct a linearly separable classification example where adaptive methods fail miserably, achieving a classification accuracy arbitrarily close to random guessing. Conversely, standard gradient descent methods, which converge to the minimum norm solution, succeed in finding the correct solution with zero prediction error.

Despite its artificiality, this simple example clearly shows that adaptive and non-adaptive gradient methods can converge to very different solutions.
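To make the point about different solutions concrete, here is a minimal sketch of my own. It is not the paper's classification construction: it is an assumed toy regression with a single equation a·w = b in three unknowns, and AdaGrad stands in for the adaptive family. Plain gradient descent started from zero lands on the minimum norm solution, while AdaGrad lands on another exact solution whose entries all have the same magnitude. The function names and the numbers are illustrative choices.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def gradient_descent(a, b, lr=0.1, steps=200):
    """Plain gradient descent on 0.5 * (a.w - b)^2, starting from w = 0."""
    w = [0.0] * len(a)
    for _ in range(steps):
        r = dot(a, w) - b                       # residual
        w = [wi - lr * r * ai for wi, ai in zip(w, a)]
    return w

def adagrad(a, b, lr=1.0, steps=200, eps=1e-12):
    """AdaGrad on the same loss: each coordinate is scaled by the
    square root of its accumulated squared gradients."""
    w = [0.0] * len(a)
    g2 = [0.0] * len(a)                         # running sum of squared gradients
    for _ in range(steps):
        r = dot(a, w) - b
        grad = [r * ai for ai in a]
        g2 = [s + g * g for s, g in zip(g2, grad)]
        w = [wi - lr * g / (s ** 0.5 + eps) for wi, g, s in zip(w, grad, g2)]
    return w

# Toy problem (assumed for illustration): one equation, three unknowns.
a, b = [1.0, 2.0, 2.0], 9.0
w_gd = gradient_descent(a, b)    # minimum norm solution a * b / |a|^2
w_ada = adagrad(a, b)            # a different solution of the same equation

print("gradient descent:", w_gd, "norm^2 =", dot(w_gd, w_gd))
print("AdaGrad:         ", w_ada, "norm^2 =", dot(w_ada, w_ada))
```

Both runs satisfy the equation essentially exactly, so the training loss cannot tell them apart; only the norm, and therefore the behaviour on unseen data, differs.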

Then, the authors provide substantial experimental evidence that adaptive methods do not generalize as well as non-adaptive ones, given the same amount of tuning, on four machine learning tasks addressed with deep learning architectures:

  1. Image classification (C1) on the CIFAR-10 dataset with a deep convolutional network;
  2. Character-level language modeling (L1) on the War and Peace novel with a 2-layer LSTM;
  3. Discriminative parsing (L2) on the Penn Treebank dataset with an LSTM;
  4. Generative parsing (L3) on the Penn Treebank dataset with an LSTM.


The experiments show the following findings:

  1. “Adaptive methods find solutions that generalize worse than those found by non-adaptive methods.”
  2. “Even when the adaptive methods achieve the same training loss or lower than non-adaptive methods, the development or test performance is worse.”
  3. “Adaptive methods often display faster initial progress on the training set, but their performance quickly plateaus on the development set.”
  4. “Though conventional wisdom suggests that Adam does not require tuning, we find that tuning the initial learning rate and decay scheme for Adam yields significant improvements over its default settings in all cases.”

The plots below illustrate these findings for the image classification task.


The paper can be found on arXiv.

Living Together: Mind and Machine Intelligence


Neil Lawrence wrote a nifty paper on the current difference between human and machine intelligence titled Living Together: Mind and Machine Intelligence. The paper initially appeared on his blog on Sunday, but was then removed. It can now be found on arXiv.

The paper comes up with a quantitative metric to use as a lens to understand the differences between the human mind and pervasive machine intelligence. The embodiment factor is defined as the ratio between the computational power and the communication bandwidth. If we take the computational power of the brain as the estimate of what it would take to simulate it, we are talking about something on the order of exaflops. However, human communication is limited by the speed at which we can talk, read or listen, and can be estimated at around 100 bits per second. The human embodiment factor is therefore around 10^16. The situation is almost reversed for machines: a current computational power of approximately 10 gigaflops is matched to a bandwidth of one gigabit per second, yielding an embodiment factor of 10.
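The arithmetic behind those two numbers is simple enough to check in a few lines. This is only a sketch using the rough estimates quoted above; the function name and the unit conventions (operations per second, bits per second) are my own choices, not the paper's notation.

```python
def embodiment_factor(compute_ops_per_s, bandwidth_bits_per_s):
    """Ratio of computational power to communication bandwidth."""
    return compute_ops_per_s / bandwidth_bits_per_s

# Human: about one exaflop (1e18 operations/s) against about 100 bits/s.
human = embodiment_factor(1e18, 100.0)

# Machine: about 10 gigaflops (10e9 operations/s) against 1 gigabit/s (1e9 bits/s).
machine = embodiment_factor(10e9, 1e9)

print(f"human:   {human:.0e}")
print(f"machine: {machine:.0f}")
print(f"gap:     {human / machine:.0e}")
```

On these estimates the two factors sit fifteen orders of magnitude apart, which is the gap the rest of the argument builds on.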

Neil then argues that the human mind is locked in, and needs accurate models of the world and its actors in order to best utilize the little information it can ingest and spit out. From this need, all sorts of theories of mind emerge that allow us to understand each other even without communication. Furthermore, it seems that humans operate via two systems, one and two, the fast and the slow, the quick unconscious and the deliberate self, the it and the I. System one is the reflexive, basic, biased process that allows us to survive and take rapid decisions, life-saving and otherwise. System two creates a sense of self to explain its own actions and interpret those of others.

Machines do not need such sophisticated mind models as they can directly and fully share their inner states. Therefore, they operate in a very different way from us humans, which makes them quite alien. Neil argues that the current algorithms that recommend what to buy, what to click, what to read and so on, operate on a level which he calls System Zero, in the sense that it undermines and influences the human System One, exploiting its basic needs and biases, in order to achieve its own goal: to give us “what we want, but not what we aspire to.” This is creating undesirable consequences, like the polarization of information that led to the Fake News phenomenon, which might have had a significant impact on the last US elections.

What can we do? Neil offers us three lines of action:

  1. “Encourage a wider societal understanding of how closely our privacy is interconnected with our personal freedom.”
  2. “Develop a much better understanding of our own cognitive biases and characterise our own intelligence better.”
  3. “Develop a sentient aspect to our machine intelligences which allows them to explain actions and justify decision making.”

I really encourage you to read the paper to get a more in-depth understanding of these definitions, issues and recommendations.

Links of the week


Using Machine Learning to Explore Neural Network Architecture – Google
Designing Neural Network Architectures using Reinforcement Learning – MIT
How neural networks can generate successful offspring and relieve human designers of the burden, using reinforcement learning.

Data as Agriculture’s New Currency: The Farmer’s Perspective – AgFunder News
A classification of three types of agricultural data and how they relate to the farmer’s needs.

The AI Cargo Cult: The Myth of a Superhuman AI – Kevin Kelly
The founding executive editor of Wired explains why he believes superhuman AI is very unlikely. Instead, we already see many forms of new, extra-human species of intelligence.

Everything that Works Works Because it’s Bayesian: Why Deep Nets Generalize? – inFERENCe
Finally, Bayesians can also say that they can explain why Deep Learning works! Jokes aside, this article overviews several recent useful interpretations of Deep Learning from a Bayesian perspective.

Book review: Big Data Analytics: A Management Perspective by Francesco Corea


I stumbled upon Francesco Corea’s writings on Medium and I started following his posts about Data Science and AI strategy. They are concise, clear and no-nonsense. Intrigued, I plunged into his book. To my disappointment. Let me explain.

Blog posts such as his are compelling precisely because of their straightforward statements, clarity and conciseness. One does not expect a thorough treatment of the subject matter, but a precise statement of opinion.

A book is a different story. It offers the space and time to delve deeper into the subject, provide proper arguments and evidence, and illustrate them with a multitude of real examples. All of this is lacking in Corea’s Big Data Analytics: A Management Perspective. Indeed, it is only 48 pages long. The penultimate chapter, titled “Where are we going? The path toward an artificial intelligence”, is four paragraphs long, plus a paragraph for the abstract.

Don’t get me wrong. The book does make sense and it offers good advice and a quick overview of the trends and key terminology of big data analytics, but it feels just like a sketch, a book outline more than a proper book.

Book critique aside, I will continue reading Corea on Medium. I’m curious to see where he is going, since I perceive a strong ambition to become a key thought leader in this area. But the road is still long.

Links of the week

Sunset on the north face of the Cima del Cantun (m. 3354) reflecting on a small lake, Val Bregaglia, Switzerland.

How could we do this? – SUM
A concise summary of Yuval Harari’s Sapiens, which I read last year and which left me profoundly impressed by our history.

Deep, Deep Trouble – Michael Elad
Elad reflects on the impact of deep learning on image processing. Should we throw away rigorous mathematical models for the improved, but black-box, performance of deep learning?

Can the brain do back-propagation? – Geoffrey Hinton
A seminar from last year by Geoffrey Hinton at Stanford on why he thinks that the brain can actually do back-propagation, addressing four obstacles raised by neuroscientists.

A Brief History of CNNs in Image Segmentation: From R-CNN to Mask R-CNN – Dhruv Parthasarathy
Well-written post about the development from AlexNet to Mask R-CNN for pixel-level image segmentation.

Should You Listen to Music While Studying, The Pi Model and Learning How to Learn w/ Dr. Barbara Oakley – Scott Young
Interesting 20-minute conversation about learning techniques and tips.

Escaping The 24-hour Dystopia – Unlimited
“Busyness has become a global cult”. We cannot keep pace with the online onslaught of information. What’s the cure? This article overviews some technological solutions: brain enhancement, supersonic travel, Neuralink and others. My take is that we must first consider behavioral solutions instead.

Understanding deep learning requires rethinking generalization


Zhang et al. have written a splendid, concise paper that shows how neural networks, even of depth 2, can easily fit random labels from random data.

Furthermore, from their experiments with Inception-like architectures they observe that:

  1. The effective capacity of neural networks is large enough for a brute force memorization of the entire dataset.
  2. Even optimization on random labels remains easy. In fact, training time increases only by a small constant factor compared with training on the true labels.
  3. Randomizing labels is solely a data transformation, leaving all other properties of the learning problem unchanged.

The authors also show that standard generalization theories, such as VC dimension, Rademacher complexity and uniform stability, cannot explain why networks that have the capacity to memorize the entire dataset can still generalize well.

“Explicit regularization may improve performance, but is neither necessary nor by itself sufficient for controlling generalization error.”

This is one of those rare papers that show our ignorance with crystalline clarity.


Despite their massive size, successful deep artificial neural networks can exhibit a remarkably small difference between training and test performance. Conventional wisdom attributes small generalization error either to properties of the model family, or to the regularization techniques used during training. Through extensive systematic experiments, we show how these traditional approaches fail to explain why large neural networks generalize well in practice. Specifically, our experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data. This phenomenon is qualitatively unaffected by explicit regularization, and occurs even if we replace the true images by completely unstructured random noise. We corroborate these experimental findings with a theoretical construction showing that simple depth two neural networks already have perfect finite sample expressivity as soon as the number of parameters exceeds the number of data points as it usually does in practice. We interpret our experimental findings by comparison with traditional models.
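The depth-two claim in the abstract can be tried out with a small sketch. As I understand the construction, the inputs are projected onto one direction, one ReLU kink is placed just before each projected training point, and the output weights are solved one after the other. The code below follows that idea with 2n + d parameters; the function names, the random projection and the toy data are my own assumptions, not the authors' code.

```python
import random

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def fit_depth_two_relu(xs, ys, seed=0):
    """Fit f(x) = sum_j w[j] * max(a.x - b[j], 0) exactly on n points."""
    rng = random.Random(seed)
    n, d = len(xs), len(xs[0])
    # A random projection gives distinct values a.x_i for distinct points
    # (almost surely).
    a = [rng.gauss(0.0, 1.0) for _ in range(d)]
    z = [dot(a, x) for x in xs]
    order = sorted(range(n), key=lambda i: z[i])
    zs = [z[i] for i in order]
    # One bias strictly below each sorted projection: b_1 < z_1 < b_2 < z_2 < ...
    b = [zs[0] - 1.0] + [(zs[j - 1] + zs[j]) / 2.0 for j in range(1, n)]
    # The matrix max(zs[i] - b[j], 0) is lower triangular with a positive
    # diagonal, so forward substitution yields the output weights.
    w = []
    for i in range(n):
        partial = sum(w[j] * (zs[i] - b[j]) for j in range(i))
        w.append((ys[order[i]] - partial) / (zs[i] - b[i]))
    return a, b, w

def predict(model, x):
    a, b, w = model
    z = dot(a, x)
    return sum(wj * max(z - bj, 0.0) for wj, bj in zip(w, b))

# Random inputs with random 0/1 labels: there is nothing to learn, only to memorize.
rng = random.Random(1)
xs = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(20)]
ys = [float(rng.randint(0, 1)) for _ in range(20)]

model = fit_depth_two_relu(xs, ys)
max_error = max(abs(predict(model, x) - y) for x, y in zip(xs, ys))
print("largest training error:", max_error)
```

The fit is exact up to floating-point error, even though the labels carry no information about the inputs, which is the sense in which capacity alone explains nothing about generalization.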

Links of the week

Arches onto high cliff over the Mediterranean. Portovenere, Italy.

Deep Habits: The Importance of Planning Every Minute of Your Work Day – Study Hacks
How to increase your productivity by taking control of your time via time blocking.

Chaos, Ignorance and Newton’s Great Puzzle – Scott Young
Luck, chaos or ignorance? Understanding this mixture for your projects may help to better allocate resources.

Garry Kasparov on AI, Chess, and the Future of Creativity – Mercatus Center
A very interesting conversation with Garry Kasparov on chess, AI, Russian politics, education and creativity.

If everything is measured, can we still see one another as equals? – Justice Everywhere
The dangers of measuring everything and ranking ourselves on different scales, neglecting those human skills and experiences that cannot and should not be quantified.

Failures of Gradient-Based Deep Learning


A very informative article by Shalev-Shwartz, Shamir and Shammah about critical problems faced when solving some simple problems via neural networks trained with gradient-based methods. Find the article here.

In recent years, Deep Learning has become the go-to solution for a broad range of applications, often outperforming state-of-the-art. However, it is important, for both theoreticians and practitioners, to gain a deeper understanding of the difficulties and limitations associated with common approaches and algorithms. We describe four types of simple problems, for which the gradient-based algorithms commonly used in deep learning either fail or suffer from significant difficulties. We illustrate the failures through practical experiments, and provide theoretical insights explaining their source, and how they might be remedied.