Book review: Big Data Analytics: A Management Perspective by Francesco Corea


I stumbled upon Francesco Corea’s writings on Medium and started following his posts about data science and AI strategy. They are concise, clear and no-nonsense. Intrigued, I plunged into his book, only to be disappointed. Let me explain.

Blog posts such as his are compelling precisely because of their direct statements, clarity and conciseness. One does not expect a thorough treatment of the subject matter, but a precise statement of opinion.

A book is a different story. It offers the space and time to delve deeper into the subject, provide proper arguments and evidence, and illustrate points through a multitude of real examples. All of this is missing from Corea’s Big Data Analytics: A Management Perspective. Indeed, the book is only 48 pages long. The penultimate chapter, titled “Where are we going? The path toward an artificial intelligence”, is four paragraphs long, plus a paragraph for the abstract.

Don’t get me wrong. The book does make sense, and it offers good advice and a quick overview of the trends and key terminology of big data analytics. But it feels like a sketch, more a book outline than a proper book.

Book critique aside, I will continue reading Corea on Medium. I’m curious to see where he is going, since I perceive a strong ambition to become a key thought leader in this area. But the road is still long.

Links of the week

Sunset on the north face of the Cima del Cantun (3354 m) reflecting on a small lake, Val Bregaglia, Switzerland.

How could we do this? – SUM
A concise summary of Yuval Harari’s Sapiens, which I read last year and which left me profoundly impressed by our history.

Deep, Deep Trouble – Michael Elad
Elad reflects on the impact of deep learning on image processing. Should we throw away rigorous mathematical models for the improved, but black-box, performance of deep learning?

Can the brain do back-propagation? – Geoffrey Hinton
A seminar from last year by Geoffrey Hinton at Stanford on why he thinks the brain can actually do back-propagation, addressing four objections raised by neuroscientists.

A Brief History of CNNs in Image Segmentation: From R-CNN to Mask R-CNN – Dhruv Parthasarathy
A well-written post about the development from AlexNet to Mask R-CNN for pixel-level image segmentation.

Should You Listen to Music While Studying, The Pi Model and Learning How to Learn w/ Dr. Barbara Oakley – Scott Young
An interesting 20-minute conversation about learning techniques and tips.

Escaping The 24-hour Dystopia – Unlimited
“Busyness has become a global cult.” We cannot keep pace with the online onslaught of information. What’s the cure? This article surveys some technological solutions: brain enhancement, supersonic travel, Neuralink and others. My take is that we must first consider behavioral solutions instead.

Understanding deep learning requires rethinking generalization


Zhang et al. have written a splendid, concise paper that shows how neural networks, even of depth 2, can easily fit random labels, even on completely random data.

Furthermore, from their experiments with Inception-like architectures they observe that:

  1. The effective capacity of neural networks is large enough for a brute-force memorization of the entire dataset.
  2. Even optimization on random labels remains easy. In fact, training time increases only by a small constant factor compared with training on the true labels.
  3. Randomizing labels is solely a data transformation, leaving all other properties of the learning problem unchanged.

The authors also show that standard generalization theories, such as VC dimension, Rademacher complexity and uniform stability, cannot explain why networks that have the capacity to memorize the entire dataset can still generalize well.

“Explicit regularization may improve performance, but is neither necessary nor by itself sufficient for controlling generalization error.”

This paper is one of those rare ones that expose our ignorance with crystalline clarity.
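
The randomization test at the heart of the paper is simple enough to sketch. Below is a minimal toy version of my own in PyTorch (not the authors’ code, and far smaller than their Inception-scale experiments): a depth-2 network with more parameters than data points is trained on pure-noise inputs with completely random labels, and still reaches perfect training accuracy.

```python
# Toy randomization test (my own sketch, not the authors' code): a small
# two-layer network memorizes random labels attached to random inputs.
import torch
import torch.nn as nn

torch.manual_seed(0)
n, d, k = 256, 32, 10                      # samples, input dim, classes
X = torch.randn(n, d)                      # random "images": pure noise
y = torch.randint(0, k, (n,))              # random labels: nothing to learn

# Depth-2 network whose parameter count (~21k) exceeds the sample count.
model = nn.Sequential(nn.Linear(d, 512), nn.ReLU(), nn.Linear(512, k))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

for step in range(2000):                   # plain full-batch gradient descent
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()

acc = (model(X).argmax(dim=1) == y).float().mean().item()
print(f"training accuracy on random labels: {acc:.2f}")  # approaches 1.00
```

Nothing in this setup prevents memorization, which is exactly why bounds based on the capacity of the model family alone cannot explain why such networks generalize on real data.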

Abstract

Despite their massive size, successful deep artificial neural networks can exhibit a remarkably small difference between training and test performance. Conventional wisdom attributes small generalization error either to properties of the model family, or to the regularization techniques used during training. Through extensive systematic experiments, we show how these traditional approaches fail to explain why large neural networks generalize well in practice. Specifically, our experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data. This phenomenon is qualitatively unaffected by explicit regularization, and occurs even if we replace the true images by completely unstructured random noise. We corroborate these experimental findings with a theoretical construction showing that simple depth two neural networks already have perfect finite sample expressivity as soon as the number of parameters exceeds the number of data points as it usually does in practice. We interpret our experimental findings by comparison with traditional models.

Links of the week

Arches on a high cliff over the Mediterranean. Portovenere, Italy.

Deep Habits: The Importance of Planning Every Minute of Your Work Day – Study Hacks
How to increase your productivity by taking control of your time via time blocking.

Chaos, Ignorance and Newton’s Great Puzzle – Scott Young
Luck, chaos or ignorance? Understanding this mixture in your projects may help you allocate resources better.

Garry Kasparov on AI, Chess, and the Future of Creativity – Mercatus Center
A very interesting conversation with Garry Kasparov on chess, AI, Russian politics, education and creativity.

If everything is measured, can we still see one another as equals? – Justice Everywhere
The dangers of measuring everything and ranking ourselves on different scales, neglecting those human skills and experiences that cannot and should not be quantified.

Failures of Gradient-Based Deep Learning


A very informative article by Shalev-Shwartz, Shamir and Shammah about the critical difficulties that arise when solving some simple problems with neural networks trained by gradient-based methods. Find the article here.
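
One of the four failure types the authors analyze is learning random parities, where the gradient carries almost no information about which parity was chosen. Here is a toy sketch of my own in PyTorch (the paper’s experiments differ in their details) in which a small network trained by gradient descent stays stuck near chance-level loss:

```python
# Toy parity experiment (my own sketch, not the authors' code): SGD on a
# random parity of 15 out of 30 bits barely moves the loss from chance.
import torch
import torch.nn as nn

torch.manual_seed(0)
d, n = 30, 4096
subset = torch.randperm(d)[:15]                   # hidden parity coordinates
X = torch.randint(0, 2, (n, d)).float() * 2 - 1   # inputs in {-1, +1}
y = (X[:, subset].prod(dim=1) + 1) / 2            # parity target in {0, 1}

model = nn.Sequential(nn.Linear(d, 128), nn.ReLU(), nn.Linear(128, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(3001):
    opt.zero_grad()
    loss = loss_fn(model(X).squeeze(1), y)
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        # Loss hovers near log(2) ~= 0.693, i.e. chance level.
        print(step, round(loss.item(), 3))
```

The paper’s explanation, roughly, is that the variance of the gradient over the choice of parity decays exponentially with the number of relevant bits: the gradient looks nearly the same for every target, so it gives the optimizer nothing to follow.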

Abstract
In recent years, Deep Learning has become the go-to solution for a broad range of applications, often outperforming state-of-the-art. However, it is important, for both theoreticians and practitioners, to gain a deeper understanding of the difficulties and limitations associated with common approaches and algorithms. We describe four types of simple problems, for which the gradient-based algorithms commonly used in deep learning either fail or suffer from significant difficulties. We illustrate the failures through practical experiments, and provide theoretical insights explaining their source, and how they might be remedied.

Links of the week

Close-up of a gall on an oak leaf.

The Attention Paradox: Winning By Slowing Down – Unlimited
Time and attention are limited resources that most cognitive workers waste in unnecessary behaviour. Some useful advice on how to think about cognitive resources and plan your working day accordingly.

The Problem of Happiness – Scott Young
Have we evolved to be unhappy? What are the pros and cons of some of the proposed solutions to be happier? Read this concise summary to know more.

The Dark Secret at the Heart of AI – MIT Technology Review
Machine learning and, in particular, deep learning are notoriously inscrutable. This may be an issue when deploying them in mission-critical applications, such as health care and the military. But are humans much more transparent? Or are they merely capable of providing ad hoc, a posteriori explanations?

Academia to Data Science – Airbnb
Some insights on how to shift from academia to industry from the perspective of Airbnb.

Scaling Knowledge at Airbnb – Airbnb
How does a company effectively disseminate new knowledge across its teams? Airbnb proposes and open-sources the Knowledge Repository to facilitate this process across its data teams.

 

Book review: The Trails Less Travelled by Avay Shukla

I’ve always dreamed of hiking the great Himalayas, but never took a concrete step in this direction. A year and a half ago, between jobs, I was seriously thinking of going there, but then a good job offer got in the way. However, I had been talking about it so much that my partner gave me The Trails Less Travelled by Avay Shukla as a Christmas gift. It sat on the bookshelf for a bit more than a year before I finally decided to open it…

The book describes several treks in the Himachal Himalayas, in the northwestern Indian state of Himachal Pradesh. This mountain range also includes the Great Himalayan National Park, established in 1984, which covers an area of more than 1100 square kilometres at altitudes between 1500 m and 6000 m. In June 2014, the park was added to the UNESCO list of World Heritage Sites.

The author belongs to the Indian Administrative Service and has served in Himachal Pradesh for 30 years. His reports from the remote valleys of Himachal contain awe-inspired descriptions of nature, but also poignant reminders of how encroaching economic development may soon destroy these natural beauties. He does not refrain from criticizing his own employer, the government, both for its lack of action to preserve these unique valleys and for failing to offer the local communities support for an increasingly difficult way of life.

The region is full of culture, natural diversity, rich ecosystems and varied landscapes, from the jungle forests of the lower altitudes to the high pastures to the barren glacial terrain. The treks described in the book require strength, endurance, perseverance and some technical skills, as they often negotiate deep gorges, boulder-strewn river beds and glacier crossings. But they also offer plenty of rewards, from crystalline lakes to rare wildlife sightings to small temples found on the most remote of passes.

On the one hand, I would like to go and venture into Himachal immediately; on the other, I’m afraid that some of these treks would be unrecognizable ten years after the author walked them. It’s yet another reminder that if we want to preserve these natural wonders for future generations, we have little time to act and a lot to do.
