2018, The Summary

Self-education & side-projects overview

This post is part of a chain of yearly summaries of noteworthy self-education and side-project work. I'm publishing these overviews to provide pointers to interesting resources, to show I'm serious about continuous self-improvement, and to inspire others to have fun with new technologies.

Previous year: 2017 <---> Next year: 2019 (upcoming)

2018

At the beginning of the year, I moved teams at Amazon to the Alexa Smart Home SLU (spoken language understanding) team. I spent some time ramping up on the inner workings and workflows for building voice interfaces at Alexa, and also started reading NLP (natural language processing) papers again. It's a lot of fun to get back to NLP, and it brings back lots of memories of my bachelor's in AI.

Besides work-related resources, I also looked into some data engineering tools (e.g., Hadoop), did some tinkering in my own "Smart" Home (are 6 smart speakers in 45 m² too many?), and got quite a lot of reading done.

Alexa ring animations

As a fun side project, I wrote a small Arduino sketch that simulates the various animations used on the LED rings of Alexa devices. With a cheap WS2812B LED ring, you can now add Alexa's characteristic funky lights to any smart (?) device. At the same time, I also flashed my physical development Echo to show rainbow LED animations. Because, why not.
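Purely as an illustration (this is not the original sketch), here is roughly how one frame of a rotating "spinner" animation can be computed, written in Python for readability. The 12-LED ring size, the colour value, and the names `spinner_frame` and `scale` are my own assumptions. On the Arduino, the same integer arithmetic would be written in C++ and the resulting colours pushed to the ring with a library such as FastLED or Adafruit NeoPixel.

```python
from typing import List, Tuple

Rgb = Tuple[int, int, int]

NUM_LEDS = 12  # assumption: a 12-LED WS2812B ring (16- and 24-LED rings are also common)


def scale(color: Rgb, brightness: int) -> Rgb:
    """Dim an (r, g, b) colour by a brightness in 0..255, using integer maths only."""
    r, g, b = color
    return (r * brightness // 255, g * brightness // 255, b * brightness // 255)


def spinner_frame(frame: int, color: Rgb, tail: int = 4, num_leds: int = NUM_LEDS) -> List[Rgb]:
    """Colours for one frame of a rotating 'spinner': a full-brightness head LED
    at position frame % num_leds, followed by a tail that fades out linearly."""
    leds: List[Rgb] = [(0, 0, 0)] * num_leds  # start with every LED off
    head = frame % num_leds
    for i in range(min(tail, num_leds)):
        pos = (head - i) % num_leds  # step backwards around the ring (wraps at 0)
        brightness = 255 * (tail - i) // tail  # 255 for the head, dimmer along the tail
        leds[pos] = scale(color, brightness)
    return leds


# Print the blue channel of the first three frames; the lit segment moves one LED per frame.
CYAN = (0, 200, 255)  # hypothetical colour value, not Alexa's official one
for frame in range(3):
    print([b for _, _, b in spinner_frame(frame, CYAN)])
```

Calling `spinner_frame` with an increasing frame counter and a short delay between frames makes the bright head chase around the ring with a fading tail. Sticking to integer maths keeps it cheap enough for an 8-bit microcontroller.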

OpenFrameworks & Deep Face

I built an artsy, physical interface that shows an eye as the face of our Smart Home system. The interface is a tribute to HAL 9000 from 2001: A Space Odyssey. It has a single red eye, made out of thousands of tiny particles. The particles react to events in the home, change their colour to mimic the lights in the house, react to touch and sound, and generally just look like a pretty creepy eye staring at you.

I built the project with OpenFrameworks, a creative coding framework written in C++ with easy bindings to OpenGL, OpenCV, OSC, and many other C++ libraries. This project was an excuse to dive into OpenFrameworks, read its documentation, and generally just experiment and have fun with a bunch of libraries that interface with both hardware and software.

I wrote more about this project on this page. You can also read my overview of creative coding tools here.

Hadoop

I've used Hadoop for searching logs at Amazon, but wanted to know more about the inner workings of Hadoop itself. I followed this Coursera course, which gave an overview of the basics of HDFS, YARN, the MapReduce framework, and Spark. It wasn't very practical, though, so I also worked through this Udacity course to run some more examples. I also read a large chunk of Hadoop: The Definitive Guide, which was actually pretty good and included details on the (Java) API for various components. After that, it was just a matter of picking large datasets and running some basic analyses on a large AWS EMR cluster.

The whole Hadoop ecosystem is a bit overwhelming, with many choices of additional components to run beyond Hadoop itself. However, almost all components are extensible (in Java), and there are plenty of options when the MapReduce paradigm is too primitive for your problem.
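To make the MapReduce paradigm a bit more concrete, here is a minimal, single-process Python sketch of the classic word-count job. It only mimics the three phases (map, shuffle/sort, reduce) in memory; it does not use Hadoop's actual (Java) API, and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group all emitted values by key, sorted by key,
    mimicking what the framework does between the map and reduce phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values for a single key (here: sum the counts)."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map -> shuffle -> reduce over the input lines in a single process."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


print(word_count(["the quick fox", "The lazy dog", "the end"]))
# {'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}
```

On a real cluster, the framework runs many mappers and reducers in parallel over HDFS blocks and performs the shuffle over the network. Problems that don't decompose neatly into a single map and reduce step are where the richer processing models of tools like Spark tend to help.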

Data Science command line tools & shells

Spending a large chunk of your day running data science tools at the command line helps strengthen your command line reflexes, even if you have been an avid Linux user for over a decade. I also found some new tools and ideas in the book Data Science at the Command Line, although it's quite basic and mostly about simple data manipulations (and ASCII graphs!). It motivated me to have another look at my command line setup: I switched from the Fish shell to zsh, which has some powerful plugins and better Bash compatibility. This zsh documentation, written in 1995, is great.

I also spent a few days playing with Microsoft's PowerShell, which actually runs on Linux. The book PowerShell for Developers has some interesting ideas on how to take the strong UNIX basics a step further, such as piping objects instead of strings, and having the shell behave more like a REPL for a full programming language, enhanced with UNIX pipes.

Cryptocurrencies course

If you've read Satoshi's Bitcoin paper, you can skip the first few weeks of this Bitcoin and Cryptocurrencies course. After that, however, the course discusses some interesting topics, such as the economics from the miners' perspective, community politics, and many non-obvious applications that can be built with smart contracts (e.g., using the scripting languages on blockchains).

Blockchains offer a practical, probabilistic solution to a notoriously hard problem, the Byzantine Generals Problem: reaching distributed consensus between parties that don't trust each other. It's such a general problem that there is a surprisingly wide range of applications for blockchains. Cryptocurrencies might have lost a lot of value in 2018, but the technology itself still has a lot of potential.

Woodworking: Patio Bench

I made a wooden bench for our large patio. I designed it on the go and, after a few extra trips to Home Depot, was pretty happy with the result. The tilted design made it tricky to fit all the pieces together and took a few iterations, but the result feels sturdy and is tilted just right. See the photos in woodworking.

Recommended Readings

Here are some books I read this year that I would recommend.

Where Wizards Stay Up Late - Katie Hafner & Matthew Lyon

A lively description of the origins of the internet. This book tells the story of the ARPANET, the successful predecessor of the internet and one of the first large (heterogeneous) networks of computers. It covers the very first experiments linking universities together over telephone lines, the development of the first routers (IMPs) by BBN, the first protocols (Telnet, FTP, later SMTP), the first RFCs (typewritten and sent by physical mail), the surge of this thing called e-mail, and the countless individuals who made it all possible. This true classic spans developments from the late 50s to the late 80s, when the ARPANET was gradually turned off and the first HTTP server was connected to the TCP/IP network called the Internet.

Building Microservices - Sam Newman

A great overview of the practical considerations involved in building microservices, or in evolving existing monoliths towards a more decoupled architecture. The book isn't your average, well-structured textbook with definite rules to follow. Instead, it lays out multiple options to consider, and recognises that most systems evolve rather than being designed completely up front with no room for change. It includes sections on cross-service authentication and authorisation, testing and deploying, separation of concerns, interfaces, sharing data, and organisational considerations.

Coders at Work - Peter Seibel

If you like reading light interviews with famous coders, this is a book for you. The author sat down with fifteen well-known computer scientists (or should I say engineers?) and just started talking about their careers, technical interests, language design, and tips for fellow coders. Although there's a slight bias in the selection (American, studied at MIT or Stanford, language designers, 50+), the interviews are often interesting to read, and each interviewee gives his or her own view on the field of computer engineering. It gives a human touch to the famous names you always looked up to. A nice coffee table book for the occasional read.

Storytelling with Data - Cole Nussbaumer Knaflic

This is an introductory book on data visualisation. It was less technical than I expected, but is a nice primer on how to make convincing arguments with pretty graphs. It's easy to recognise good graphs, but making them yourself requires some thought and practice. I read it as a primer for The Functional Art (next on my list).

Zero Bugs - Kate Thompson

This book was a delightful find on a bookshelf at the office. Zero Bugs is a short book with lots of examples of beautiful code, code as a communication channel to other developers, tips on testing, and keeping it simple.

Enlightenment Now - Steven Pinker

In a world filled with negative news about terrorist attacks, immigration issues, horrible accidents, and political disagreements, it becomes hard to be optimistic about the future. This negativity, Pinker argues, is wholly unjustified: the world has never been in a better state than it is right now. Almost anything you can measure has been getting better consistently over the last 100 years. Fatalities, child mortality, and crime are going down. People live longer and healthier lives and are wealthier than ever before, even in the lowest income groups and in developing countries (although the United States lags behind other Western countries on most metrics).

Does that mean nothing goes wrong? No, it doesn't. But the general trends are undeniable: we're making significant progress on almost all fronts, and should celebrate those advances more while turning down the negativity. This book contains enough graphs and arguments to at least counter part of that negativity. In fact, it contains a bit too much, and might have been a bit shorter.

The 5 Choices - Kory Kogon, Adam Merrill & Leena Rinne

A readable, no-nonsense book on productivity. It contains to-the-point tips on how to be less reactive, how to focus on and schedule time for large tasks first, some (obvious) notes on how to get less distracted by the internet and email, and energy management. These are all things you already know, but it's good to be reminded sometimes.

The Everything Store - Brad Stone

Being an Amazonian, I couldn't not read this book. I was a bit afraid of too much negativity, but was pleasantly surprised by the interesting stories from the early days at Amazon. The first part of the book definitely has the startup vibe one would expect from any success story, with lively descriptions of key figures in Amazon's history and many flashbacks to the early days of the World Wide Web. Inevitably, later parts of the book highlight less flattering tactics of the by-then mighty Amazon, and the focus moves from individual contributors to large-scale corporate strategy. Overall, it's an interesting read on the origins of the largest Everything Store (i.e., one that sells the long tail) to date. Since it came out before the launch of Alexa and mostly covers the retail and Kindle stories, I hope we'll see a sequel at some point.

Alibaba: The House that Jack Ma Built - Duncan Clark

In many ways the counterpart of Amazon, Alibaba grew to be one of the largest retail websites and one of the most-visited websites in the world. Jack Ma, an early believer in the Internet who had visited early internet companies in the US, managed to build a gigantic empire of e-commerce websites (Alibaba, Taobao) and services (Alipay) in a way that made a lot of sense for the Chinese market. This book tells the story of the humble teacher who became one of the most successful entrepreneurs. A nice addition to the large number of existing Western startup stories.

The Hard Thing About Hard Things - Ben Horowitz

The message of this book is simple: the life of a startup founder is not easy at all. In fact, the rosy stories you read in the news are just the happy tip of an enormous iceberg of shit. This book mostly describes the author's own startup career, but shares many great insights along the way. They will be most useful for experienced founders of "grown-up" startups facing the dilemmas that come up when the growth curve reaches a plateau and difficult decisions need to be made. But don't expect to find any shortcuts in this book, because in startup life there are no shortcuts.

Everybody Lies - Seth Stephens-Davidowitz

A well-known problem in the social sciences is that surveys never really capture how people actually behave. When asked, even in semi-anonymous surveys, people still bend the truth about their own behaviour, both to themselves and to the researchers. They either don't fully realise or don't want to admit their deepest desires, beliefs, or the questions they may ask themselves when alone. What's more, getting a representative sample for any inquiry is practically impossible.

In practice, this leads researchers to apply all kinds of "corrections" to their data in order to "normalise" the sample and get reliable answers. However, there is another way. In this book, the author, an ex-Googler, uses search data and other 'raw' datasets of online behaviour to infer a different view of the human psyche. There are a few insights here, along with sometimes funny but also disturbing anecdotes about what people search for when they're alone. IMHO, the author goes a bit too far in drawing conclusions from this data. But while there is certainly still bias in search traffic (which isn't really acknowledged in the book), it might become an important research tool, and we'll certainly see more data-driven research into actual behaviour as the amount of data we produce every day keeps rising.

 
