2020, The Summary
This post is part of a chain of yearly summaries of noteworthy side-projects and self-education. I'm publishing these overviews to provide pointers to interesting resources, to show I'm serious about continuous self-improvement, and to inspire others to have fun with new technologies.
I won't be the only one looking back at 2020 as an unusual year. After a good start with a couple of personal highlights, we soon found ourselves locked up at home with the world in pandemic mode. During these spring months, I got quite a lot done in terms of side-projects, including some long-standing projects from previous years. But by the end of the year, corona fatigue had built up enough to have a significant impact on overall motivation, slowing down work as well as my side-project roadmap. Overall, I still made a good amount of progress across a diversity of subjects. Enjoy!
I created a GitHub repo containing computer vision scripts, library wrappers, and some utilities for image manipulation. The goal is to have a set of common components to reuse in multiple computer vision projects, for both software and hardware devices. I plan on extending this "computer vision workbench" over time. It should also speed up a couple of hack projects which are on my mind. Follow progress on the project page.
CV papers and courses
Continuing from last year, I read a couple dozen more computer vision papers. There is a lot going on in the field right now, so I'm trying to catch up one area at a time. Next year I'll also have a bunch of deep dives into other subjects (incl. 3D reconstruction and reinforcement learning). With model zoos added to the cvlab repo from the previous section, I could quickly try out the papers that were available as pre-trained models. It's always refreshing to see models perform (and often, fail) in real life, to calibrate expectations against the lofty results sections in papers.
In addition to the papers, I decided to watch the lectures of two recent computer vision courses, to see how all the fancy deep learning advances are being taught to contemporary students. Stanford's CNNs for Visual Recognition course is a great overview of deep learning papers in a couple of CV tasks. The Ancient Secrets of CV course (YT mirror) - by the author of YOLO - is more basic, but does actually take the viewer all the way from classical CV to popular CNN architectures. And the presentation is less dry.
ML in practice & in production
Bringing machine learning from theory into production is becoming a new expertise, with job titles like ML engineer and MLOps. I wanted to get a better feel for what companies other than Amazon are doing, so I read a couple of resources that describe 1) how to go about actually doing ML in real, often ill-defined company projects, and 2) how to bring them into production.
On 1) I found some insights in Ng's short ML Yearning book, and in mini-courses from Facebook and Google. On 2) I read a couple of blog posts from tech corporates describing their internal infrastructure, and looked at AWS services for ML (like SageMaker). However, these are vendor-specific, so next year I want to look more extensively into open-source "ML lifecycle" frameworks like MLflow and TFX.
While setting up serverless services at work, I realised the number of AWS services is growing faster than the number of COVID-19 infections, and I need to catch up. I followed the acloudguru course for Cloud Practitioner, which gives a survey of many services, but doesn't provide a lot of depth. So I made a list of 20 services to dive deeper into, and worked through scattered tutorials to build some prototypes. Coupled with infrastructure setup challenges at work, this gave a good basis to start sampling from the other 80% of the AWS offerings.
I also watched the acloudguru Big Data Specialty course. It gives a little more depth, especially on Redshift table design and on EMR/Hadoop components. What's left on my wishlist is the ML course - or maybe just a SageMaker course and trying the higher-level CV+NLP services.
Spark intro resources
Last year I set up a data processing service with Spark at its core. However, to better understand the inner workings and origins of Spark, I read a couple of additional resources. The two original Spark papers are actually a great and very concise overview. I also read a significant part of the Definitive Guide to Apache Spark.
Finally, I played with the ML(lib) and the GraphX libraries. The problem with the ML library is that algorithms from existing libraries need to be reimplemented in Spark primitives, while industry and academia are rallying around existing, hugely popular (C++/Python) libraries, especially in the case of deep nets. The MLlib data cleanup and transformation utilities seem useful though, because they can be used to (pre)process large amounts of training and live data in parallel. And there are some efforts to use custom-built functions in Spark (UDFs or Python code) to run existing ML libraries in parallel on batches, such as distributed SGD on models implemented in PyTorch/TF/MXNet.
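To make the "existing library in parallel on batches" idea a bit more concrete, here is a minimal sketch in plain Python. It is not Spark code: a thread pool stands in for the cluster, and `load_model` is a hypothetical placeholder for loading a real pre-trained PyTorch/TF/MXNet model. All function names are my own, for illustration only. The point is the shape of the pattern: split the data into partitions, load the model once per partition, and score each partition batch by batch.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def load_model():
    """Hypothetical stand-in for loading a pre-trained model from an ML library."""
    weight, bias = 2.0, 1.0
    # The "model" scores a whole batch at once, like a real framework would.
    return lambda batch: [weight * x + bias for x in batch]


def batched(records, batch_size):
    """Yield consecutive lists of at most batch_size records."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def score_partition(partition, batch_size=4):
    """Load the model once per partition, then score it batch by batch."""
    model = load_model()
    results = []
    for batch in batched(partition, batch_size):
        results.extend(model(batch))
    return results


def score_in_parallel(records, num_partitions=3, batch_size=4):
    """Split records into partitions and score them concurrently, keeping order."""
    records = list(records)
    size = max(1, -(-len(records) // num_partitions))  # ceiling division
    partitions = [records[i:i + size] for i in range(0, len(records), size)]
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        scored = pool.map(score_partition, partitions,
                          [batch_size] * len(partitions))
        # pool.map preserves the order of the input partitions.
        return [y for part in scored for y in part]


if __name__ == "__main__":
    print(score_in_parallel(range(10)))
```

In Spark, the role of `score_partition` would presumably be played by a function passed to something like `mapPartitions` or a pandas UDF, so the expensive model load happens once per partition instead of once per row.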
7 Databases in 7 Weeks
The NoSQL movement has been rapidly growing over the last decade, and this book tries to give a quick intro to 7 databases, with non-trivial examples for each. I read this book as a sequel to 7 Languages in 7 Weeks, and found it just as interesting. Likewise, this book also took me multiple years (so much for the 7 weeks promise) as it got moved back to the backlog multiple times. But it was worth it: the databases have very different data models, and handle distributed setups in interesting ways.
The world of databases moves quickly, and many of the examples were out of date by the time I tried them. So I created a repository with Dockerised examples for each database, with fixed versions so they'll keep working in the future. The book also has a second edition now, which replaces Riak with AWS DynamoDB, showing the increased importance of cloud databases these days.
Java SE 8 - Cay Horstmann
I read and prototyped more on various Java libraries and frameworks, but I won't bore you with the details, since it was indeed a bit boring.
How else to start a pandemic than with courses on epidemics & pandemics? I watched this one, and the longer course from Hong Kong uni, both recorded before the current pandemic. I learned about zoonoses, existing disease surveillance including sequencing to construct transmission trees, local and global outbreak measures that are available (lockdowns apparently weren't a thing yet), examples of recent epidemics from new infectious pathogens, outbreak prediction modelling, the WHO, local epidemiologic laws, public communication strategies in the face of incomplete and ongoing research, and vaccine development.
It was great to get an overview of a field that, to me, seems like an active area of research as well as a known global risk that a couple of organisations actively prepared for, although with (far) too few resources. Watching an epidemics course could have helped anyone in 2020, including journalists, who seemed to be in constant confusion over what was happening for most of the year (sorry, I've just been so annoyed by the coverage and constant complaining by everyone all year long).
I got interested in the international network of diplomatic channels. Often invisible to the public, but used as a solid backbone for discussions between countries, this seems like an important asset of a globalising world. It fosters building communities among countries with different interests, and even allows progress between countries that seem completely at odds with each other in public perception. Diplomacy is the pragmatic counterpart of the often polarising and populist politics that seems to be on the rise.
The courses I could find were quite lightweight. This course has a series of short clips by diplomats on general diplomacy. This UN course is short on videos as well, but contains interesting reads about the League of Nations and the UN, its structure, and the challenges of international collaboration (such as vetoes). And finally, a course on the changing global order covers, you guessed it, upcoming global powers, as well as more details on UN institutions and more regional blocs of countries, such as the European Union and a surprising number of South American unions. Much of the content was sort of common knowledge, but there were pieces I didn't know (or maybe forgot).
Here are a couple of books that I read this year and can recommend.
The 4-Hour Workweek - Tim Ferriss
Since reading this book, I've been thinking about it more than I expected. Tim Ferriss might look like a self-promoting over-confident lifestyle coach who got lucky, and he is, but there are a bunch of thought-provoking ideas in this book on how to organise life. I'm probably doing it all wrong.
Even if you don't want to set up an online reselling business and travel the world for the rest of your life, there are nonetheless plenty of aspects of his thesis that might be worth thinking about, and they might seem less far-fetched after reading his arguments. Think mini-retirements, working from home or elsewhere (e.g., 'digital nomad'), especially in cheaper countries to increase relative income, eliminating unnecessary tasks, outsourcing necessary tasks, working out other unrealistic-sounding goals, or even setting up a low-involvement income-generating side business. These things are still hard, but not impossible.
Remote - Fried & Heinemeier Hansson
This book was good, but not excellent. Like other books from this duo, it contains a lot of short, but very opinionated, essays on how you should go about doing your work. In this case it contains a few gems on how to hire, collaborate, socialise, and get shit done from a distance. It's good to be aware of strategies to mitigate some of the pitfalls of remote working, but overall I found that this book contains pretty straightforward advice and, as said, is often too opinionated. Different teams work differently, so try to find a way that works for you and your team. Still worth scrolling through when you're considering (or are forced into) working from a distance.
A Crack in Creation - Jennifer Doudna
A fascinating research area that offers plenty of optimism for new breakthroughs in treatments and in manipulating genomes for various other reasons. I can also see how CRISPR and related techniques make biology more and more interesting to computer scientists (i.e., bioinformatics), as the code of life becomes code to manipulate. The book could have been written with more technical details and fewer anecdotes, but it was interesting overall and a welcome distraction from my regular software engineering reading list. I read the book before the pandemic, but it turns out CRISPR was used this year to develop COVID-19 detection tests, and the author also received the 2020 Nobel Prize in Chemistry.
Ghost in the Wires - Kevin Mitnick
If you've read more of my yearly reviews, you know I'm a big fan of hacker stories. This book is another gem. It's also the last book on my list of Mitnick classics. It tells the autobiographical story of one of the most (in)famous hackers, and covers his youth as well as his often hilarious journey as a fugitive trying to stay out of the FBI's reach. There are a lot of details in the book on hacks and clever tricks he used. Since these adventures took place a while ago, it offers historic insights into the early world of phone hacking. Recommended!
Amsterdam - Russell Shorto
A fun read and a complete history of the city of Amsterdam, from the Golden Age built on trade until the aftermath of the Second World War. There are upbeat stories on the rise of liberalism and tolerance, and there are plenty of fun facts I didn't yet know about my favourite city.
The Shortest History of Europe - John Hirst
Short, humorous, and complete.