2021, The Summary
This post is part of an ongoing series of yearly summaries with noteworthy side-projects and self-education. It contains short descriptions of projects and pointers to interesting resources. These posts aim to show I'm serious about continuous self-improvement and to inspire others to have fun with new technologies.
This year marked the second year of the COVID-19 pandemic, affecting how we work (from home) and how we spend our free time (at home). On top of that, two other factors affected my year.
Firstly, the final construction work of our new apartment. I'm pretty happy we could finally move in, but the process took more time and energy than anticipated. It also meant I didn't have a stable project space for a while.
Secondly, I changed teams at work. My new team is working on a variety of projects in the space of image quality and computer vision. For an Amazon retail team, the team has quite a bit of freedom, and the potential to work on innovative projects. I'm currently leading the design and development of a new internal computer vision service. The service enables ML/CV scientists to build computer vision applications tailored to retail images.
Barcode scanning shopping list
I made a groceries shopping list based on a barcode scanner. When a product is running out, I scan its barcode before tossing away the packaging. This makes it very easy to reorder groceries we need regularly or like to keep stocked. I also attached a list of barcodes for common products next to the scanner.
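The core of such a list can be sketched in a few lines of Python. This is only an illustration, not the actual setup: it assumes the scanner behaves like a keyboard and delivers one barcode per line, and the `KNOWN_PRODUCTS` lookup, its barcodes, and the function names are all made up for the example.

```python
from collections import Counter

# Hypothetical barcode-to-product lookup. The codes below are placeholders,
# not real product barcodes.
KNOWN_PRODUCTS = {
    "8710400000001": "oat milk",
    "5410000000002": "coffee beans",
}


def add_scan(shopping_list, barcode):
    """Add one scanned barcode to the list and return the resolved name."""
    code = barcode.strip()
    # Unknown codes are kept on the list so they can be looked up later.
    name = KNOWN_PRODUCTS.get(code, f"unknown ({code})")
    shopping_list[name] += 1
    return name


def process_scans(lines):
    """Build a shopping list from an iterable of scanned lines."""
    shopping_list = Counter()
    for line in lines:
        if line.strip():  # skip empty lines between scans
            add_scan(shopping_list, line)
    return shopping_list


if __name__ == "__main__":
    scans = ["8710400000001\n", "5410000000002\n", "8710400000001\n"]
    for product, count in process_scans(scans).items():
        print(f"{count} x {product}")
```

Scanning the same product twice simply raises its count, which maps nicely onto "we need two of these".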
3D printer fixed
I still had an Ultimaker Original lying around from my time at Zazzy, but it needed repairs and something to do. I bought a replacement nozzle and some colourful filament and got tinkering. Surprisingly, the decade-old printer started whirring like it was 2011 again, and started producing printed parts of reasonable quality. For a friend, I printed a holder for an Xbox controller. For myself, I printed parts for a robotics project on which I'm planning to write more next year.
Raspberry Pi Pico
The Raspberry Pi Foundation released a microcontroller called Pico. I got a few of those, accompanied by some shields, and tinkered around with them. I built an LED strip controller and a tiny screen with temperature-affected animations. Pretty cool, and "made in Europe", although the equally cheap ESP32 microcontrollers are similar. Also, I couldn't find ultra-cheap cameras for the Pico, at least not at the price point of an ESP32-CAM.
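To give an idea of what "temperature-affected" can mean in code, here is a small sketch of the colour logic only. It is plain Python that should also run under MicroPython, but the thresholds, the blue-to-red mapping, and the function name are assumptions for illustration. Reading the sensor and driving the LEDs or screen are left out.

```python
def temperature_to_rgb(celsius, cold=15.0, hot=30.0):
    """Map a temperature to a colour between blue (cold) and red (hot).

    The cold/hot thresholds are illustrative defaults, not measured values.
    """
    # Position of the temperature between the two thresholds, clamped to 0..1.
    ratio = (celsius - cold) / (hot - cold)
    ratio = max(0.0, min(1.0, ratio))
    red = round(255 * ratio)
    blue = 255 - red
    return (red, 0, blue)


if __name__ == "__main__":
    for temp in (10, 15, 20, 25, 30, 35):
        print(temp, temperature_to_rgb(temp))
```

The returned tuple could then be written to whatever LED or display driver is in use.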
ML @ AWS course & resources
Last year I worked through some resources on data engineering on AWS. This year I focussed on developing ML applications on AWS. I worked through the AWS Machine Learning specialty course from A Cloud Guru. The course gives a good overview of some of the machine learning services on AWS, as well as on SageMaker. There was also a lot of general boilerplate information about ML projects.
I also followed a longer internal course on SageMaker and did tutorials on specific services like the SageMaker FeatureStore. By now, SageMaker is a large ecosystem supporting many different use cases. You can do anything from using premade hosted models to bringing, training, and serving your own. It's great that AWS has such a large ecosystem of ML tools. But most of it is proprietary and there is little integration with upcoming "ML lifecycle" and deployment tools, such as MLflow or Kubeflow. Bringing ML into production is still messy.
MLOps course - Amazon MLU
Learning about ML services on AWS is useful if you're concerned with designing ML applications in production. But the full lifecycle of a new ML application involves many more aspects, and roles. Large companies, like Amazon, have distinct roles for different stages of development. You might need a Data Engineer to set up infrastructure to collect and query the right data. A Data/Applied Scientist may then perform experiments, train models, and sometimes develop prototypes. Finally, Software/Fullstack/DevOps Engineers may be needed to develop a production-ready system.
This (internal) course tried to bridge the gap between those different role types. It covered the full "lifecycle" of ML projects. This includes gathering data from multiple internal sources, performing model experiments, bringing stuff into production, and monitoring ML systems over time. It's good to be aware of all steps in the development process, and it was interesting to hear tips from people from different backgrounds and with different priorities.
GraphML course - Amazon MLU
There is a recent hype around machine learning for graphs, mainly driven by developments in graph deep learning. Graphs are everywhere. Knowledge representations are graphs. Molecules are graphs. Images are (very regular) graphs. Most database schemas are (regular) graphs. And with graph neural networks we can perform a variety of tasks on them, including recommendations, similarity, and classification of nodes or full graphs.
This (internal) course covered recent developments in this space. We went over important papers and played with two popular libraries (DGL and PyG). Most networks propagate some initial node representations to nearby nodes, up to a few hops. The final layer of the network then holds better node representations that encode neighbourhood information. There are a bunch of techniques for scaling up to huge graphs, although it's mostly sampling. There's certainly more work to do here. Nonetheless, this course was a cool addition to the graph resources I looked into last year when diving into Neo4j and Spark.
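The propagation idea can be sketched without any library. The toy example below is a heavy simplification and not how DGL or PyG implement it: there are no learned weights or non-linearities, only repeated averaging of each node's vector with those of its direct neighbours. The function name and the tiny graph are made up for illustration.

```python
def propagate(features, edges, hops=2):
    """Average each node's vector with its neighbours' vectors, `hops` times.

    features: dict mapping node id -> list of floats
    edges: iterable of (node, node) pairs, treated as undirected
    """
    neighbours = {node: set() for node in features}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    current = {node: list(vec) for node, vec in features.items()}
    for _ in range(hops):
        updated = {}
        for node, vec in current.items():
            # The node's own vector plus those of its direct neighbours.
            group = [vec] + [current[m] for m in sorted(neighbours[node])]
            updated[node] = [sum(col) / len(group) for col in zip(*group)]
        current = updated
    return current


if __name__ == "__main__":
    # Path graph a - b - c, where only "a" starts with a signal.
    feats = {"a": [1.0], "b": [0.0], "c": [0.0]}
    links = [("a", "b"), ("b", "c")]
    print(propagate(feats, links, hops=1))
    print(propagate(feats, links, hops=2))
```

After one hop, node "c" still knows nothing about "a"; after two hops, some of that signal has arrived. That is the sense in which more hops encode a larger neighbourhood.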
Retro gaming + The Game Console
The closest I got to a pandemic hobby was playing retro games. When my parents moved, I found a box with my old Nintendo consoles in the attic. Surprisingly, they all still worked. I also got the book The Game Console by Evan Amos. It contains pretty pictures of lots of retro gaming consoles. I started playing some classics and bought some others that I missed over the years. I replaced cartridge batteries, some still working since the 90s. I got a Nintendo DS and a Nintendo Switch with more recent games. A thriving community sells old games, retro devices, and new consoles for playing or emulating old games. I don't know if this new hobby will stick, but I enjoyed it while locked up at home.
Here are a couple of books I read this year that I recommend.
The Black Swan - Nassim Taleb
Humans have the urge to predict the future. However, there are many highly improbable events with outsized impact. These events are largely unpredictable but may affect the bottom line so much that any prediction is rendered void. This book is about those "black swan" events. It covers many of the biases we have that make our behaviour irrational. Those biases also make us very bad predictors, even when we know about them. To make matters worse, modern society has moved many domains we run predictions on from small-scale to scalable. In those scalable domains, outliers with extreme impact become more common, and more impactful. The world is much more random than we think, and there isn't much we can do about it.
I enjoyed this book, as it was much in the same spirit as Thinking, Fast and Slow by Kahneman. However, the more I learn about biases affecting our thinking, the less certain I become about the state of human knowledge, and about the systems we have to update that knowledge (such as the media and even scientific research). The book is also a bit anecdotal.
Hyperfocus - Chris Bailey
It's hard to focus in a world that keeps adding distractions. This book aims to give some strategies to gain back some focus. The main strategy is to learn to manage your attention, as this is one of our scarce resources. With practice, we can increase the attentional space available during focus. Inside this attentional space, we should keep the intention of the task at hand, and fill up the remaining space with the real task. Preventing distractions has two sides: preventing external distractions and preventing internal ones. External distractions can largely be dealt with up-front (disable notifications, challenge meetings). But keeping yourself from distracting yourself takes practice (meditation, mindfulness, increasing focus duration).
There's a second part of the book dealing with "scatterfocus". This is defined as creative thinking: still thinking intentionally, but letting your mind wander around instead of focussing on one particular task. This is where innovation happens. Overall some nice tips, but this part was a bit more hand-wavy (less focussed?) than the hyperfocus part. Good book for a rainy afternoon.
Permanent Record - Edward Snowden
Edward Snowden showed the world what a modern intelligence agency is capable of, and the world was shocked. The US intelligence community (e.g., CIA and NSA) built far-reaching capabilities for mass surveillance, and still uses them. There is also no reason to believe other (powerful) countries are less ambitious. The difference is that, for historical reasons, the US still holds outsized power over the internet.
This book tells Snowden's story in his own words. It's a nice companion to No Place to Hide by Greenwald. There are no big new revelations in Snowden's book. The first half covers his upbringing and is probably meant to show normal Americans that he has had a normal childhood in a normal American suburb. The second half reads more like a thriller and covers his time at the CIA and NSA. It covers his growing concerns about the surveillance capabilities, the smuggling out of top-secret information, and the reporting of those capabilities.
De Wereld van de Stad
Urban planning affects how we live in cities and how they grow over time. This short Dutch book contains articles on a variety of aspects of urban planning. There are many different visions of what makes a city 'liveable', and those visions also change over time. This book is written from a European (and Dutch) perspective. It already takes the need for bike lanes and good public transport for granted, so it can go a bit deeper into more contested aspects of city planning, including economic, health, and sustainability factors.