Once we get the DHCP part set up, here comes the exciting part: actually installing the Linux distro and booting the Pi with it instead of the vanilla 32-bit Raspberry Pi OS.

Now our Pi is still stuck at the initial screen, but don’t worry: as long as it is trying to read the TFTP path, we have made progress.

Enable TFTP on the Server Side

Here I will be using a Synology NAS as the file server.

Synology offers TFTP support directly in its file-sharing services. Alternatively, it’s also fairly straightforward to set up a TFTP server on a plain Linux box. One important thing: remember the TFTP root directory, as that is where we will copy the boot files in the next step.

Under that directory, create a directory for each Pi, like:

<tftp root>/123456789

where 123456789 is the last 9 characters of the Pi’s serial number. The Pi’s bootloader will load from there by default.
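To figure out the directory name, you can read the serial number on the Pi itself. A quick sketch (the serial value and the NAS share path below are made-up examples):

```shell
# Example serial number; on the Pi itself you can read it with:
#   serial=$(awk '/^Serial/ {print $3}' /proc/cpuinfo)
serial=10000000c33f5d12

tftp_root=/volume1/tftpboot                 # example Synology share path
suffix=$(printf %s "$serial" | tail -c 9)   # last 9 characters

# This is the directory the bootloader will request files from:
echo "$tftp_root/$suffix"                   # prints /volume1/tftpboot/0c33f5d12
# Then, on the NAS: mkdir -p "$tftp_root/$suffix"
```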

Create the TFTP boot directory

We need to first grab the Linux distro from an image.

Here I picked Ubuntu 20.04 LTS, which is “officially” supported. Technically this is probably not the best option, as the OS includes a lot of things we may not actually need and is pretty bulky. At the other extreme, Alpine is very skinny but lacks some of the tooling we may need initially. Unfortunately, each distro may boot slightly differently on the Raspberry Pi, so while you are free to choose whatever you want, YMMV.

Flash the image onto the SD card and we’ll get two partitions: boot and root.

In general, it’s a good idea to first boot the system using the SD card, just in case there’s some essential setup we need to complete ahead of time. After that, the remaining Pi boards can reuse the same set of files with minor tweaks.

Continue reading

Raspberry Pi 4 ships with a flashable EEPROM and supports netbooting. However, the setup process is not that straightforward, so it’s worth writing down all the pitfalls along the way, especially when it involves a non-“native” Linux distribution.

One thing worth noting: even though this post (and the official documentation) says “PXE”, the boot process isn’t fully PXE compliant, so a regular PXE boot setup may not work at all, or at least not directly.

Here’s a diagram of the network topology I use for the setup:

The Ubiquiti router will also serve as the DHCP server. The Synology NAS will serve data via TFTP (PXE boot) and NFS (post boot). Due to a kernel limitation, my NAS cannot support overlayfs, though newer NAS models might.

Raspberry Netboot Sequence

The bootloader in the EEPROM does not enable netboot by default, so we need to enable that first.

Once enabled, upon powering up, the Pi will first send a DHCP request to discover the TFTP server location and verify that netboot is supported.
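For reference, if your DHCP side were dnsmasq instead of Ubiquiti, the special vendor option the Pi firmware looks for would be configured roughly like this (a proxy-DHCP sketch; the subnet and paths are made up, and Ubiquiti/Synology users set the equivalent through their UIs):

```
# dnsmasq proxy-DHCP sketch for Raspberry Pi netboot
port=0                              # disable DNS; DHCP/TFTP only
dhcp-range=192.168.1.0,proxy        # don't hand out leases, just boot info
log-dhcp
enable-tftp
tftp-root=/tftpboot
pxe-service=0,"Raspberry Pi Boot"   # the magic string the Pi firmware expects
```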

After that, it will fetch the firmware (start4.elf), configs, and kernel files (vmlinuz) from that TFTP location. The bootloader has a config option to specify the exact path for the current device; by default, that’s the last 9 characters of the serial number. This is important because certain configs (like cmdline.txt) tell the OS how to mount the root fs and thus need to be separate per device.
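For example, with an NFS root, a per-device cmdline.txt ends up looking something like this (the server IP and export path are placeholders for my Synology setup):

```
console=serial0,115200 console=tty1 root=/dev/nfs nfsroot=192.168.1.10:/volume1/pi-root,vers=3 rw ip=dhcp rootwait
```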

The kernel is then loaded, takes over the boot process, and eventually brings up the rest of the system distribution.

The official doc can be found here.

Flashing the EEPROM to Enable Netboot

With this basic knowledge, we can continue by first enabling netboot in the bootloader.

To do that, we need to grab an updated version of the bootcode.bin file that supports netboot and update the config.

This also requires the Raspberry Pi CLI binary vcgencmd.

Pitfall #1: vcgencmd only works on the native Raspberry Pi OS

Yes, the vcgencmd binary does not work on Ubuntu or any other “supported” Linux distro, even if you compile it from source; it just fails silently. Technically it should be doable if they shipped the right “stuff”, but I decided not to waste more time figuring out what the “stuff” is.

Here we need a small microSD card to flash Raspberry Pi OS onto and boot the system. For a headless setup, don’t forget to touch ssh in the boot partition to enable SSH by default.
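Once booted into Raspberry Pi OS, you can dump the current bootloader config with vcgencmd bootloader_config. The config we want to end up with contains something like the following (property names per the official Pi 4 bootloader docs; the TFTP IP is a placeholder, and BOOT_ORDER is read right to left):

```
# Try SD card first, then network, then restart and retry
BOOT_ORDER=0xf21
# Optional: pin the TFTP server instead of relying on the DHCP option
TFTP_IP=192.168.1.10
```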

Continue reading

We all write shitty code.

This seems like a stupidly bold statement from a random person, especially when this person very likely writes shittier stuff than most others. After all, there’s a well-known joke across the industry: Jeff Dean does not write buggy code, and when he does, it’s the compiler’s problem. So at least we have one counterexample.

Common sense says masters don’t write shit, and Linus Torvalds very likely nods along.

Well, technically speaking, before we criticize this nonsensical bold statement, we need to define what shit is. Here, I think shitty code is anything that does not live up to expectations. Buggy code is shitty, because it doesn’t provide consistently correct results when people expect it to. However, in the real world, we tend to operate at a much higher level: we care more about contracts, functionalities, and behaviors, and less about the exact implementations. For example, when we use an API service, we tend to care about what features it provides, the QPS it can support, and how easily we can use it; in essence, our expectations. We probably won’t care whether it runs on bare metal or some public cloud, or whether memory is managed manually, by the compiler, or by garbage collection.

Does that then mean all services/software that fail to match expectations start with shitty code? Not necessarily. Windows 9x was great at the time it was first introduced and enabled a whole industry. Yet as things evolved, a lot of the design issues got exacerbated and it was eventually replaced entirely by the NT kernel. I’m not trying to defend 9x or claim it was a well-designed masterpiece by any means, it had some notorious kernel safety issues after all, but it illustrates that even if some software had highlights, over time other factors expose the shittiness inside; it’s just a matter of time. What I believe is that no matter how well systems were designed and implemented at the beginning, they deteriorate over time. We cannot prevent things from turning into shit; we can only slow down the speed.

What makes things more complicated is that, often, there are way too many factors that affect how we value software:

Continue reading


I was able to “break” Orbi’s 2.4G backhaul fallback and hence force it onto 5G.

So instead of this

I got this:

The solution isn’t clean. Basically, telnet to the satellite and then run:

root@RBS20:/# config set wlg_sta_ssid=<original ssid>_disabled
root@RBS20:/# config commit

Then reboot.

More details and investigation

Continue reading

Microservice architecture is typically useful for solving certain scaling problems, where service decoupling/segregation is required to improve development velocity, make services more fault tolerant, or handle performance hotspots.

However, everything comes with a price and so does microservice. One typical issue is:

While this is half joking, monitoring and fault resiliency are definitely more challenging in the microservice world. There are frameworks like Hystrix and resilience4j to handle circuit breaking, rate limiting, and the like, but this post focuses on the first question: how the heck are my services talking to each other?

AWS X-Ray can fill the gap here by offering service mapping and tracing, so you can see something like:

Compared to generic service monitoring, X-Ray has some additional benefits within the AWS ecosystem: when you use the AWS SDK, it will automatically expose insights for your AWS resource write calls (yes, only writes, unfortunately). This applies to SQS, SNS, and DynamoDB.

Continue reading

This works for Angular 4-6 so far.

If you have ever used Angular 1.x, you know there's a manual bootstrapping option which looks like:
angular.bootstrap(document.querySelector('#myApp'), ['myModule'])

This used to be pretty handy until Angular 2 came along and changed everything. For some reason they decided to hide that option and ask people to just use bootstrap in @NgModule.

I get that: for general users this is good enough, especially if you are just building a typical SPA. However, if you want to build something advanced like lazy loading or conditional rendering, this seems a bit naive.

This is especially annoying when in React the counterpart is as simple as:

ReactDOM.render(<MyApp />, document.querySelector('#myApp'));

This alone won’t drive people away from Angular, but it’s one of the examples showing that Angular wants to force people into its model rather than thinking about real-world use cases.

Alright, enough whining; let’s get to coding. After all, Angular does seem excellent, especially since it covers everything from development to testing to packaging out of the box. Let’s leave the whining till next time.
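For what it’s worth, here is a minimal sketch of the Angular 2+ escape hatch: omit the bootstrap array in @NgModule and implement the ngDoBootstrap hook yourself (the module/component names below are made up):

```typescript
import { NgModule, ApplicationRef } from '@angular/core';
import { BrowserModule } from '@angular/platform-browser';
import { platformBrowserDynamic } from '@angular/platform-browser-dynamic';
import { AppComponent } from './app.component'; // hypothetical root component

@NgModule({
  imports: [BrowserModule],
  declarations: [AppComponent],
  entryComponents: [AppComponent], // needed pre-Ivy when `bootstrap` is omitted
})
export class AppModule {
  // With no `bootstrap` array, Angular calls this hook and leaves the decision to us
  ngDoBootstrap(appRef: ApplicationRef) {
    // e.g. conditional rendering: only bootstrap if the host element exists
    if (document.querySelector('my-app')) {
      appRef.bootstrap(AppComponent);
    }
  }
}

platformBrowserDynamic().bootstrapModule(AppModule);
```

This is roughly the moral equivalent of the old angular.bootstrap call: you decide when and whether the root component gets attached.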

Continue reading

I was never really a fan of Windows 10, especially after Microsoft ditched the most important feature I liked in Windows 7: Aero. In fact, I’d admit that in most cases I use Windows as an entertainment system rather than a working platform.

Don’t get me wrong, Windows is great, both in terms of the quality of the software and the design/usability of the system itself. It’s also particularly great if you are a .NET developer, a webmaster using IIS, or a game developer heavily using DirectX. However, it’s just cumbersome to use it as a daily OSS platform: it lacks the general ecosystem, and the tools are just different. Yes, you can install node, java, maven, and gradle, and you can probably write shell scripts in PowerShell, but at the end of the day the overall configuration just feels different, and since most people don’t use Windows for work on a day-to-day basis, it takes too much time and effort to learn a differently flavored set of rules just to get the environment set up.

However, things have changed.

The release of WSL (Windows Subsystem for Linux) in Windows 10 was like a silent bomb. It wasn’t really marketed to the general public, but it implies a fundamental change in Microsoft’s attitude towards the OSS community.

WSL is not a virtual machine. In fact, there’s no real Linux kernel running. Instead, there is a layer in between that translates Linux system calls into something the Windows kernel can handle. Technically, this is seriously phenomenal, as certain things have no direct equivalent in Windows.

For example:

Quoted from MSDN blog

The Linux fork syscall has no documented equivalent for Windows.
When a fork syscall is made on WSL, lxss.sys does some of the initial work
to prepare for copying the process.
It then calls internal NT APIs to create the process with the correct semantics
and create a thread in the process with an identical register context.
Finally, it does some additional work to complete copying the process
and resumes the new process so it can begin executing.

And another one regarding WSL file system:

The Windows Subsystem for Linux must translate various Linux file system operations
into NT kernel operations. WSL must provide a place where Linux system files can exist
with all the functionality required for that including Linux permissions,
symbolic links and other special files such as FIFOs;
it must provide access to the Windows volumes on your system;
and it must provide special file systems such as ProcFs.

And after the Fall Creators Update it even supports interop. This means if you type notepad.exe, it will literally open Notepad for you. Not very exciting by itself, but beyond that you can:

# copy stuff to clipboard
echo 'foo bar' | clip.exe

# open a file in windows using default associated program
cmd.exe /C start image.png

Awesome, but what’s our original topic?

Continue reading
| | Airflow | Azkaban | Conductor | Oozie | Step Functions |
|---|---|---|---|---|---|
| Owner | Apache (previously Airbnb) | LinkedIn | Netflix | Apache | Amazon |
| Community | Very Active | Somewhat active | Active | Active | N/A |
| History | 4 years | 7 years | 1.5 years | 8 years | 1.5 years |
| Main Purpose | General Purpose Batch Processing | Hadoop Job Scheduling | Microservice orchestration | Hadoop Job Scheduling | General Purpose Workflow Processing |
| Flow Definition | Python | Custom DSL | JSON | XML | JSON |
| Support for single node | Yes | Yes | Yes | Yes | N/A |
| Quick demo setup | Yes | Yes | Yes | No | N/A |
| Support for HA | Yes | Yes | Yes | Yes | Yes |
| Single Point of Failure | Yes (single scheduler) | Yes (single web and scheduler combined node) | No | No | No |
| HA Extra Requirement | Celery/Dask/Mesos + Load Balancer + DB | DB | Load Balancer (web nodes) + DB | Load Balancer (web nodes) + DB + Zookeeper | Native |
| Cron Job | Yes | Yes | No | Yes | Yes |
| Execution Model | Push | Push | Poll | Poll | Unknown |
| Rest API Trigger | Yes | Yes | Yes | Yes | Yes |
| Parameterized Execution | Yes | Yes | Yes | Yes | Yes |
| Trigger by External Event | Yes | No | No | Yes | Yes |
| Native Waiting Task Support | Yes | No | Yes (external signal required) | No | Yes |
| Backfilling support | Yes | No | No | Yes | No |
| Native Web Authentication | LDAP/Password | XML Password | No | Kerberos | N/A (AWS login) |
| Monitoring | Yes | Limited | Limited | Yes | Limited |
| Scalability | Depending on executor setup | Good | Very Good | Very Good | Very Good |


  • (2018.11) Oozie has Kerberos auth over SPNEGO for web (thanks to Justin Miller for pointing it out)


I’m not an expert in any of those engines.
I’ve used some of those (Airflow & Azkaban) and checked the code.
For some others I either only read the code (Conductor) or the docs (Oozie/AWS Step Functions).
As most of them are OSS projects, it’s certainly possible that I might have missed certain undocumented features,
or community-contributed plugins. I’m happy to update this if you see anything wrong.

Bottom line: Use your own judgement when reading this post.


Airflow

The Good

Airflow is a super feature-rich engine compared to all the other solutions. Not only can you use plugins to support all kinds of jobs, ranging from data processing jobs (Hive, Pig, though you can also submit them via shell commands) to general flow management (triggering on the existence of a file/DB entry/S3 object, or waiting for expected output from a web endpoint), it also provides a nice UI that lets you check your DAGs (workflow dependencies) through code/graph and monitor the real-time execution of jobs.

Airflow is also highly customizable, with a currently vigorous community. You can run all your jobs on a single node using the local executor, or distribute them across a group of worker nodes through Celery/Dask/Mesos orchestration.

The Bad

Airflow by itself is still not very mature (in fact, maybe Oozie is the only “mature” engine here). The scheduler needs to periodically poll the scheduling plan and send jobs to executors. This means it alone continuously dumps an enormous amount of logs out of the box. Because it works by “ticking”, your jobs are not guaranteed to get scheduled in “real time”, if that makes sense, and this gets worse as the number of concurrent jobs increases. Meanwhile, since you have one centralized scheduler, if it goes down or gets stuck, your running jobs won’t be affected, as that’s the executors’ job, but no new jobs will get scheduled. This is especially confusing when you run an HA setup with multiple web nodes, a scheduler, a broker (typically a message queue in the Celery case), and multiple executors: when the scheduler is stuck for whatever reason, all you see in the web UI is that all tasks are running, while in fact they are not actually moving forward and the executors are happily reporting that they are fine. In other words, the default monitoring is still far from bulletproof.

The web UI is very nice at first look. However, it can be confusing to new users. What does it mean when my DAG runs are “running” but my tasks have no state? The charts are not search friendly either, let alone that some of the features are still far from well documented (though the documentation does look nice, compared to, say, Oozie’s, which does seem outdated).

The backfilling design is good in certain cases but very error prone in others. If you have a flow with cron schedules disabled and re-enabled later, it will try to play catch-up, and if your jobs are not designed to be idempotent, shit will happen for real.


Azkaban

The Good

Of all the engines, Azkaban is probably the easiest to get going out of the box. The UI is very intuitive and easy to use, and scheduling and the REST APIs work just fine.

A limited HA setup works out of the box: there’s no need for a load balancer because you can only have one web node. You can configure how it selects executor nodes to push jobs to, and it generally scales pretty nicely. You can easily run tens of thousands of jobs as long as you have enough capacity on the executor nodes.

The Bad

It is not very feature-rich out of the box as a general-purpose orchestration engine, but likely that’s not what it was originally designed for. Its strength lies in native support for Hadoop/Pig/Hive, though you can also achieve those via the command line. It cannot trigger jobs through external resources the way Airflow can, nor does it support the job-waiting pattern. Although you can busy-wait through Java code/scripts, that leads to bad resource utilization.

The documentation and configuration are generally a bit confusing compared to the others, likely because it wasn’t originally meant to be open-sourced. The design is OK-ish, but you’d better have a big data center to run the executors, as scheduling gets stalled when executors run out of resources unless you add extra monitoring. The overall code quality is a bit towards the lower end compared to the others, so it generally only scales well when resources are not a problem.

The setup/design is not cloud friendly: you are pretty much expected to have stable bare metal rather than dynamically allocated virtual instances with dynamic IPs. Scheduling goes south if machines vanish.

The monitoring part is sort of acceptable through JMX (which does not seem to be documented), but it generally doesn’t work well if your machines are heavily loaded, unfortunately, as the endpoints may get stuck.


Conductor

The Good

It’s a bit unfair to put Conductor into this competition, as its real purpose is microservice orchestration, whatever that means. Its HA model involves a quorum of servers behind a load balancer putting tasks onto a message queue, which the worker nodes poll from; this makes stalled scheduling much less likely. With the help of parameterized execution through the API, it’s actually quite good at scheduling and scaling, provided that you set up your load balancer/service discovery layer properly.

The Bad

The UI needs a bit more love. There’s currently very limited monitoring there, although for general-purpose scheduling that’s probably good enough.

It’s pretty bare-bones out of the box. There’s not even native support for running shell scripts, though it’s pretty easy to implement a task worker in Python to do the job with the examples provided.


Oozie

The Good

Oozie provides a seemingly reliable HA model through the DB setup (seemingly, because I haven’t dug into it). It provides native support for Hadoop-related jobs, as it was more or less built for that ecosystem.

The Bad

Not a very good candidate for general-purpose flow scheduling, as the XML definition is quite verbose and cumbersome for defining lightweight jobs.

It also requires quite a bit of peripheral setup: you need a Zookeeper cluster, a DB, a load balancer, and each node needs to run a web app container like Tomcat. The initial setup also takes some time, which is not friendly to first-time users who just want to pilot it.

Step Functions

The Good

Step Functions is fairly new (launched in Dec 2016), but the future seems promising. With the HA nature of the cloud platform and Lambda functions, it almost feels like it can scale infinitely (compared to the others).

It also offers some useful features for general purpose workflow handling like waiting support and dynamic branching
based on output.
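For a taste of those features, a state machine with native waiting and output-based branching looks roughly like this in the Amazon States Language (the state names and Lambda ARN are made up):

```json
{
  "StartAt": "WaitForBatch",
  "States": {
    "WaitForBatch": { "Type": "Wait", "Seconds": 300, "Next": "CheckResult" },
    "CheckResult": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:check-result",
      "Next": "Branch"
    },
    "Branch": {
      "Type": "Choice",
      "Choices": [
        { "Variable": "$.status", "StringEquals": "DONE", "Next": "Done" }
      ],
      "Default": "WaitForBatch"
    },
    "Done": { "Type": "Succeed" }
  }
}
```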

It’s also fairly cheap:

  • 4,000 state transitions are free each month
  • $0.025 per 1,000 state transitions thereafter ($0.000025 per state transition)

If you don’t run tens of thousands of jobs, this might be even better than running your own cluster of things.

The Bad

Can only be used by AWS users. Deal breaker if you are not one of them yet.

Lambda requires extra work for production level iteration/deployment.

There’s no UI (well, there is, but it’s really just a console). So if you need any monitoring beyond that, you need to build it yourself using CloudWatch.


Table of Contents

  1. == and ===
  2. Dig deeper
    1. What about arrays?
    2. What about objects
  3. Implicit conversions
  4. Conclusion

== and ===

Likely you know the difference between == and ===: basically, === means strict equality where no implicit conversion is allowed whereas == is loose equality.


'a' === 'a' // true
0 == false // true

Dig deeper

OK but this is too boring since we all know that.

How about this:


String('a') === 'a'
new String('a') === 'a'

Well, the answers are true and false, because String() returns a primitive string while new String() returns a String object. Surely new String('a') == 'a' yields true. No surprise.

What about arrays?

[] === []

Well, this returns false because non-primitive objects are compared by reference: the two literals are different objects in memory, so this always returns false.

However surprisingly you can compare arrays like this:

[1, 2, 3] < [2, 3]      // true
[2, 1, 3] > [1, 2, 3]   // true

(Wait a sec. I think I have an idea.)

How about this:

function arrEquals(arr1, arr2) {
    return !(arr1 < arr2) && !(arr2 < arr1);
}

Well, this is wrong because arrays are converted to strings (effectively flattened) when compared, like this:

[[1, 2], 1] < [1, 2, 3]     // true

What about objects

What’s the result of this expression?

{} === {}

Well, it’s neither true nor false: you get a SyntaxError, because in this case the first {} is parsed not as an object literal but as a code block, which cannot be followed by ===. Anyway, we are drifting away from the original topic…

Implicit conversions

Well that’s just warm-up. Let’s see something serious.

If you read anything about “best practices”, you will probably be told not to use == because of the evil implicit conversion. However, chances are you’ve relied on it here and there, and most likely that’s also part of the “best practices”.

For example:

var foo = bar();
if (foo) {
    // foo is truthy
}

This works because in JavaScript only 6 values evaluate to false: 0, '', NaN, undefined, null, and of course false. Everything else evaluates to true, including {} and [].
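You can verify the short list quickly in any JS console:

```javascript
// The six falsy values...
var falsy = [0, '', NaN, undefined, null, false];
console.log(falsy.some(Boolean)); // false: none of them is truthy

// ...and everything else is truthy, including empty objects and arrays
console.log(Boolean({}) && Boolean([])); // true
```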

Hmm, here’s something wacky:


var a = {
    valueOf: function () {
        return -1;
    }
};

if (!(1 + a)) {
    // boom
}

Your code does go boom because 1 + a gets implicitly converted to 1 + a.valueOf(), which yields 0, a falsy value.

The actual behavior is documented in ECMA standard - http://www.ecma-international.org/ecma-262/6.0/#sec-abstract-equality-comparison

In most cases, implicit conversion causes valueOf() to be called, falling back to toString() if it’s not defined.

For example:


var foo = {
    valueOf: function () {
        return 'value';
    },
    toString: function () {
        return 'toString';
    }
};

'foo' + foo // 'foovalue'

This is because, according to the standard, when ToPrimitive is invoked for implicit conversion with no hint provided (e.g. in the case of concatenation, or when == is used between different types), it prefers valueOf by default. There are a few exceptions though, including but not limited to Array.prototype.join and alert: they invoke ToPrimitive with string as the hint, so toString() is favored.


Conclusion

In general, you probably want to avoid == and use === most of the time, if not always, so you don’t have to worry about wonky implicit conversion magic.

However, you can’t be wary enough. For example:

isNaN('1') === false

You might think that '1' is a string, not a number, and hence this should be true. But unfortunately isNaN always calls toNumber internally (spec), so '1' is first converted to the number 1 and the result is false.
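ES2015 added Number.isNaN, which does no coercion and only returns true for the actual NaN value, sidestepping this whole class of surprises:

```javascript
console.log(isNaN('foo'));        // true: 'foo' coerces to NaN first
console.log(Number.isNaN('foo')); // false: no coercion; 'foo' is not the NaN value
console.log(Number.isNaN(NaN));   // true
```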



Table of Contents

  1. Have you seen eval() written like this?
  2. Regular eval
  3. Global eval
  4. Back to the original topic

Recently I’ve been writing quite a bit of front-end stuff and have seen quite a few tricks in other people’s libraries. It turns out JavaScript is a pretty wonky yet interesting language, which tempts me to write a series about it; this is the first one. This is by no means supposed to show how to write JS, just some “wacky” stuff.

Have you seen eval() written like this?

(0, eval)('something');

Regular eval

Eval basically allows you to execute any script within the given context.

For example:

{% codeblock lang:js %}
eval('console.log("123");'); // prints out 123

(function A() {
    this.a = 1;
    eval('console.log(this.a);'); // 1
})();
{% endcodeblock %}

So far everything is normal: eval runs inside the current scope.

Global eval

Things get interesting when you do this:

{% codeblock lang:js %}
var someVar = 'outer';

(function A() {
    var someVar = 'inner';
    eval('console.log(someVar);'); // you may want 'outer' but this says 'inner'
})();
{% endcodeblock %}

Well, in this scenario eval cannot get to the value of someVar in the global scope: the direct call evaluates in the function’s scope.

However, ECMA5 says that if you change the eval() call to an indirect one, in other words, if you use eval as a value rather than calling it through its literal name, it will evaluate the input in the global scope.

So this would work:

{% codeblock lang:js %}
var someVar = 'outer';

(function A() {
    var geval = eval;
    var someVar = 'inner';
    geval('console.log(someVar);'); // 'outer'
})();
{% endcodeblock %}

Although geval and eval refer to the exact same function, geval is a value, and calling through it is an indirect call according to ECMA5.

Back to the original topic

So what the hell is (0, eval) then? Well, a comma-separated expression list evaluates to its last value, so it is essentially a shortcut for:

var geval = eval;

0 is only a puppet here; it could be any value.
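To see both tricks at once (runnable in a browser console or Node; demoVar is a made-up name):

```javascript
// A comma-separated expression list evaluates to its last operand
console.log((0, 1, 'last')); // 'last'

// So (0, eval) is still eval, but referenced as a value, i.e. an indirect call,
// which per the spec evaluates its argument in the global scope
var geval = (0, eval);
geval('var demoVar = 123;'); // declares demoVar on the global object
console.log(globalThis.demoVar); // 123
```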




Shawn Xu

Software Engineer in Bay Area