
Start with measurement

The legendary Andy Grove wrote, in High Output Management, that good indicators are key to management. The idea has since been popularized by the OKR framework, as depicted in Measure What Matters by John Doerr. As I practice measurement on and off the job, I continue to be amazed at how effective a tool it is. With the pandemic ravaging the world, I find myself relearning the importance of starting with measurement.

No metrics != No problem

Leaders in too many countries pretended there was no local outbreak when it was the best time to combat COVID-19. China did not admit human-to-human transmission until January 20th, 2020, more than 40 days after the first confirmed case on December 8th, 2019. The United States continued to downplay the threat while not even allowing private labs to test for SARS-CoV-2, the coronavirus responsible for COVID-19. Official denials are the best catalyst for such an infectious virus, and it capitalized on the opportunity. In countries where testing was done early and widely, the situation seems to be under control.

Of course there is always an opportunity cost to overreaction, and leaders have every reason[1] not to sacrifice the economy for an unknown threat whose impact might be big or small. I believe that is exactly why it is key to collect data before taking action: the appropriate action needs to be driven by data rather than by wishful thinking.

I have learned this lesson the hard way myself. There was a key project on my team's roadmap last half to speed up our tooling. It was so important that everyone agreed we should do it: every engineer on my team, every engineer using our infrastructure tooling, and every manager supporting those teams. The engineers wrote a proposal, got it reviewed, and dove right into execution. Everything seemed perfect, except that they did not bother with metrics. I tried to nudge them to start by plotting operation durations. "We could always create a chart after it ships," said the team.

The originally scheduled release date had come and gone. The team was hard at work. Most weekly project updates were on track, but the project was nowhere near shipping. It was okay to miss the target; estimation is hard anyway. The problem was that we did not know quantitatively how much progress had been made, because there was no data to show for it. That also made it hard to predict how much longer the project would take. We eventually shipped the project after another couple of months and retroactively added metrics, but I wish I had insisted from the beginning. Had we had ongoing measurement of the operation durations, I could have caught the off-track signal earlier and intervened. I could have helped the team fight their perfectionism and deliver timely, incremental improvements.
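The measurement I had in mind did not need to be elaborate. Here is a minimal sketch of what recording operation durations could look like, assuming a Python codebase; the `timed` helper and the in-memory store are hypothetical stand-ins for whatever metrics pipeline a team actually uses.

```python
import time
from contextlib import contextmanager
from statistics import mean

# Hypothetical in-memory store; a real setup would emit samples
# to a metrics service so they can be charted over time.
durations = {}

@contextmanager
def timed(operation):
    """Record the wall-clock duration of a named operation."""
    start = time.perf_counter()
    try:
        yield
    finally:
        durations.setdefault(operation, []).append(time.perf_counter() - start)

# Wrap the operation being optimized; every run adds a data point.
with timed("build"):
    time.sleep(0.01)  # stand-in for the real tooling operation

# Average duration per operation, ready to plot before and after the project.
print({op: round(mean(samples), 3) for op, samples in durations.items()})
```

With samples like these charted from day one, a flat line would have been the early off-track signal, and each incremental improvement would have shown up as a visible drop.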

Agree on good metrics

Metrics are only effective when the team agrees they are good metrics. Imposing metrics on my team, however reasonable they seemed, would not work: we would spend too much time debating the merits of the metrics. That precious time should instead be spent moving the metrics in the right direction.

The US federal and state governments failed spectacularly in this regard. Take a look at the data compiled by the COVID Tracking Project, and one will be astonished by the inconsistency among numbers reported by different states. Read the footnotes for each state if you need convincing. In fact, the project leaders tweeted data quality grades and urged states to report higher-quality, consistent data.

An interesting thought experiment is, if you were the leader in charge of COVID-19 response, what would you do in terms of measuring it? I have been contemplating this for a while, and here are my ideas:

  • Collect testing data, including the number of positive, negative, and total tested cases. The derived positive ratio could be used to tell whether testing capacity is sufficient. If the positive ratio were too high, testing would need to be rapidly scaled up to cover more people.
  • Pivot testing data along several dimensions, including geographic area, age group, and symptom severity. These data points can again be used to inform future mitigation strategy.
  • Publish daily updates with aggregate numbers and historical daily increases. Visualize them in charts.
  • Publish total numbers along with per-capita numbers. Do it for the aggregate numbers and for each geographic area. This can be used to identify hot spots.
  • Influence other leaders to adopt a consistent set of data collection and reporting requirements.
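The first and fourth ideas above boil down to simple arithmetic over consistent per-region records. Here is a small sketch, with made-up numbers; the record fields and the 10% threshold are illustrative assumptions, not figures from any real dataset.

```python
# Hypothetical per-state records; real data would come from a consistent
# reporting pipeline such as the one the COVID Tracking Project advocated.
states = [
    {"state": "WA", "positive": 500, "total_tested": 10_000, "population": 7_600_000},
    {"state": "NY", "positive": 4_000, "total_tested": 8_000, "population": 19_500_000},
]

for s in states:
    # Positive ratio: share of administered tests that came back positive.
    positive_ratio = s["positive"] / s["total_tested"]
    # Per-capita rate makes regions of different sizes comparable,
    # which is how hot spots stand out.
    per_100k = s["positive"] / s["population"] * 100_000
    # Illustrative threshold: a high positive ratio suggests testing
    # is only catching the sickest and should be scaled up.
    verdict = "scale up testing" if positive_ratio > 0.10 else "capacity looks OK"
    print(f"{s['state']}: {positive_ratio:.1%} positive, {per_100k:.1f} per 100k ({verdict})")
```

The same two derived numbers, published daily per region alongside the aggregates, would support both the capacity decision and the hot-spot comparison described above.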

Measure to improve

As the saying goes, if you cannot measure it, you cannot improve it. I am grateful to relearn the metrics lesson in this unforeseeable fashion. This tragic development underscores the importance of performing good measurements early. Here is hoping that governments across the world will take action to measure and to contain the pandemic.

  1. There is a sad explanation of government inaction based on incentive theory. No one thanks the home inspector for pointing out a fire risk, but everyone thanks the firefighter who saved them from a burning house. The government is not incentivized to act early when, ironically, the risk is easiest to contain, because people would not believe how serious the problem could become. Instead, the government has every reason to wait until the risk is fully clear, when it can show up as a savior to its people. People in East Asian countries are still terrified by the SARS outbreak in 2003, so their governments were motivated to contain COVID-19 from day one.