Cargo Cult Analytics (best thinking on big data I’ve read in a long time)

Former IDG colleague Matt McAllister tweeted a link to a wonderful post by Stijn Debrouwere, a Knight-Mozilla OpenNews fellow who is “loosely affiliated with the Guardian’s data science team in London.”

Debrouwere tackles the futility of newsroom analytics and measurement, something I lived at Forbes.com and IDG and then Lenovo when I ran web analytics, mainly Omniture (now Adobe) and Google Analytics. As Debrouwere sarcastically notes, putting dashboards on big flat panel screens and making them really big makes them really important. He compares media executives who cling to their dashboards to New Guinea primitives waving at the sky and waiting for the cargo to come to them the way it used to come during WW II when the US Army was fighting the war.

Some zingers from his post:

  • “If you’re like most people, you don’t stray very far from the dashboard you get when you log in. You stare and squint and hope insight will magically manifest itself.”
  • “There’s nothing like a dashboard full of data and graphs and trend lines to make us feel like grown ups. Like people who know what they’re doing. So even though we’re not getting any real use out of it, it’s addictive and we can’t stop doing it.”
  • “There’s enough social media analytics tools to merit listicles that helpfully introduce you to the top 8.”
  • “You’re supposed to put these dashboards up on a wall, on a huge plasma screen. Because of course numbers are twice as persuasive if you make them twice as big.”
  • “Metrics are for doing, not staring.”
  • “I honestly can’t recall the last time I’ve looked at our pageviews. I know it wouldn’t get me anywhere.”

http://stdout.be/2013/08/26/cargo-cult-analytics/

This Big Data thing has a lot of people confused, myself included. And for good reason. We think of this big database in the cloud, doing something so big and difficult that it requires lots and lots of processing power and a thing called “Hadoop,” watching individuals interacting like so many ants with companies and stuff in real-time like some scary NSA spooky datacenter in Utah listening to all our phone calls  (but respectful of our privacy of course), figuring out patterns and trends and opportunities and MAGICAL INSIGHTS in REAL-TIME.  So let’s get ourselves some of that there Big Data and save the company, be like Google, A/B test the shit out  of stuff, and get rid of the Highest Paid Person’s opinion, blah, blah blah…..

First, a lot of people, including me, suck at math and statistics and so we overcompensate by regarding any numbers and the word “quantitative” as mystical.  If it has a number attached to it (not an adjective) it must be important.

Second, the legends around Big Data and the magical insights they deliver to retailers are kind of cool to consider and become mythical. Target can tell when a woman is pregnant based on her shopping history. Wal-Mart figured out 80% of its store visitors turn right when they enter the store. The promise of finding one of those awesome insights is just too compelling to miss. The problem is staring at a dashboard doesn’t equate to discovering an insight. Hence a lot of us are like the Cargo Cultists.

Third, modern management is obsessed with measurement — former colleague Lew McCreary called it the Tyranny of Metrics — and the Pokemon Model of Got To Get ‘Em All applies to equating Big Data with Big Data Collection, which yields the ugly phenomenon coined by Google metrics guru Avinash Kaushik: “Data Puking.” The admonition that, “You can’t manage what you don’t measure,” has built a corporate culture more concerned with looking buttoned-up, on the ball, and obsessively accurate than being intuitive, empathetic and innovative.

I was the guy who built these dashboards, peered at them for magical insights, puked them at my bosses, and over time I started to get really cynical and put tired old quotes pissing on measurement into my PowerPoint presentations:

Einstein: “Not everything that can be counted counts, and not everything that counts can be counted.”

Warren Buffett: “They studied what was measurable, rather than what was meaningful.”

I know Debrouwere’s post appealed to me because he was specifically addressing metrics in the newsroom, a place where I spent most of my career. But it also struck a timely chord with me because of my work for clients, all of whom cite Big Data incessantly as a force for disruption and transformation, yet haven’t the faintest clue how to harness it or who the Oracle will be in their organization who will study the digital tea leaves and come up with the single “AHA!” that will make them Measurement Legends.

 

3 Ways to Write an Annoying “ListLine”™

The recently departed Al Neuharth — the man who gave the world McJournalism when he created USA Today in 1982 — was famous in my mind for two things (no, make that three, because this is a post in part about the magic powers of “three”):

  1. Always publish the tits above the fold
  2. Bulleted lists are better than paragraphs
  3. Infographics that twist statistics and invoke the Royal We into cartoons are engaging

People love lists. Decades ago there was a bestseller entitled “The Book of Lists,” a classic toilet-side tome in many a household. There are management books about the power of to-do lists. I must have at least three or four list apps on my phones and tablets and PC. Most horrible is the tendency of the lower life forms in online journalism, and especially digital marketing/SEO/content marketing bloggers, to use lists as linkbait. There are so many headlines about “Three Ways to Increase ROI” and “Four Ways Content Marketing Can Engage and Delight Your Customers” that I have to wonder what’s driving this obsession with numerical sequence. I know that if I click through to actually read the stuff I’m going to read some airhead social media/digital marketing “guru’s” rehashed, airheaded, jargon-twisted bloviations.

Working off my feeds this morning I found this actual set of … oh hell, let’s just call them “ListLines™”, i.e. headlines promoting lists:

  • 13 Smart Podcasts That Will Feed Your Hunger for Knowledge and Ideas
  • The 45 Best Restaurants in America (BusinessInsider is a huge fan of  ListLines™, generally cutting up the content into slideshows to pump up the pageviews). They have a daily list which is semi-useful called …..
  • Ten Things You Need To Know
  • 10 Habits of Remarkably Charismatic People
  • We Try 4 New Electric Hot Water Kettles for Coffee and Tea

The king of the numbered ListLine has to be the Content Marketing Institute, which on its home page has the following headlines, and all save one has a numeral in it:

  • 4 Truths About Content Marketing Clients
  • 6 Tips to Start Creating Content on Tumblr
  • 3 Tips for More Effective Content Marketing Visuals
  • 9 Questions to Help You Prioritize Content Creation
  • 12 Roles Essential to the Future of Content Marketing
  • Thought Leadership Strategy: 3 Ways to Leverage Live Event Content
  • 3 Tips for Keeping Your Buyer Personas Fresh and Alive
  • How Enterprises Handle B2B Content: 6 Key Insights From Our Research

McKinsey, the organization that lives on PowerPoint, had an unofficial Rule of Threes during my short stint: no slide should have more than three bullet points on it, because that was all the typical audience member could hold in their head during the time it took the expensive consultant to present the slide. McKinsey was into numerology in general and the place should have had the Pareto Principle (the “80/20” rule) inscribed over the door as its motto. I admit I stick to the Rule of Threes to this day.

My theory about the abuse of the numbered list in online headlines is the corruption of editorial good sense by the scuzzy underworld of Search Engine Optimization and the Tyranny of Metrics. Let’s turn to the experts at the Content Marketing Institute, enter in the search term “lists” and what do you know? In a post entitled “Content Strategy: 9 Secrets for Awesome Blog Post Titles“, Tracy Gold writes in item number 5:

“We all groan about numbered lists in blog posts. But the truth is, they work. In our research, titles that began with a number performed 45 percent better than the average.

“Another approach is to start with a keyword and include a number later in the title. Take “Content Marketing Checklist: 22 To-dos for SlideShare Success,” for example. We tested both title types, and when the headline started with a keyword, it actually performed slightly better.

“While one approach to this method is to work more numbered lists into your blog content strategy up front, you can also use a numbered list in a post after it’s written. Is the post split up into sections? Can those sections be numbered? Boom. But again, don’t mislead your readers — make sure a numbered list format actually fits the content of your post.”

Now we know the secrets of the masters. My theory is that when a headline announces ahead of time how many pieces of b.s. the reader will have to digest, readers figure they aren’t in for a reading of Procopius’s History of the Early Church and can snack on the info before their Adderall-buzzing brains click them away.
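For anyone who wants to sort a pile of headlines into the two flavors Gold describes (number up front versus keyword first with a number later), here is a rough sketch. The headlines are the Content Marketing Institute ones listed earlier; the function names and the regular expressions are my own assumptions, not anything CMI published, and the rule for what counts as "a number" is deliberately crude.

```python
import re

# Headlines copied from the Content Marketing Institute list above.
HEADLINES = [
    "4 Truths About Content Marketing Clients",
    "6 Tips to Start Creating Content on Tumblr",
    "3 Tips for More Effective Content Marketing Visuals",
    "9 Questions to Help You Prioritize Content Creation",
    "12 Roles Essential to the Future of Content Marketing",
    "Thought Leadership Strategy: 3 Ways to Leverage Live Event Content",
    "3 Tips for Keeping Your Buyer Personas Fresh and Alive",
    "How Enterprises Handle B2B Content: 6 Key Insights From Our Research",
]

# Assumption: a "number" is a standalone run of digits, so the 2 in "B2B"
# does not count because it sits inside a word.
LEADING_NUMBER = re.compile(r"^\s*\d+\b")
ANY_NUMBER = re.compile(r"\b\d+\b")


def classify(headline):
    """Label a headline by where (if anywhere) its number appears."""
    if LEADING_NUMBER.search(headline):
        return "number-first"
    if ANY_NUMBER.search(headline):
        return "keyword-first"
    return "no-number"


def tally(headlines):
    """Count how many headlines fall into each label."""
    counts = {"number-first": 0, "keyword-first": 0, "no-number": 0}
    for headline in headlines:
        counts[classify(headline)] += 1
    return counts
```

Calling `tally(HEADLINES)` gives the split for that home page. It says nothing about which flavor performs better; for that you would need the click data Gold's team had and I don't.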

Before closing, let me digress back to USA Today and my indoctrination into the art of the list.

I worked at a newspaper, The Lawrence Eagle-Tribune, that rented its color presses to print the New England edition of USA Today at night, receiving the pages via satellite and then churning out the colorful McPaper so familiar to residents of the Marriott Courtyard Suites. This close relationship unfortunately colored the judgment of Eagle-Tribune editor-in-chief Dan Warner, who decided that Al Neuharth was a visionary genius and that the Tribune’s staff would learn to write lists instead of stories and develop “infographics” about “Why We Love Ice Cream,” complete with a cartoon of a melting ice cream cone, a gushing thermometer and some made-up statistic about what flavors “We” preferred.

This was strictly enforced to the point that every story opened with a classic lead (my favorite lead of all time, courtesy of Edna Buchanan, the legendary police reporter of the Miami Herald is cited below*), a standard second paragraph, and then an inevitable list of bulleted items before the jump to an inside page.  I would pile into the newsroom after a scintillating evening covering the Salem, New Hampshire board of selectmen and pound out some lifeless copy (“This ain’t a short story about your dead grandma bub, so get over it” my editor, Al White, told me after taking a machete to my first story about a sewer bond hearing) that always had a bullet list up high where Dan Warner would be sure to see it. Hence:

“In other actions, the board voted to:

  • Ban pit bulls from playgrounds
  • Postpone a hearing on bingo licenses
  • Authorize door-to-door cigarette sales by Brownie Troop 5
  • Commend Police Chief Nickerson for Sunday’s arrest of undercover Massachusetts State Policemen harassing Bay State liquor and fireworks customers”

At first the mandate to use bullet lists offended my delicate Strunk & White sensibilities about prose composition.  One of the joys of great writing is a well-written list, contained in a single flowing sentence, ordered just so to delight the ear and paint a picture in the mind’s eye, but alas the world has become addicted to the staccato stack of one-liners preceded by the bold typographical dot and so I have given up all hope of resistance.

But I know in my heart of hearts that William Faulkner never wrote a bullet list in his life or worried about SEO.

 

*: Calvin Trillin, profiling Buchanan in the New Yorker: “In the newsroom of the Miami Herald, there is some disagreement about which of Edna Buchanan’s first paragraphs stands as the classic Edna lead. I line up with the fried-chicken faction. The fried-chicken story was about a rowdy ex-con named Gary Robinson, who late one Sunday night lurched drunkenly into a Church’s outlet, shoved his way to the front of the line, and ordered a three-piece box of fried chicken. Persuaded to wait his turn, he reached the counter again five or ten minutes later, only to be told that Church’s had run out of fried chicken. The young woman at the counter suggested that he might like chicken nuggets instead. Robinson responded to the suggestion by slugging her in the head. That set off a chain of events that ended with Robinson’s being shot dead by a security guard. Edna Buchanan covered the murder for the Herald—there are policemen in Miami who say that it wouldn’t be a murder without her—and her story began with what the fried-chicken faction still regards as the classic Edna lead: “Gary Robinson died hungry.”

 

Big data visualization beauty

I marvel at the art of visually representing quantitative data. There have been some excellent examples over time. I used to be particularly obsessed with SmartMoney’s heat map of the stock market, which blew a lot of minds in the late 1990s, and went out of my way to try to recruit the genius who came up with it into Forbes.com (with no success). Today it seems so static and Web 1.0, but still, cavemen used to be freaked out by fire; imagine what they would do with a Bic lighter.

Uncle Fester, the collector of all that is interesting, sent me a link to a very cool wind map. Meteorological maps are generally fairly dull and impenetrable, with their own symbolic language of isobars, Beaufort scales, and occluded fronts. Indeed, weather has long been considered one of the greatest data challenges. Consider that for decades the standard was something like this:

Not very friendly to the layman, more the sort of thing a pilot or professional could read and derive some sense of the future from. For me, wind is the single most interesting element of a weather forecast. As a former sailboat racer, I’d obsess over the probability of a wind shift occurring during a race, or plan ahead on whether or not to take a crew to help hold the boat down if the breeze increased in velocity. Too much weight and I’d lose. Too little weight and I’d be screwed trying to keep the boat flat in the gusts.

Here’s what wind maps used to look like:

And here is what they look like today. This is beautiful and very addictive to play with. I highly recommend clicking through to see this in all of its animated glory.

 

 

 

 

 

 

And sorry, but I can’t forget this classic:

Personal Analytics: Track Thy Self

I’ve been logging my physical activities and diet for a while, moving from spreadsheets to programs to web-apps to device apps in search of the best way to keep consistent track of my progress in the belief that if I don’t measure it, I won’t stick with it.

One of my former rowing coaches, Tom Bohrer, an amazing oarsman and former Olympic-level athlete, told me the first step towards success in losing weight is to log every bite. The discipline of noting what one puts in one’s mouth forces an awareness of what is on the plate and the high number of random, thoughtless calories that can creep onto the plate during the day. Tom had me write it down in a simple $1.00 spiral notebook and not bother with calorie counts, ounces and grams, or totals. Just have the honesty to admit to the bag of Swedish Fish and the courage to show that transgression to him every week.

In this Moneyball era, some sports are very number/goal based and others are getting more so. The racing sports (swimming, rowing, running) are stark time-over-distance efforts that can be timed, charted, and plotted over time. Team sports (football or lacrosse, for example) are subjective and don’t lend themselves to improvement metrics the way baseball does.

I’m most interested in the trend of personal tracking and the rise of technology that allows a person to track every step taken during the day, every session completed on the machine, every moment spent in deep sleep, down to blood glucose levels. Tim Ferriss’s bestselling The 4-Hour Body exemplifies the degree to which a person with enough motivation and money can obsessively test himself. This is a guy who flies to Central America, where he can get a lot of expensive tests performed cheaply. A guy who is open to any device or toy that will help him plot performance and levels over time.

I learned the discipline of logging early on thanks to the early efforts of Concept2, the Vermont maker of the Concept2 rowing ergometer, the standard indoor rowing machine adopted by most teams because of its high quality and very capable digital monitor. That monitor, the PM4, was developed for Concept2 by the Pennsylvania company Nielsen-Kellerman, which also makes monitors for on-the-water rowing and portable meteorological instruments. Concept2 was smart in opening up the code interface to the PM monitor and equipping it with a USB and Ethernet jack. Third-party software such as RowPro followed, giving devoted rowers and coaches even more data about their performance. Concept2’s smartest move, in my opinion, was serving up an online logbook that allows rowers to enter their workouts and compare themselves on public leaderboards against other rowers of the same weight, gender and age over set benchmark times and distances. The online logbook at Concept2.com sees billions of meters logged every year, and gives a disciplined rower a clear sense of progress and goals.

For more than a year I have been logging my diet through a free tool offered by the Livestrong Foundation called MyPlate.  The web-service is designed and managed by Demand Media and is buried in a content site that delivers nutrition and health stories and social network functions which I pretty much ignore.

The calorie tracker combines the functions of a log book with a deep database of calorie counts and nutritional levels for essentially any food one could imagine, including branded food ranging from a quarter cup of Trader Joe’s organic dried white peaches to a Five Guys Bacon Double Cheeseburger. I can combine ingredients into standard meals to ease the logging of frequently eaten combinations, set nutritional targets ranging from the amount of sodium to the number of net calories consumed per day, and log and plot my weight, body mass index, and specific physical measurements such as the circumference of my neck, chest and abdomen over time. MyPlate will calculate calorie levels to achieve specific weight loss or gain goals and does a good job of plotting progress on X,Y charts. A subscription version offers richer functionality.

To log my exercise progress, I could and do use MyPlate, as it calculates calories expended and deducts those from my gross calorie count. Hence I can log a two-mile run at 13 minutes, 43 seconds, and it will cough up a calorie expense of 438 and subtract that from the inputs.
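The arithmetic behind that kind of log entry is simple enough to sketch. This is only an illustration of the bookkeeping, not MyPlate's actual code: I have no idea how it arrives at 438 calories for the run, so that figure is taken as given, and the 2,400-calorie gross intake in the usage note is a made-up number.

```python
def pace_seconds_per_mile(minutes, seconds, miles):
    """Average pace in seconds per mile for a timed run."""
    if miles <= 0:
        raise ValueError("distance must be positive")
    return (minutes * 60 + seconds) / miles


def format_pace(seconds_per_mile):
    """Render a pace as M:SS.s, e.g. 411.5 seconds becomes '6:51.5'."""
    whole_minutes, remainder = divmod(seconds_per_mile, 60)
    return f"{int(whole_minutes)}:{remainder:04.1f}"


def net_calories(gross_intake, exercise_burn):
    """Net calories: what was eaten minus what the tracker says was burned."""
    return gross_intake - exercise_burn
```

For the run above, `format_pace(pace_seconds_per_mile(13, 43, 2))` comes out to `6:51.5` per mile, and on a hypothetical day of 2,400 calories eaten, `net_calories(2400, 438)` leaves 1,962.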

Since I am spending most of my workout time in Crossfit, I also need to track my performance and progress against a lot of benchmarks, ranging from my personal records for lifts such as deadlifts, back squats, snatches, presses and cleans to specific Crossfit workouts with names like Fran and Kelly. I had been logging that work in a paper notebook I leave at the gym, but a fellow crossfitter introduced me to a site called Beyondthewhiteboard.com which does an excellent job of letting me log my progress against my gym’s prescribed daily workouts. There is a food logging capability on the site, but it isn’t driven by a crowd-sourced calorie database, so I tend to ignore it. I do throw my weight in there though to keep a record of progress there as well.

The 4-Hour Body piqued my curiosity about the role of supplements in physical well-being and improvement. Ferriss prescribes some fairly outré tips, ranging from his so-called PAGG Stack (policosanol, alpha-lipoic acid, garlic extract and green tea extract) to induce a state of fat-burning thermogenesis, to eating three Brazil nuts in the morning and at night to improve selenium levels and testosterone production. I personally agree with the man who said people who take vitamin supplements have the most expensive pee in the world, but I also spend a lot of cash on stuff ranging from Omega-3 fish oil to all sorts of pills, protein powders and vitamins. Since I don’t have the free cash to spend on a lot of blood tests to see exactly what is going on in my metabolism, I take this stuff as an article of faith.

A good source of deep and usually impenetrable advice about supplements is the forums at Longecity.com, which is where I learned about the online log service CRON-O-Meter. This service is essentially MyPlate taken to another level of specificity for total nutrition geeks, with automated tracking of very specific vitamin and protein information for those who believe food is essentially culinary pharmaceuticals and who like to geek out by reading every word of Dr. Barry Sears, the Zone diet founder, or Gary Taubes, the au courant dispeller of the “why we get fat” myth. I tried CRON-O-Meter for a while, but I’m just not anal-retentive or well-heeled enough to figure out if I need more lysine or niacin or vitamin D in my life and then buy it.

Rising in popularity are sleep monitors, as the fitness-measurers are pushing the idea that sleep quality and duration have a big effect on health, recovery from exercise, and general well-being. The owner of my Crossfit gym, Mark Lee, has been using a sleep monitor, and there are some that track the time it takes you to fall asleep, how many times a night you wake up, when you go into deep sleep, etc. One brand I’m aware of is Zeo, with a $150 bedside setup.

Then there is the new breed of pedometer-like devices that track every step, capture all the data, and let you upload and track it online. Fitbit is probably the best known of these, and at $100 seems reasonable enough, as it also purports to track sleep, but I’m not compelled to wear one on my belt.

One can obviously go overboard on the personal tracking obsession, and I know I am coming close to being too geeky about the whole thing, but you can expect to see and hear about more of it, not less, as awareness of dietary and supplement chemistry rises thanks to people like Tim Ferriss; the paleo diet craze expands because of Reebok’s commercial embrace of Crossfit, “the Sport of Fitness” (Crossfit, aka “Cultfit” to its detractors, embraces paleo principles as part of the program); and the device makers push their meters, gauges, wireless scales and pedometers at you more and more.

My personal testimony to whether any of the tracking works is this: I’ve dropped 50 pounds in 18 months, cholesterol levels have plummeted (I took myself off prescribed statins and have yet to see if I can manage my HDL/LDL levels through diet and exercise alone), and I eat a fairly strict paleo diet that restricts calories to around the 2,000 per day level. My rowing times are as good as, if not better than, they were ten years ago, and my running times have improved from a sluggish ten-minute mile pace to a seven-minute mile in a matter of months. Yes, this is insanely narcissistic, but it is efficient, and it beats the old method of carrots and cottage cheese, little paper calorie counter books, and endless jogs around the block with a daily visit to the bathroom scale.

 

Spikes in stats

Feedburner displays my feed subscribers in the left column. I keep an eye on it as a casual reference to growth in readership and declare little victories every time the odometer clocks another 100 readers.

Typically it hovers around 600 subscribers, but in the last few days it has spiked to 900 plus. Why? No clue. The number fluctuates up and down, but a 50% spike means either Feedburner has burped or … (update, Nathan Gilliatt said FriendFeed subs are added)

Some undetected thing spiked inbound traffic.

Look at the green bar just go nuts in the last week.
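A quick sanity check on the size of that jump, as a minimal sketch. It treats "around 600" and "900 plus" as the round figures 600 and 900, which is my assumption; the function name is hypothetical and nothing here comes from Feedburner.

```python
def percent_change(before, after):
    """Percentage change from an earlier count to a later one."""
    if before == 0:
        raise ValueError("cannot compute a percentage change from zero")
    return (after - before) / before * 100.0
```

With those round numbers, `percent_change(600, 900)` is 50.0, so the count grew by about half.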

I’m not a collector of stat counts — I have to worry about followers and ranks too much in the real world of Lenovo — but it is an ego-stroke to know someone reads this stuff.

Then again some don’t ….. Stefan Constantinescu, a great commenter on all things related to ThinkPads, had to unsub when he realized that my professional title doesn’t mean this blog follows in my career’s footsteps. (No hard feelings Stefan, just citing your decision as an example of blog identity crisis).

  1. Churbuck: David Churbuck works for Lenovo (OTCPK: LNVGY), a company that makes a line of laptops known as the ThinkPad. Why do I know this? I’m a huge ThinkPad nerd. Practically every laptop I’ve ever owned has been a ThinkPad. I love the design, the dependability, the battery life, but do I love David? This is his personal blog more or less. He constantly writes about getting back into shape and fishing. I’m sorry, but I just don’t care. Decision: Unsubscribe.

The stupidity of metrics

I have been in solid meetings the past two days and yesterday watched a presentation that reminded me of the story of the crash of Eastern Airlines Flight 401 in 1972.
The pilots were coming in for a landing but the “gear down” light didn’t illuminate in the cockpit. They tapped the light. They flipped switches. The flight engineer opened a hatch and climbed down to see what the problem was. They continued to obsess about the light but they didn’t notice when one of them bumped into the control column and turned off the autopilot, putting the jet into a slow descent.

No one looked out the windshield. They were looking at the dead nose gear light.

Splat. 99 dead. 77 survived.

Metrics — the act of collecting data about systems and processes — and then reporting them in dashboards can lead to the type of tunnel vision those pilots displayed 36 years ago. The obsession with gathering status reports for the sake of gathering status reports can divert the organization and its people from the task at hand. If you’re trying to smelt gold but you spend so much time tracking ingot development that you fail to notice that you’re in fact smelting lead — then you’re going to be really good at ingot development, but oblivious to the quality of the final output. This is why formerly good things get ruined when big companies acquire them and start to obsess about the efficiencies. “We’ll just swap out the good stuff for the okay stuff and no one will notice.”
Subjectivity, the measurement of quality: is it good? Is it bad? Do we suck or do we rock? Those unmeasurable intangibles are dismissed by technocrats as “feeling” behavior prized by people too sloppy to appreciate precision. Or they attempt to quantify the subjective with surveys and stupid metrics like “sentiment.”
Objectivity, the measurement of facts, has become de rigueur ever since Neutron Jack Welch of GE set forth the commandment that you have to measure it to manage it. And so commenced the age of the tyranny of metrics. The Excel tyrants are really, really good at demanding status reports and updates, but the reality is that no one looks at their work, and everyone is too terrified to say: “Go away. Here’s a beach. Start counting grains of sand and give me a TPS report by tomorrow.”

Metrics people: turn yourselves into analysts by looking out the window and telling your boss the swamp is getting really close.