Numbers and US

Story that numbers tell us

How long are you going to live

All of us who have been exposed to statistics have heard of regression toward the mean. We know how Sir Francis Galton tried to explain a child’s height from the parents’ height. Galton found that the next observation, the height of the progeny, tends to move toward the mean height of the population. So, if you are much taller than most of the people around you, your kid is likely to be shorter than you. Based on Galton’s finding, if you are, say, 1 foot taller than the mean height of the population, your kid is most likely going to be about 8 inches taller than the average height.
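To make the arithmetic concrete, here is a minimal sketch of that rule of thumb. The slope of 2/3 is my assumption, read off the example above (12 inches above the mean becoming 8 inches); the function name is made up for illustration and this is not Galton's actual fitting procedure.

```python
# Regression toward the mean as a one-line rule of thumb.
# Assumption: slope of 2/3, implied by the 12 in -> 8 in example in the post.
REGRESSION_SLOPE = 2 / 3


def expected_child_deviation(parent_deviation_in, slope=REGRESSION_SLOPE):
    """Expected deviation of the child from the population mean, in inches."""
    return slope * parent_deviation_in


# Parents above the mean, at the mean, and below the mean.
for parent_dev in (12, 6, 0, -6):
    child_dev = expected_child_deviation(parent_dev)
    print(f"parent {parent_dev:+} in -> child expected {child_dev:+.1f} in")
```

Note that the pull works in both directions: a parent 6 inches below the mean is expected to have a kid who is closer to the mean too.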

Interestingly, there are studies of a child’s life expectancy as a function of the parents’ life span as well. The Gavrilovs at the University of Chicago have done a lot of studies on life expectancy. Our life span could depend on both genetic and environmental factors, and they were trying to understand the interplay of the two.

Here’s what they found. Up to age 70, environment seems to play the major role, but past the tipping point of 70 years, the parents’ life span seems to be a good predictor of the child’s life expectancy.

Life expectancy 1


Written by SK

August 25, 2014 at 3:12 am

Posted in Uncategorized

Information Deluge, and How is it going to get managed

Conor Friedersdorf, who writes for The Atlantic, has founded a newsletter, ‘The Best of Journalism’, that sends exceptional non-fiction writing to our mailboxes. It charges $1.99 for the service. It seems it doesn’t have many users yet; the reader base is only in the hundreds for now.

Nonetheless, it’s an interesting development.

The idea behind it is that we live in a world where information is everywhere. We are sinking in a sea of information, and every bit of it is fighting for our attention. We really don’t know where to put our attention, what to soak in, what to just glance at, and what to discard outright. The newsletter tries to address this need by sending us a list of items that are worth our attention.

At a time when we are on the verge of automating all decisions, or at least have started talking about automating them, this is a throwback to a bygone era. We now have recommendation engines that tell us which movies to watch, which books to read, which news items to read, what to buy, when to buy groceries… Though ‘The Best of Journalism’ doesn’t have many readers now, the venture is based on the assumption that there is still a need for a human touch, and that recommendation engines are still not fulfilling everyone’s needs.

Evolution is going to make recommendation engines smarter with time, but it feels like there will still be a need for a human touch. First, we change, and we don’t even know ourselves, so our past activity or historical data might not predict our future choices. Secondly, to really know us, an algorithm would need access to all the signals that we receive or emit through every sense: sight, touch, smell, hearing, taste. That is going to be a challenge for some time, and even when statistical models have access to it all, they will still have to deal with human irrationality.

Written by SK

December 9, 2013 at 5:59 am

Posted in Uncategorized

Is Job designation an intelligence Pill (a placebo)

A year back, while wasting time on Netflix, I discovered “It’s Always Sunny in Philadelphia”. I watched a couple of episodes and was hooked. Still, I kept thinking that the show has low-class humor, and I wondered why I liked it.

I recommended the show to my friends. Without fail, I pointed out to them that it has low-class jokes, and that I don’t like the fact that I like it so much. After a week or two, I ended up reading about the show on Wikipedia, and discovered it was being compared to Seinfeld. I liked Seinfeld the moment I laid my eyes on it, and I liked the fact that I liked it. After all, it was a show about nothing! I thought again about “It’s Always Sunny in Philadelphia”, collected feedback from friends about it, and decided that it’s a good show, and that I don’t have to dislike myself for liking it.

So, now the 9th season is on, and there is an episode, “Flowers for Charlie”. In that episode, two scientists who have developed an intelligence pill select Charlie as a lab rat, on the grounds that he has the intelligence of a rat. After taking the intelligence pill, Charlie starts reading Tolstoy, Shakespeare and Hawking, refuses to do any menial work in the pub, and realizes his friends are utterly stupid. He is able to see the real waitress too: an inane, crackpot girl, and he can’t understand why he thought of her as the love of his life for the last eight seasons.

After taking all those intelligence pills, Charlie presents the research that he claims is going to revolutionize human society; everyone in the room is dumbfounded.

As an explanation for his stupid, Charlie-like research, the two scientists reveal that Charlie was getting a placebo, not the intelligence pill.

While explaining their findings, the scientists show the chart below. I am wondering, does it apply to the corporate world?

Notice how his arrogance increases even without any increase in intelligence or knowledge.

Charlie's Arrogance

Since it’s now a widely accepted fact that ‘It’s Always Sunny in Philadelphia’ is like Seinfeld, a show about nothing, we can extract any intelligent insight we want out of it. I wonder whether designations or promotions in the corporate world, in reality, turn out to be a placebo intelligence pill for most folks.

Of course, not everyone follows the path shown in the chart. In fact, in line with what analytics professionals are striving to apply in all walks of business, every individual might have a different path. A chart like the one below exists for everyone at every moment of one’s professional life. Please notice the line for ‘political awareness’ as well. It means that ‘real’ knowledge might not increase significantly as we go up the hierarchy, but ‘political awareness’ certainly does. Sometimes we confound it with ‘real’ knowledge. Keeping Orwell’s ‘Animal Farm’ in context, ‘political awareness’ is what the elite class flaunts as a sword over the proletariat, and the proletariat never realizes that the sword is made of foam that has the color of the Iron Throne.

Though, in general, let’s also not discount that average intelligence generally increases as we go up the ladder. ‘A man rises to the level of his incompetence’ is a nugget of wisdom I got from the CEO of the last company I worked for, and it rings truer to my ear the more I think about it.

Nonetheless, we have to agree that some of us, some of the time, are like Charlie.

its sunny 3

Written by SK

November 20, 2013 at 5:10 am

Posted in Uncategorized

History of Statistics

A couple of months back I took a course on Bayesian statistics organized by the SF chapter of the ASA and eBay-Google. The course was really great, and I wish I had been more disciplined and grasped more from it.

The instructor, David Draper, is really great. In addition to teaching us Bayesian statistics, he talked about the history of statistical methods. When he talked about hypothesis testing and Neyman, he talked about the tools that were available in the 1930s. To me, it was like teleporting yourself to the ’30s, forgetting all the learning and tools (computers) we have acquired in the decades since, making yourself aware of the challenges and prejudices faced by the scientific community of that time, and trying to come up with something that future generations would know as hypothesis testing.

Not just that: when David Draper talked about Jerzy Neyman, he talked about Ronald Fisher; when he talked about Fisher, he talked about Karl Pearson. He mentioned a book on the history of statistics. As far as I recall, it was a book by Stephen M. Stigler, ‘Statistics on the Table’. I am not sure, though, and I have to ask him again.

In any case, I ended up buying two books on the history of statistics and statistical methods, which I received last night: ‘Statistics on the Table’ and ‘The Lady Tasting Tea’. Since ‘The Lady Tasting Tea’ talks about the statistics of the twentieth century, I am going to start with it. Or, having pondered my choice for a while, I think I might be starting with it because its cover is more interesting: a cup of tea with a piece of lemon tucked on it, resting on a chessboard-style tiled floor, and a lady with a hat on, not facing the cup, but looking at the horizon where the sun is taking shelter for the night.

I just hope this book makes me smart enough to design an experiment to find out the underlying reason for my choice, even with a sample size of one, as in my case.

Written by SK

November 13, 2013 at 4:44 pm

Posted in Uncategorized

YouTube Recommendation Engine

While watching a YouTube video, I was pleasantly surprised to see a video recommendation. I was watching a Hindi song, and got a recommendation for an interview with P!nk on a topic that was highly relevant to the theme of the song. If you have seen the movie Abhiman, you are certainly going to be impressed with the recommendation!

recommendation Engine

Have YouTube’s statistical modelers made their recommendation engine advanced enough to recommend videos based on our mood or sentiment? I doubt that.

To understand, let’s think about the data YouTube has collected from my activity. I watched a couple of P!nk videos. I am not sure whether I listened to any Hindi songs in the last couple of weeks, but that’s not very relevant here. Now, the possible hypotheses could be:

a:) YouTube knows I watch P!nk videos along with many other videos. It is quite possible that randomly, by sheer chance, it recommended one of the P!nk videos to me, and the relevance of the sentiment was just a fluke. This is always a possibility; in fact, chance and randomness are often the answer to the puzzles we bang our heads against. But I am positive the recommendation engine is smarter than this.

b:) It’s quite possible that YouTube has bucketed all the videos in its database by ‘sentiment’. The video from Abhiman might have been bucketed under the same ‘sentiment’ as the P!nk interview. Hence, the moment I watched the Abhiman video, YouTube recommended a video with the same sentiment, featuring someone I watch. I would guess YouTube might have started using this approach for a number of recommendations, but I have some reservations about how well it works. It’s difficult, as you have to bucket each video into one of millions of groups across multiple dimensions. Sometimes user-generated ‘sentiment groups’ are the answer, but getting as much data as you really want is a challenge.

c:) The third possibility is the approach that was, and still is, the core of most recommendation engines. There might be someone who watched the Abhiman video and also watched the P!nk video. Millions would have watched Abhiman, and most of them would not have watched the P!nk video subsequently. So the recommendation rule matched me with the person who did watch it, since I too like P!nk videos.
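Hypothesis (c) can be sketched in a few lines. This is only a toy illustration of the "people like you also watched" idea, not YouTube's actual rule; the users, video names and scoring scheme below are all made up.

```python
from collections import Counter

# Hypothetical watch histories; every user and video name is invented.
watch_history = {
    "user_a": {"abhiman_song", "pink_interview"},
    "user_b": {"abhiman_song", "cricket_highlights"},
    "user_c": {"abhiman_song", "pink_interview", "pink_concert"},
    "me": {"pink_concert", "abhiman_song"},
}


def recommend(user, current_video, history, top_n=1):
    """Rank unseen videos watched by users who also watched current_video."""
    seen = history[user]
    scores = Counter()
    for other, videos in history.items():
        if other == user or current_video not in videos:
            continue
        # Weight each neighbour by how many videos we have in common,
        # so people with similar taste count for more.
        overlap = len(seen & videos)
        for video in videos - seen:
            scores[video] += overlap
    return [video for video, _ in scores.most_common(top_n)]


print(recommend("me", "abhiman_song", watch_history))  # ['pink_interview']
```

Here `user_c` shares two videos with me, so their P!nk interview outweighs `user_b`'s cricket highlights, even though all three neighbours watched the Abhiman song.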

If you think about it, the real business rule could be a combination of any of the three hypotheses, including the one based on randomness. In any case, it was nice to get the recommendation, as I really did end up watching it.

Written by SK

November 6, 2013 at 7:04 am

Posted in Uncategorized

Business sense or data (statistics)

A couple of weeks back, I was reading an article on Cricinfo by Ed Smith. In the article, Ed Smith was talking about Alastair Cook and captaincy, and made some really good points about our potential bias, the alpha-male/pro-adventure bias, when we judge the quality of captaincy. In addition to making insightful commentary on captaincy and our perceived value of it, he went on to talk about the movies Moneyball and Trouble with the Curve together. He didn’t comment much on them, but said that Eastwood’s movie is an inverse Moneyball. In Moneyball, a computer-savvy nerd makes fools of guys who have been playing and watching the sport for a long time, whereas in Trouble with the Curve, a guy who does not know computers at all, who can’t even see properly, but who has lived and breathed baseball all his life, somehow comes out the winner in the end when pitted against a guy who is all data and computers.

Understandably, in a commercial movie, to get the effect and catch the audience’s attention, one has to make the story more dramatic than the underlying idea truly is. A movie won’t sell if it tries to be an intellectual debate about how the world works, without taking sides and while trying to be fair to all the conflicting ideas.

But watching these two movies side by side certainly helps put things in perspective. And it enriches the age-old debate about decision making in the mundane business world as well, not just in the fanciful areas of sports. Though, as we all know, sports management does not fall outside the realm of the business world anymore.

It brings to the forefront the age-old debate of old vs new, experience vs exuberance, business sense acquired through hard work on the field vs insights thrown in from outside by the know-it-all I-god.

As with all things in life, when we are faced with two contrasting thoughts, more often than not the truth lies somewhere in between. The cliché might be true here as well. Or maybe we don’t have to think of the two as competing ideas. Both ideas complement each other, and when put to work in tandem they do wonders. But over-reliance on one, and turning a blind eye to the other, might have a crippling effect on a business. In fact, if we think deeper, we realize that we can’t do justice to one without directly or indirectly using the other facets of decision making.

So far so good! The problem is, each one of us believes that where we stand, what we think, is ‘somewhere in between’, that we have the right mix of ingredients for decision making on a particular problem. Sometimes we might not be right!

I get skeptical when I see people going gaga over Moneyball. It’s certainly refreshing to know that an endeavor that looks so ‘personal’ can be understood through impersonal ways of looking at the results. But to argue, in the extreme, that sports punditry acquired by being out in the sun for thousands of days is worthless, is farcical.

Conversely, as in Trouble with the Curve, we end up meeting folks (we ourselves take on this avatar sometimes, when faced with a problem close to our heart) who don’t value statistics and consider it a way to cloud clear judgment. Statistics, even the most complex algorithm, has its value, but it’s true that it’s dangerous when we have placed our common sense safely in a cupboard.

Being in a profession of numbers, I am supposed to be biased toward the use of numbers for decision making. Oddly enough, I like it when I hear people say they don’t have much faith in statistics. Reading between the lines, what they really don’t like is the over-use or over-glamorization of statistics, not statistics itself. To me, such a remark seems to come from someone with a sound intuitive understanding of statistics, and it is just an attempt to be dramatic. But it looks like there are some who really think statistics is a waste.

What we are discussing here is complex, and I might have to put more thought into it to write about it clearly and decisively. But here are my two cents on why that view is so outrageous. To start with, statistics is just an extension of common sense. Events that we have observed thousands of times, and that we might not be able to recall one by one each time we make a decision, are condensed into useful wisdom using statistics. Statistics is a formalization of common sense: after rigorous effort, statisticians have developed methods (we can call them rules of thumb) that help us make judicious decisions faster with the help of computers. It saves us the rigor of looking at tons of events at once, which only superhumans can do, and then making a decision based on common sense. And if we put in some effort, it’s not difficult to appreciate the common sense behind a complex, impersonal statistical formulation. Statistics is nothing but common sense made easy.

For example, if someone asks us what 2*3 is, we will immediately say that it’s 6. The result makes business sense: if 3 guys give me $2 each, I’m going to be richer by $6. Now, if someone asks us what 236237 * 978467345 is, we might get confused. Using a calculator or spreadsheet, we can say it is 231,150,190,180,765. Now, I might start arguing that I don’t believe this number because I can’t verify it with common sense.

But if we just start relying on the consistent behavior of numbers, we can say that the result might be correct, as two numbers that end in 7 and 5, when multiplied, give a number that ends in 5. Then we can go on and say that 231,150,190,180,765 is greater than 200,000 * 970,000,000, so the calculation looks close and may be correct. Once again, 231,150,190,180,765 is less than 300,000 * 1,000,000,000, so it makes sense. Hence the number we’re getting from the calculator might be correct.
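The same sanity checks can be written down as a small sketch. The round numbers used for the bounds (200,000 * 970,000,000 below and 300,000 * 1,000,000,000 above) are my own choice for rounding both factors down and then up; any convenient round numbers on either side would do.

```python
a, b = 236237, 978467345
claimed = 231_150_190_180_765  # the number the calculator gave us

# Check 1: the last digit of a product depends only on the last digits
# of the two factors (7 * 5 = 35, so the product must end in 5).
last_digit_ok = (a % 10) * (b % 10) % 10 == claimed % 10

# Check 2: bracket the product by rounding both factors down, then up.
lower = 200_000 * 970_000_000
upper = 300_000 * 1_000_000_000
bounds_ok = lower < claimed < upper

print("last digit plausible:", last_digit_ok)
print("within rough bounds:", bounds_ok)

# Python integers are exact, so we can also confirm the product outright.
print("exact match:", a * b == claimed)
```

None of the two rough checks proves the answer; they only tell us that the machine’s number is not obviously absurd, which is all common sense was ever doing for 2*3.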

The difference between a complex statistical result and the big multiplication above is our attitude. In our mind we have developed a way to test big multiplications, so we have started relying on machines for them.

And if we think a little harder, we will start realizing that appreciating the common sense behind a laborious statistical method is not impossible.

Written by SK

October 18, 2013 at 6:14 am

Posted in Uncategorized

Who do you write like

I recently stumbled upon a website that has a tool they call an analyzer. It analyzes your writing and tells you which famous writer you write like.

Having worked as an analyst for close to a decade, and also having an interest in literature, I find this tool mind-blowing. I have been thinking really hard about what kind of setup they have in the tool, what algorithm they have used, and how much data (writing snippets) they have. If any of you have insight into it, let me know.

I put my writing into the tool, and found that I write like Cory Doctorow.


I am more ambitious than this, and I was not happy. I put in a second sample of my writing, and it was Cory Doctorow once again. Again, I was not happy with the result, but it still looked like the tool was working. I tested the tool by putting in pieces of writing from famous authors, and the tool really worked. When I put in writing from Chekhov, Chekhov it was; when I put in Hemingway, Hemingway it was! Is it that the tool keeps the writing of famous authors in a database? It’s quite possible, but keeping all the writings of all famous authors is not an easy job.

I put in other snippets of my writing, and got to know that sometimes I write like Arthur C. Clarke, and sometimes like David Foster Wallace. I was happy. Since I have read these two, I have been wondering, looking at my writing, what the tool picked up to say it was like Arthur C. Clarke or DFW. I developed some ideas, but I am still not sure whether the tool could be this advanced. The snippet that it says is like DFW’s writing had some meta-writing in it, the kind you could expect in his work. But still, I am not sure the tool can pick this up.

By the way, when I pasted the above post into the tool, I found that I am wearing a Dan Brown hat at this moment.

Written by SK

September 3, 2013 at 2:03 pm

Posted in Uncategorized
