In my last blog post, Hugh Howey and the Tsunami of Cash, I talked briefly about the recent results posted by Hugh Howey and his collaborator “Anonymous Data Guy,” who analyzed in detail the sales of category best-sellers on Amazon. (The first study looked at about 7,000 books and the second study looked at about 50,000 books.) See all their results at AuthorEarnings.com. (Note that this site is no longer online.)
These results had been criticized by a number of people, so I thought it would be useful in my blog post to try to estimate the broad spectrum of indie author earnings using the 80-20 rule.
I was able to make rough estimates of the number of units sold by indie authors from the very top earners all the way down to the very bottom earners.
Hugh left a comment on that blog post, and so did Chip MacGregor, a well-known literary agent. Chip is a “no-BS” kind of a guy, and his comment was quite long and had some good questions (but also a couple of clunkers). Chip is a long-time friend of mine, and was my agent for several years, and I consider him one of the good guys. I didn’t want to simply bury my response to Chip in a comment.
I’ve decided to do a whole new blog post today just to answer Chip’s questions.
First, just to set the context, here is Chip’s entire comment, which I’ll answer line by line in the rest of this post.
As a guy who is supportive of authors self-publishing, I find Howey’s work interesting, but not earth-shaking, For the record, he looked at one day of sales, at one company, and admittedly guesstimated many of his numbers based on what friends told him. Um… Would your PhD program have accepted that, Randy? Do you think his sample size is adequate? Would you allow him to create a trend line from that? The two big questions that stick in my head after reading this report: Can we rely on Amazon marketing info to be accurate? And if so, why isn’t Amazon sharing information with him?
I know the people who are raising questions about the validity of the study are being hammered as Luddites, but I tend to think this needs a bit more study before it’s declared as gospel. I like your idea of applying the principle of factor sparsity to the data, but your suggestions seem pretty optimistic. Out of more than a million authors, the average number sold is about 300. (Nothing wrong with selling 300 copies, mind you, and if they were charging a couple bucks, they made themselves about $400, which is better than a kick in the head… but it’s not the windfall you seem to make it out.) I’m not sure why you state that “the average and median sales are not very useful,” Randy. Seems as though those are very useful to give context — as in, “There are more than a million authors on Amazon, and last year 150 of them sold more than 100,000 copies.” On the one hand, I celebrate the successes. On the other, you have to admit those are fairly long odds. Again, I hesitate to say that, because everybody WANTS this to be true, and to have discovered the secret to making a lot of money at this crazy business.
I notice Hugh came on your site to say he personally knows “several others who sold multiple millions last year.” Um… this is the sort of thing that makes me wonder about his veracity. I guess I tend to doubt that he knows “several” who sold “multiple millions.” Several? Really? Even your quick data analysis doesn’t support that, Randy. Look, I’m a guy who has self-published books and done well, and who encourages the authors I work with to self-pub… but I don’t like the Amway-like atmosphere being promoted by people who want to make it sound like there are publishing fairies out there, waiting to sprinkle hundred dollar bills onto everyone. My two cents.
Randy sez: Now that you’ve seen Chip’s comment in full, I’ll repeat it line by line and respond to each logical unit.
Chip wrote:
As a guy who is supportive of authors self-publishing, I find Howey’s work interesting, but not earth-shaking,
Randy sez: I find it both interesting and earth-shaking. Here’s why. I knew that indie authors were doing well. I know many of them. I’ve seen the difference indie publishing makes in their lives. I know the kind of sales numbers they’ve been getting.
What I didn’t know is that indie authors (as a group) have just about reached parity with the Big 5 authors (as a group). That is, the set of indie authors Hugh and Data Guy analyzed are moving just about as many copies and earning just about as much money as the Big 5 authors they analyzed.
I don’t think anybody knew that. That’s why it’s created so much excitement.
Chip wrote:
For the record, he looked at one day of sales, at one company, and admittedly guesstimated many of his numbers based on what friends told him. Um… Would your PhD program have accepted that, Randy?
Randy sez: He’s now looked at two days’ worth of sales, and the two sets of results are in very reasonable agreement. The company he looked at was Amazon, which is by far the biggest player in e-books. But he’s now doing a study on B&N, and I think we’re all looking forward to those results.
As for his method of analysis, it’s more sophisticated than guesstimating based on what his friends told him. Hugh and other indie authors have been compiling data for years that allow them to accurately correlate a book’s sales rank with its actual sales. This is approximate, but it’s a very reasonable approximation, and the Law of Large Numbers tells us that statistical fluctuations will wash out pretty quickly as you get more data. And 7,000 books is a boatload of data. 50,000 books is even more.
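To get a feel for how fast those fluctuations wash out, here is a rough Python sketch. The inputs are illustrative assumptions, not figures from Hugh's report: each book's estimated sales carries an independent 50% random error, and sales fall off with rank as a power law with exponent 0.8613. The sketch says nothing about a systematic bias in the rank-to-sales conversion, which more data would not fix.

```python
import math

def relative_error_of_total(n_books, per_book_rel_error=0.5, exponent=0.8613):
    """Relative error on total estimated sales across the top n_books.

    Assumptions (illustrative only): sales at rank r are proportional to
    r**-exponent, and each book's estimate has an independent random error
    equal to per_book_rel_error times its sales.
    """
    sales = [r ** -exponent for r in range(1, n_books + 1)]
    total = sum(sales)
    # Independent errors add in quadrature.
    total_error = per_book_rel_error * math.sqrt(sum(s * s for s in sales))
    return total_error / total

for n in (70, 7_000, 50_000):
    print(n, round(relative_error_of_total(n), 3))
```

Under these assumptions the relative error on the total keeps shrinking as more books are included, even though any single book's estimate stays noisy.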
As for whether my Ph.D. program would have accepted that, let’s not be silly. I got my Ph.D. in quantum field theory at UC Berkeley. That’s a high standard of rigor, and it’s far beyond what people normally try for in real life.
From what I can see, Hugh’s calculations are well above the usual standard in the book industry. I’m looking at the BookScan report for my book WRITING FICTION FOR DUMMIES right now. For the last royalty period, BookScan underestimates paper sales by about 30% and it has no estimate at all for e-book sales.
So the real question is whether Hugh’s data increases our knowledge of author earnings. My judgment is that it does.
Chip wrote:
Do you think his sample size is adequate?
Randy sez: Yes, even for the first data set, which looked at about 7,000 books. For the second data set, it’s an embarrassment of riches, with around 50,000 books covering all categories, fiction and non-fiction. This is good stuff.
Chip wrote:
Would you allow him to create a trend line from that?
Randy sez: No, of course not. Chip, that was a bad question. Hugh isn’t analyzing the rate of change of things. He’s analyzing the state of the industry right now. In any event, you can’t create a trend line from one day’s worth of data. (Now he has two days’ worth, but he’s doing a static analysis, not trying to predict changes, so a trend line is really beside the point.) And of course, Hugh didn’t create one.
Chip wrote:
The two big questions that stick in my head after reading this report: Can we rely on Amazon marketing info to be accurate?
Randy sez: I didn’t understand this first of the two questions, so I emailed Chip to ask what it means. He emailed me back to restate it:
Since the bestseller lists at Amazon are largely seen to be a marketing tool, do we want to rely on them as a database for research?
Randy sez: OK, I see now. Chip is saying that many people believe that the sales rank for books isn’t strictly correlated to actual daily unit sales. Most people believe that Amazon uses other factors to determine the sales rank, and some of those factors might be Amazon’s marketing needs.
This means that it’s possible that the daily sales rank for a book would not be a good predictor for its daily unit sales. In that case, Hugh’s calculations with Anonymous Data Guy would be incorrect.
Fortunately, that is a testable question. Here’s how to test it mathematically:
All you have to do is look at the raw data for a large number of books. Each day, you look at the sales rank (which is public information) and you look at the actual units sold (this is private information that Amazon only tells the publisher). Since indie authors are publishers, they can easily compile this raw information and then any math person can model it.
I would model it as a Pareto distribution curve, S = C/(R**E), where:
- S = daily unit sales
- C = some unknown constant to be determined by the data
- R = the sales rank on the given day
- E = some unknown exponent to be determined by the data
So the mathematical solution is to do a least-squares fit to the data to determine the best values for C and E. Then do a chi-squared analysis of the fit to see how well the theory fits the data. This is easy to do. We could also compute variations from the best fit. This would tell us the uncertainty in the calculations presented by Hugh and Anonymous Data Guy.
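Here is a minimal sketch of that fit in Python with NumPy. It is one way to do it, not necessarily how Anonymous Data Guy would: taking logs turns S = C/(R**E) into a straight line, log S = log C - E log R, so an ordinary linear least-squares fit gives C and E. The chi-squared step assumes Poisson noise on daily sales, which is my assumption. The function names and the sample data are made up for illustration; real data would come from authors' own sales dashboards.

```python
import numpy as np

def fit_rank_to_sales(ranks, sales):
    """Fit S = C / R**E by linear least squares on log S = log C - E log R."""
    log_r = np.log(np.asarray(ranks, dtype=float))
    log_s = np.log(np.asarray(sales, dtype=float))
    slope, intercept = np.polyfit(log_r, log_s, 1)
    return float(np.exp(intercept)), float(-slope)  # (C, E)

def reduced_chi_squared(ranks, sales, c, e):
    """Chi-squared per degree of freedom.

    Assumes Poisson noise, so the variance of each point equals the
    predicted sales. Two parameters (C and E) were fitted.
    """
    ranks = np.asarray(ranks, dtype=float)
    sales = np.asarray(sales, dtype=float)
    predicted = c / ranks ** e
    chi2 = np.sum((sales - predicted) ** 2 / predicted)
    return float(chi2 / (len(sales) - 2))

# Synthetic (rank, daily units) pairs drawn from made-up parameters.
rng = np.random.default_rng(0)
ranks = np.array([10, 50, 100, 500, 1000, 5000, 10000, 50000])
sales = np.maximum(rng.poisson(50000.0 / ranks ** 0.8), 1)  # guard against log(0)

c_fit, e_fit = fit_rank_to_sales(ranks, sales)
print("C =", c_fit, "E =", e_fit)
print("reduced chi-squared =", reduced_chi_squared(ranks, sales, c_fit, e_fit))
```

A reduced chi-squared near 1 would say the power law describes the data about as well as the noise allows. A value far above 1 would suggest that sales rank depends on something besides daily unit sales, which is exactly Chip's worry.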
I don’t know if Anonymous Data Guy has done this calculation, but it’s not hard and it would answer Chip’s question. I have sent Hugh an e-mail about this issue.
Chip’s second question:
And if so, why isn’t Amazon sharing information with him?
Randy sez: You’d have to ask Amazon, but my understanding is that they hardly ever share any info with anyone. One thing indie authors like is that Amazon does give them up-to-the-minute sales information, which is a welcome change from the hassle it takes to get info from traditional publishers.
Chip wrote:
I know the people who are raising questions about the validity of the study are being hammered as Luddites, but I tend to think this needs a bit more study before it’s declared as gospel. I like your idea of applying the principle of factor sparsity to the data, but your suggestions seem pretty optimistic. Out of more than a million authors, the average number sold is about 300. (Nothing wrong with selling 300 copies, mind you, and if they were charging a couple bucks, they made themselves about $400, which is better than a kick in the head… but it’s not the windfall you seem to make it out.)
Randy sez: Well, as I said in my post, the average and the median are pretty useless because they’re both dragged down by the great mass of unpolished writers. Two things are important:
- How well are the top-performing indie authors doing compared to the top-performing traditional authors?
- Roughly how many indie authors are at each pay level?
So my blog post was aimed at guessing the answers to these questions. The answer is that about 10 indie authors are moving more than a million copies a year. That sounds pretty cool to me. Look at the other numbers in my post! There are opportunities here for a couple of thousand indie authors to be moving more than 10k copies per year. That makes it clear that the whole “outlier” thing is a myth.
Chip wrote:
I’m not sure why you state that “the average and median sales are not very useful,” Randy.
Randy sez: The reason is simple. We’re used to that pesky “bell-shaped curve” when talking about results. We know that the average man is about 5’9” tall, and the standard deviation is about 3 inches. We know immediately from this data that a 7-foot man would be exceptionally tall and a 5-foot man would be quite short. Note that both of those extremes are reasonably close to the average (and the median). So the average and median are useful numbers for understanding bell-shaped curve distributions.
But the Pareto distribution is wildly different. The top-selling author in my estimates was selling 7.5 million copies. The average author was selling just under 500.
If the top-selling author were as tall as he is rich, he’d be almost 17 miles tall!
That is the sense in which the average is not very useful for a Pareto distribution. The average gives us no information at all on what we should expect from peak performers.
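The arithmetic behind that comparison is short enough to check in a few lines of Python. This is a sketch that uses a round 500 for the average author, where the estimate was "just under 500," so the result lands slightly below the figure quoted above.

```python
TOP_SALES = 7_500_000      # top-selling author in the estimates
AVERAGE_SALES = 500        # rounded up from "just under 500"
AVERAGE_HEIGHT_IN = 69     # 5'9" in inches
STD_DEV_IN = 3             # standard deviation of height, in inches
INCHES_PER_MILE = 12 * 5280

# Bell curve: even the extremes sit within a few standard deviations.
z_seven_footer = (84 - AVERAGE_HEIGHT_IN) / STD_DEV_IN   # 7 feet = 84 inches
z_five_footer = (60 - AVERAGE_HEIGHT_IN) / STD_DEV_IN    # 5 feet = 60 inches

# Pareto: the top performer is thousands of times the average.
ratio = TOP_SALES / AVERAGE_SALES
height_miles = ratio * AVERAGE_HEIGHT_IN / INCHES_PER_MILE

print(z_seven_footer, z_five_footer)   # 5.0 -3.0
print(ratio)                           # 15000.0
print(round(height_miles, 1))          # 16.3
```

With an average a little below 500 copies, the ratio rises and the height creeps up toward 17 miles.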
I won’t belabor this, Chip, because I know you’re familiar with the Pareto distribution. You blogged about it recently on your own blog, in your article The Pereto Principle. (Aside from misspelling “Pareto,” it was a good article.)
Chip wrote:
Seems as though those are very useful to give context — as in, “There are more than a million authors on Amazon, and last year 150 of them sold more than 100,000 copies.”
Randy sez: No, the average does NOT give the correct context for a Pareto distribution. When you have a bell-shaped curve, you typically report two pieces of information: the average and the standard deviation. Anyone who understands the bell-shaped curve then immediately understands the complete spectrum.
With a Pareto distribution, you also report two pieces of information, but they AREN’T the average and standard deviation! The two pieces of information you report are the earnings of the top-performer and the critical exponent (in my calculations I used .8613, which is the exponent for the 80-20 rule). Anyone who understands the Pareto distribution then immediately understands the complete spectrum.
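To make "the complete spectrum" concrete, here is a small Python sketch. It relies on the standard result that an 80-20 Pareto distribution has tail index log(5)/log(4), and that the rank-versus-size exponent is the reciprocal of that, log(4)/log(5). The function names are mine, and the 7.5 million top figure is the estimate from my earlier post.

```python
import math

def rank_exponent_for_80_20():
    """Rank-size exponent matching the 80-20 rule."""
    alpha = math.log(5) / math.log(4)   # Pareto tail index for 80-20
    return 1 / alpha                    # exponent E in S = C / R**E

def sales_at_rank(rank, top_sales, exponent):
    """Estimated annual sales of the author at a given rank."""
    return top_sales / rank ** exponent

E = rank_exponent_for_80_20()
print(round(E, 4))   # 0.8614, i.e. the .8613 quoted above, up to rounding

for rank in (1, 10, 100, 1000):
    print(rank, round(sales_at_rank(rank, 7_500_000, E)))
```

Given just the top performer's number and the exponent, every other rank follows from one formula, which is the sense in which those two numbers describe the whole curve.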
Chip wrote:
On the one hand, I celebrate the successes. On the other, you have to admit those are fairly long odds. Again, I hesitate to say that, because everybody WANTS this to be true, and to have discovered the secret to making a lot of money at this crazy business.
Randy sez: Everybody agrees that the odds of a major success are long. I said this in my blog post in October of 2012, Liars and Outliers in the Publishing World. Joe Konrath has said this many times, most recently in his blog last Friday, where Barry Eisler did a guest post and then Joe chimed in: Eisler – Publishing is a Lottery & Konrath – Publishing is a Carny Game. I don’t know of anyone who claims that every indie author is going to get rich.
Chip wrote:
I notice Hugh came on your site to say he personally knows “several others who sold multiple millions last year.” Um… this is the sort of thing that makes me wonder about his veracity. I guess I tend to doubt that he knows “several” who sold “multiple millions.” Several? Really? Even your quick data analysis doesn’t support that, Randy.
Randy sez: Actually, my analysis is consistent with Hugh’s statement. My calculations estimate that there are four indie authors who moved more than 2 million copies last year and ten indies who moved more than a million. I can think of several off the top of my head who sold more than a million last year, and at least two of them I’ve met in person. I’m sure Hugh knows a lot more of the heavy-hitters than I do. I don’t know who he has in mind, but it sounds plausible to me.
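Those counts come from turning the curve around: if sales at rank R are S = C/(R**E), then the rank at which sales drop to a threshold T is R = (C/T)**(1/E), and every author ranked above that point sold more than T. A minimal Python sketch, using the 7.5 million top figure and the .8613 exponent from my estimates (the function name is made up for illustration):

```python
import math

TOP_SALES = 7_500_000   # estimated sales of the top indie author
EXPONENT = 0.8613       # rank exponent for the 80-20 rule

def authors_above(threshold):
    """Number of authors whose estimated sales exceed the threshold."""
    rank = (TOP_SALES / threshold) ** (1 / EXPONENT)
    return math.floor(rank)

for threshold in (2_000_000, 1_000_000, 100_000):
    print(threshold, authors_above(threshold))
# 2000000 4
# 1000000 10
# 100000 150
```

The last line reproduces the figure Chip quoted of 150 authors selling more than 100,000 copies.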
Chip wrote:
Look, I’m a guy who has self-published books and done well, and who encourages the authors I work with to self-pub… but I don’t like the Amway-like atmosphere being promoted by people who want to make it sound like there are publishing fairies out there, waiting to sprinkle hundred dollar bills onto everyone. My two cents.
Randy sez: I’m also opposed to the Amway mentality, and I’ve consistently pointed out on this blog that only a few authors will ever get super-rich. But let’s remember that with a Pareto distribution, we’re interested in the expected earnings of the top performer. The expected earnings of everyone else follow from that. So it’s REQUIRED that we talk about top-performers, even though this misleads people who want to think in terms of bell-shaped curves.
The blunt truth is that most authors won’t do very well. What I’m interested in is the spectrum of author earnings from the very top all the way down to the very bottom—how many authors are at each income level. That tells authors how to plan their careers (and it might keep a few people from quitting their day jobs prematurely).
My Pareto calculations are a first cut at answering that question. I hope to show more data soon.
Chip, thanks for your questions. It’s important to ask questions, because the issue of author earnings is important.
I think that we’ll get a fuller picture as Hugh and Anonymous Data Guy continue to analyze more data. I don’t think we’ll see a radically different picture as we get more data.
I think we’ll continue to see that indie author earnings are spread across an enormous spectrum, with a very few authors earning millions per year and hundreds of thousands who earn only a few hundred per year.
The key thing is that a couple of thousand indies are earning some tens of thousands per year. That’s the “broad shoulder” of the Pareto distribution, and it’s where most professional novelists will find themselves.