The Laboratorium: Principles and Recommendations for the Google Book Search Settlement

I’ve reviewed the proposed settlement in the Google Book Search case, along with its fourteen appendices. I’ve also talked with a number of my favorite smart people, some in Google’s pocket, some opposed to all things Google. I offer the following as a set of guiding principles (numbered P0 to P5 and in bold) for the court as it considers whether to approve the settlement, and for the public to help in thinking about the effects of the settlement. Interwoven with them are more specific recommendations (numbered R0 to R15 and in italics): concrete changes the court ought to make to the settlement.

If you’d like to go straight to the recommendations, there’s a list at the end of this document.

P0: The settlement should be approved

My starting point is that the settlement is a good thing. Everyone is better off than they would be in a world with no Google Book Search.

Google will take in a lot of money selling e-books to consumers, subscription databases to libraries, and book search ads to advertisers.
Authors and publishers will receive the majority of that money. They can choose the price they sell individual copies at; they’ll get a proportion of the revenues from other uses based on how popular their books are.
Public and nonprofit libraries will get at least some minimal all-you-can-drink privileges at the fire hose.
Universities, schools, and lots of other institutions will be able to subscribe to the fire hose of books, as well.
The libraries participating in scanning books will get back digital copies of the books from their collections. While there will be usage restrictions on the in-copyright ones, the digital copies of the public-domain ones are not to be sneezed at.
The public as individuals get an incredibly useful book search engine, one that will come increasingly close to being genuinely comprehensive over time. We also get another convenient source of e-books, free PDF access to millions upon millions of public-domain books, and some degree of full-text library-based access to the rest.
The public at large gets a substantial leg up on solving the orphan works problem. This system will encourage some copyright owners to come forward, will enable many sensible uses of many books for which no copyright owner can be found, and will help in cleaning up the records to help track down copyright owners in general.

These are serious benefits, and the settlement is a universal win compared with the status quo.

Is the status quo the right point of comparison? I’d been hoping that Google would establish that its scanning and searching features were fair uses. That would open up the book search business to anyone, free from legal taint. What’s more, it would give us a powerful, portable fair use principle that could do a lot of other good in this digital age. Fair use fans have been complaining that the settlement deprives us of that determination.

I do regret that loss, and I would rather have a definitive fair use finding than the settlement. But that’s not the choice on the table, and there are several good reasons why we should settle for the settlement.

First, while I thought that the fair use finding ought to be favorable, I don’t necessarily expect that it will be. This may be the most doctrinally controversial copyright issue since Grokster; scholars are split pretty much down the middle. There was a very strong chance that a court would have found definitively against fair use, and the damage such a finding could do would be immense. I’m not a gambling man, and while I thought the odds were decent, this bird in the hand definitely beats the ones in the fair use bush. The settlement is so comprehensive, when it comes to books, that it gives us perhaps 80 or 90 percent of the actual uses of books a positive fair use finding would have enabled. I’d rather join the all-media fair use battle in another case where the equities tip more clearly in favor of fair use.

Second, even if we scholars would like a fair use fight, it's not our call to make here. It's Google's. Google was the defendant; it earned that dubious privilege by actually scanning and searching books. Having stepped up to the plate to risk a lawsuit—and having gotten beaned with one—Google now has the right to choose whether to settle. Let's keep in mind that Google's choice to settle takes away no legal rights from anyone else; no one else loses the fair use argument because Google didn't chance it.

I have seen it argued that Google's capitulation means there's now a functioning licensing market for searching books. If true, this fact would undercut fair use claims by Google's competitors, since it would imply that scanning-and-searching without payment takes away revenue that copyright holders could have realized. The hole in this argument is that this isn't a "market" one can negotiate in without at some point needing to file lawsuits. As I explain under P2 below, the settlement by itself does not result in a situation where Google's competitors can cleanly license all the rights they need to start large-scale scanning of the in-copyright but out-of-print backlist the way that Google did.

And third, given that Google and the author/publishers are the parties here, I do not see how the court could be forced to consider the fair use issue. The adversarial legal process lets parties present their cases and controversies to a judge for resolution. It doesn’t generally let outsiders force the parties to litigate issues not of their choosing. This fact means that to the extent the rest of us want the court to modify the settlement or even to think about particular issues, we need to find a hook to put them properly before the court. There are some such hooks (see below), but the fair use question isn’t one of them. I don’t see a way to cram that specific copyright issue under the general “public interest” heading the court is directed to take into account.

Thus, I start from a baseline of believing that [R0] the settlement should be approved. It makes all of us better off, and it is, in an intuitive and meaningful sense, quite fair to all involved. I have concerns and critiques, but I see them as patches to make the settlement better, not as do-or-die clauses the settlement must absolutely contain. Mend it, don’t end it.

Thus, the implementation of our zeroth principle, approve the settlement, is simple:

R0: Approve the settlement.

P1: The Registry poses an antitrust problem

The agreement establishes a powerful new Book Rights Registry. It’s necessary to have one for basic administrative tasks: maintaining a database of who owns what copyrights, mediating ownership disputes, processing payments, and so on. It’s also smart to separate the Registry from Google: the Registry has less of a conflict of interest in tracking down AWOL authors, and it’s suitably adversarial when auditing Google’s records and security procedures. A centralized Registry that operates on behalf of all authors and publishers is one of the pieces of our current copyright institutions (think of SoundExchange) that the settlement cleverly remixes into its new system.

The Registry doesn’t just have ministerial tasks, though. It also has a lot of authority to negotiate on behalf of authors and publishers. It can negotiate the terms of “New Revenue Models” (such as print-on-demand, PDF download, coursepacks, and so on). It has approval power over the security plans that Google and the various libraries use. I read the settlement to give it some power to negotiate the terms of the licenses that participating libraries must agree to in order to participate. It has broad discretion to work out an equitable formula for dividing revenues among authors.

If your antitrust sensors aren’t pinging wildly at this point, please make sure they’re properly calibrated. The Registry is a centralized entity with the authority to negotiate on behalf of all book copyright owners. As such, it walks and quacks like a cartel. My antitrust is a bit weak, but my understanding is that were all authors and publishers (i.e., the plaintiff class, more or less) to gather in a room and sign a piece of paper giving the Registry the powers listed above, it’d be per se illegal as a monopolization of the book market. They could do some frightful things to dictate terms to resellers and readers.

The settlement recognizes this danger, and accordingly puts some very important limits on the Registry. It's specifically prohibited from representing any subgroup of copyright owners; it has to act in the interest of all of them. (This keeps it from being used by one group of authors, say, to suppress the market for another group's books.) Similarly, its board is equally divided between authors and publishers, with any action requiring a majority that includes at least one author representative and at least one publisher representative.

The most important defanging provisions are that copyright owners have (a) unlimited authority to strike side deals with anyone they like, (b) nearly unlimited authority to pull out of the program entirely, and (c) unlimited authority to set the prices they charge for e-books sold through Google. These facts keep the Registry from acting like a classic price-fixing cartel; individual publishers can easily defect and charge less (or more). It’s true that there’s a bit of coordination, but it’s not really much more than you’d see with any online book sales—publishers can already see what other publishers’ suggested prices are by looking on Amazon.

Why, then, is the Registry an anticompetitive threat? It has to do with the class nature of this class action settlement. Without the Registry, if Google wanted to negotiate, say, an encryption standard and DRM terms for book downloads, it would need to negotiate one-on-one with authors and publishers. But the Registry is authorized to negotiate on their behalf, on behalf of all of them. It could agree with Google on a privacy-intrusive DRM system that fed usage information back into a database used for industry-wide price-fixing in the guise of price discrimination. Yes, authors are able to opt in and out individually, but the Registry's centralizing role permits various anticompetitive practices to be laundered through its coordinating negotiating function.

Therefore, while the Registry required by the settlement is not necessarily an antitrust problem, the Registry permitted by it could well be one. What, then, should we do about it?

First, the Registry’s structural protections should be supplemented. It’s acceptable and understandable for the Registry’s charter to require that at least one author representative and at least one publisher representative consent to any action it takes, but that veto rule doesn’t require that they be the only members of its board. There should also be [R1] members appointed to represent libraries and the reading public. In addition to objecting if the Registry takes anti-competitive, anti-reader actions, these additional members would be able to monitor its actions, bringing important transparency to this new quarter-ton gorilla of the book industry.

Second, the Registry should be under ongoing antitrust scrutiny from the day of its birth. As a condition of settlement approval, the Registry should be [R2] required to negotiate and sign an antitrust consent decree with the Department of Justice. That decree would enumerate various forbidden anticompetitive practices—including those called out in the settlement, ones pertaining to hidden price-fixing (as in the example above), and whatever else the experts in the Antitrust Division think necessary to add. In addition, the Department would be given the ability to review any contracts entered into by the Registry and reject any with anticompetitive effect.

Third, the Registry should be required not to embrace the one distinction among copyright owners explicitly contemplated by the settlement agreement: that between current copyright owners and future authors. Only the former are part of the lawsuit, which means that only they can be part of the settlement. The most immediate danger is that the Registry might adopt policies that operate to the benefit of past authors and against future authors (policies with narrow views of fair use and broad views of a derivative works right come to mind). There's an easy way out, which is that [R3] the Registry must be explicitly required to represent any future copyright owners who agree to its standard deal, and explicitly forbidden from offering future copyright owners any materially different deal. This pair of rules guarantees to anyone who comes along in the future effectively the same opt-out right that present copyright owners have, while offering them exactly the same terms if they choose not to opt out. This way, the Registry will truly be a fair, impartial representative of all authors and publishers.

Thus, to implement our first principle, that the Registry poses an antitrust problem, we have three recommendations:

R1: Put library and reader representatives on the Registry’s board.
R2: Require the Registry to sign an antitrust consent decree.
R3: Give future authors and publishers the same deal as current ones.

P2: If it didn’t already, Google poses an antitrust problem

The Registry isn’t the only entity in a privileged position as a result of the settlement. Google is too. It becomes the only game in town for scanning and searching books on anything resembling this scale. Yes, it was the only game in town on this scale before—but that was when there was a legal threat hanging over it. Now Google will have the legal okay to go full-steam ahead, with some exceedingly tasty markets all to itself.

The immediate rejoinder from Google, of course, would be that this is not a market closed to entry. Microsoft used to have a book-scanning program; it could have one again. There's nothing in the settlement to prevent anyone involved from doing side deals with others. Authors could license Yahoo! to scan, index, and sell. The Registry could split the take. Libraries could subscribe to Yahoo!'s version of the fire hose. The settlement just sets up a series of deals with Google; it leaves everything else open for all comers.

That's what they'd say, but it's not quite the case. In the first place, the settlement has a most-favored-nations clause in Google's favor. Individual copyright owners can strike whatever side deals they want, but the Registry can't give anyone else better "economic or other terms" than Google gets for the first ten years. Notice, for example, that this clause precludes the Registry from offering a better revenue-sharing deal to Yahoo! even if the Registry thinks this better deal is necessary to turn Yahoo! into a serious player in the market. Having dug this particular mine, Google now gets the benefit of any other gems someone else locates within it. This clause alone might be enough to keep everyone else away. Google's concern about being undercut is real, but provided that the Registry itself is under proper antitrust scrutiny (see above), it has nothing legitimate to fear. [R4] The most-favored-nations clause should be struck.

There’s also another, subtler problem here. You can’t just go out and do what Google did. Remember that one of Google’s fair use arguments was going to be the insane transaction costs of trying to negotiate with every possible copyright claimant, particularly for out-of-print and orphaned books. The settlement gives Google a clean release from all of it. All those pesky claims from authors who couldn’t be found go away. No one else can rely on the settlement to do that work for them.

That leaves a would-be competitor like Yahoo! with some unappetizing options. If it goes ahead and starts doing large-scale scanning, it'll get sued just the way Google did, but there's no guarantee that the plaintiffs there would feel any interest in settling, or settling on terms comparable to the ones Google got. Indeed, as long as there were any potential plaintiffs left out there, Yahoo! couldn't feel safe, even if it had struck agreements with 99% of them. The others could still pull enough of copyright's harsh remedial levers to scotch the whole enterprise.

No, Yahoo! would need the same magic device of the class action that Google is now taking advantage of. Would the plaintiffs bother to organize themselves as a class for its benefit? There’s no guarantee they would. That puts Yahoo! in the especially tricky situation of filing a declaratory judgment action against a class of copyright owner defendants. It’d be hard even just to pick proper class representatives and appoint appropriate class counsel without some kind of collusion. And again, once there were class representatives, would they settle on comparable terms? There’s no guarantee of it, especially given the deal they’re getting from Google.

I’ve also heard floated the idea that competitors are perfectly free to lobby Congress on orphan works legislation. So they are, but the argument that this is an acceptable substitute for free competition in the book market as it currently exists is laughable. If I manufacture widgets and my competitor is monopolizing the widget market, it’s no answer to my pleas to say that I can go off and lobby Congress for widget subsidies. Orphan works legislation, done right, would be a great thing. But no one should have to count on it happening as a condition of entry to a market Google is already in.

Thus, Google’s first-past-the-post status here could easily turn into a long-term monopoly. There are plenty of non-legal reasons why—first-mover advantage and network effects come to mind. One may or may not think that a monopoly built on such structural bases is legitimate; one may or may not favor government intervention if it just so happens that Google is the only player in this game. I take no position on that question. But I think it safe to say that the court reviewing this settlement should not erect its own power—in the form of its ability to bind absent class members—as a barrier to entry in the online-books and book-search markets. What, then, should the court do?

First, [R5] any other entity willing to assume the same payment and security obligations that Google assumes in the settlement should be allowed to offer the same services that Google will. (It should also be able to offer any subset of them it wishes to.) This kind of competition is fair to authors and publishers because the various payment and security terms are already presumably acceptable to them. It's fair to Google, which gets the same deal it currently does. It's fair to others in the field, who could enter on a level playing field, without needing to roll the dice on invoking the legal system's power to intervene. And it saves the legal system the work of having to deal with the Microsoft Book Search lawsuit, the Yahoo! Book Search lawsuit, the Laboratorium Book Search lawsuit, and so on.

Second, [R6] the Registry should be authorized to negotiate deals with Google’s competitors on the same terms it’s authorized to negotiate with Google. (This rule is the logical extension of the previous one to new business models, which can’t be specified in detail in the settlement precisely because they don’t exist yet.) Under the proposed settlement, the Registry can give its blessing to Google on plenty of projects; it should be allowed to give the same blessing to anyone else. Crucially, that blessing would have the same effect of binding all authors and publishers to the deal it strikes. Again, authorizing the Registry to do such things is fair because the “binding” is only a default, one with a reasonable and explicit opt-out option.

Thus, to implement our second principle, that Google poses an antitrust problem, we have three further recommendations:

R4: Strike the most-favored-nations clause.
R5: Allow Google’s competitors to offer the same services the settlement allows Google to offer, with the same obligations.
R6: Authorize the Registry to negotiate on copyright owners’ behalf with Google’s competitors.

P3: Enforce reasonable consumer-protection standards

Once again, I need to call out some aspects of the settlement for particular praise. The negotiations produced a document that shows more regard for readers than I would have expected going in. There are clauses in it that aren’t strictly necessary to resolve the dispute between the parties to the lawsuit but that go a good way toward making sure that the results will provide books to the public on fair terms:

There’s a special section on “Non-Consumptive Research.” (Although it’s a term of art in copyright law, the term still sounds funny. Other than tuberculosis studies, what research is “consumptive?”) The settlement makes provision for researchers to run gigantic statistical studies on the entire corpus of scanned books, the point being that these uses don’t intrude on any author’s interest in making money from having her books read by people, while at the same time advancing human knowledge about algorithms, natural language, the history of publishing, and other topics.
Google specifically promises that it won’t use pop-up or pop-under ads. It also authorizes the Registry to take swift action to opt authors out of having their books shown with “animated, audio or video advertisements.”
In its program for institutional subscribers to the fire hose, Google guarantees that its terms and conditions will “not prohibit any uses … that would otherwise be permitted under the Copyright Act.” This is significant; it’s a commitment that the subscriptions won’t require readers to surrender their fair use rights and so on and so forth.
Similarly, the institutional subscriptions will never offer an “experience and rights” worse than those enjoyed by readers who purchase e-books for themselves through the program.
There’s a public-access service that’ll put free fire-hose terminals in colleges and public libraries. The service is stingy by default: colleges get one terminal per 10,000 students, and public libraries get one per building, and Google isn’t obligated to provide the service at all. There’s authority for Google and the Registry to expand this program, which I applaud, even though I’m also unsure whether the Registry would ever approve more generous terms.
The Copyright Act has various provisions that operate in favor of libraries and other public-service entities, and the settlement makes sure those provisions remain intact. Libraries that allowed their books to be scanned get back digital copies, which they can use for accessibility purposes, and to replace damaged or lost copies of physical books.

These are all to the good. There are some places where the settlement could have gone further, but didn’t. In some cases, there are good reasons.

For example, I was initially concerned to see that Google in effect conceded the fair use case as to "snippets"—a few lines from a book before and after the occurrence of a search term. To me, this was a strong fair use claim (once reference works, poetry, and other books that come in small chunks are already excluded), and a very useful feature. But the settlement gives copyright owners of all books the discretion to remove their books from "Display Uses," and turn off snippets. I'm informed that this isn't such a big deal. You see, out of the 100 million or so books in the world, only about one and a half million of them are in print, and in-print books are the only ones opted out of Display Uses by default. But Google already has about a million books in its Partner Program (a number that may grow once the lawsuit, and with it much of the mutual suspicion, is resolved), which leaves perhaps half a million in-print books outside it. So, practically speaking, the gap of books for which no snippets are available in search results may simply not be large enough, for long enough, to make a big deal out of.

There are, however, other consumer-protection matters on which the settlement is silent or ambiguous. Google insists on its good intentions in many of these areas. (“Don’t be evil,” and all that.) I would prefer to see stronger protections than Google’s continued promises of non-evilness. It’s in the nature of such assurances that they’re still offered long after they’ve ceased to be true. To the extent that Google really means to abide by them, it should have no objection to putting these terms explicitly in the settlement agreement or in an FTC consent decree, right?

First, [R7] Google's fancy new pricing algorithms shouldn't price-discriminate among buyers. For institutional subscriptions, the settlement has been drafted to do pricing on an FTE basis with different pricing buckets for different categories of institutions—e.g., higher education, corporate, government, etc.—which promises to be largely fair. For individual buyers, either copyright owners can pick a price, or they can let Google's algorithms set a price for them based on buying patterns. The settlement doesn't explicitly say that Google won't charge different readers different prices for the same book, something that its immense computational power and huge pricing corpus might make dangerously attractive. Google has no current plans to do so, and it's true that the market punished Amazon harshly when it tried the same stunt a few years back, but still. Better safe than sorry. Cross out price discrimination.

Second, I’m worried about privacy in some parts of the agreement. Especially for individual online e-book purchases, user browsing through book search results, and institutional subscriptions, there’s a real danger that readers could be identified and tracked through their precise reading habits, page by page, minute by minute. Indeed, the settlement explicitly requires that libraries which open up their digital copies for scholarly and classroom uses of less than five pages must “keep track of and report[] all such uses of Books to the Registry.” Putting on my most charitable hat, I’d say that some kind of report is necessary for auditing, that the Registry would be fine with aggregate reports, and that the libraries will be sure to negotiate better privacy terms for themselves than Google has.

But still, the only explicit privacy protections in the settlement are about keeping private the information the Registry has about copyright owners. That's insufficient. [R8] The settlement should contain explicit privacy guarantees: that user information and reading habits be monitored only to the minimal extent necessary to audit for security and billing, that no such data be used for any other purpose, that all such data be promptly destroyed, and that Google not reveal any information about any user or any user's reading habits to any other entity, including the Registry. Further privacy principles will not be hard to articulate; there's been plenty of good scholarship and activism around reader privacy and online privacy.

Third, while Google commits to reasonable terms and conditions in its institutional subscription option, it makes no such promises about individual e-book purchases. It should. The settlement should [R9] protect reader rights under the Copyright Act across the board. There’s a good first cut at such language in the settlement already—I quoted it above. Similarly, the Library-Registry agreements should contain explicit statements that library terms of service will not require readers to give up any of their other rights under the Copyright Act. Not fair use, not first sale, not noncommercial public performance, not anything. Any new business models should come with similar protections.

Thus, our third principle, enforce reasonable consumer-protection standards, comes with its own three implementing recommendations:

R7: Prohibit Google from price discriminating in individual book sales.
R8: Insert strict guarantees of reader privacy.
R9: Protect readers from being asked to waive their rights as a condition of access.

P4: Make the public goods generated by the project truly public

Merely in order to carry out this immense scanning and indexing project, Google has had to assemble a number of important databases. In particular, it now possesses an immense amount of bibliographic information; a pile of scans is useless unless linked back to the right publication metadata. Its decisions about which books to preview depend on an immense number of informed guesses about whether books are in print or not. And in order to convey copyright-owner requests to Google and to convey payments from Google back to them, the Registry will be building a huge database of book copyright ownership.

These databases are all public goods. They’ll be useful to readers and researchers. They’re also going to be immensely useful to players in the book business. That in-print database will help libraries understand their rights under copyright law; the rights-owner database will help publishers gather the rights they need to publish new and exciting editions.

Moreover, these databases are byproducts of the Google Book Search project, not its goals. Google is not compiling them to make money selling access; it’s compiling them because it can’t offer Book Search without them. Whether Google (or the Registry) can monetize them directly or not is not going to affect the incentive to compile them.

Taken together, these propositions imply that these databases should be opened to broad public access. Granted, this will not always be possible. I'm informed that Google has assembled the first database—the bibliographic metadata about publication—largely by licensing it from other sources. Scholars may contest whether such databases should be capable of exclusive licensing, but even those who want to pick that fight shouldn't pick it here. Google didn't generate this data; it shouldn't be forced to reveal it.

The correct principle, instead, is that to the extent that Google creates useful metadata databases as part of the Book Search project, those databases should be offered to the public, gratis, and without legal or technical restrictions. The settlement agreement contemplates at least two such databases, both important (though the principle might also apply to others).

First, there's the database of in-print information that Google needs to make its calls as to whether works are "commercially available," and thus whether some uses of them will be restricted by default. Google currently synthesizes this information from a variety of sources (such as looking at used book sales online). Google is required by the settlement to make this information available to the Registry on behalf of copyright owners; it should be [R10] required to make the database of in-print information public, as well. This is not likely to be a practical problem, since Google will all but inevitably expose this information when it decides whether to let users see preview pages, and Google seems eager to expose it more directly, as well. But still, Google should not be given the option to restrict its availability; in case Google does wind up having competitors in this space, there will be an inevitable temptation to restrict it as a way of slowing down the other guy.

Second, there’s the database of information about copyright claims that the Registry will need to use to distribute payments among copyright owners. The Registry is required to share much of this information with Google; it should be [R11] required to share almost as much of the copyright-owner database with the public. There are privacy concerns here, since it will contain information about authors, but those concerns can be accommodated without much limiting the usefulness of the database in solving orphan works problems. Pseudonyms and proxies are reasonable, provided the database in general is made available so that others can use it as a point of contact in finding copyright owners—or verify that no one knows who the owner is or where she can be found.

In making these databases available, and in providing some of the other core services (the exact set to be determined), Google should be required to [R12] use standard APIs and open data formats, as well as to allow programmatic access and bulk download where appropriate. Google currently does this as a matter of policy in many of its other lines of business, and it’s already providing PDF downloads of public-domain books. These good policies should be enshrined as actual requirements. Among other things, they’ll ensure that Google’s competitors behave reasonably, too.

Thus, there are three recommendations to implement the fourth principle, that of making public goods truly public:

R10: Require that Google’s database of in-print/out-of-print information be made public.
R11: Require that the Registry’s database of copyright owner information be made public.
R12: Require the use of standard APIs, open data formats, and (for metadata) unrestricted access.

P5: Require accountability and transparency

Google has been repeatedly criticized by scholars and activists upset at its lack of institutional transparency. It has also learned from that criticism. The settlement agreement contains a number of reassuring provisions to provide accountability. As good as they are, they should be supplemented with a few more.

Institutionally, Google and the Registry are given mutual rights to audit each other’s relevant books. While these audits are themselves confidential, the arrangement creates a healthy system of mutual accountability. Similarly, research users and libraries are subject to security audits under suitable procedures. When there are disputes about public-domain or in-print status, they’re subjected to a low-stakes initial process that lets the parties sort out the facts. Larger disputes go first through required executive-level mediation and then arbitration, on terms that don’t strike me as unbalanced in any significant way.

Google has also accepted a fairly stringent set of rules that prohibit it from altering the texts of the books it scans. There are the usual, sensible carveouts: Google can hyperlink indices, it can link from books to sources they cite, it can highlight user search queries, and it can even add a limited social-networking annotation-sharing feature. These are specific exceptions, however, from the general principle that it won’t change one word of the author’s writings without permission. Good.

Google has even agreed to procedures that limit its editorial discretion to exclude books from being displayed (though not from being indexed or listed among search results, it would appear). If Google removes a book for “editorial reasons,” it will tell the Registry about it and give the Registry a digital copy of the book. The Registry may then go out and commission a competitor to provide the display services that Google has refused to. Google believes it has a First Amendment right not to be required to “speak” by passing along a book that it strongly objects to. Under these particular circumstances, I tend to agree. It has, however, chosen an honorable and speech-friendly way of exercising that right; Google’s waiver does not censor the book itself, which can still be made available through other means.

There are, however, three potential accountability holes in this procedure. One is that the Registry need not, or might not be able to, find a replacement for Google. I’m reluctant to intervene too strongly here, particularly when no other potential partner is willing to step forward. Others, bolder than I, might propose a positive duty on the Registry’s part, but I take no position on the issue, noting only that it raises difficult issues of free speech law and free speech policy.

The second is that no one besides the Registry might ever find out that Google has chosen to de-list a book. If the Registry doesn’t or can’t engage a replacement for Google, the book would genuinely vanish from this new Library of Alexandria. Perhaps that should happen for some books, but decisions like that shouldn’t be made in secret. When Google chooses to exclude a book for editorial reasons, it should be [R13] required to inform the copyright owner and the general public, not just the Registry. This path leaves intact Google’s option to be silent, but requires that it be exercised with transparency. If and when Google chooses not to speak, it should own the ethical consequences, rather than being able to hide from its decision to hide a book.

The third potential accountability hole is that the settlement contains no clear distinction between “non-editorial” and “editorial” reasons for Google to exclude a book from being displayed. This ambiguity raises the possibility that Google might exclude a book for editorial reasons but tell no one, not even the Registry, about it, and thereby completely suppress the book. There’s a danger of line-crossing wherever a line is drawn, but assuming that Google will act in good faith, a [R14] sharper definition of “non-editorial reasons” should suffice. The current draft of the settlement says “quality, user experience, legal, or other non-editorial reasons,” an unclear and imprecise list that could easily be converted into a clear and precise one.

One last point of accountability concerns an issue raised by Jean-Noël Jeanneney: what books are scanned and in the collection at all. Jeanneney’s specific concern—a lack of Francophone sources—has an easy and obvious response: the Bibliothèque nationale de France, of which he is the president, could join with Google to scan its collections. Indeed, Google has indicated its broad willingness to partner with libraries interested in scanning large corpuses of books to get them into the digital collection more quickly. Once again, Google’s sensible policy is one thing in the context of a private Google project and another in the context of a massive remaking of the United States system of book copyrights that requires the blessing of a court of law. [R15] Any institution that wishes to provide books for scanning, or to perform scanning itself, should be allowed to take part in the scanning effort and ensure that particular works are digitized. There will need to be appropriate provisions about capacity, financing, quality control, and so on, but a well-drafted “consent not to be unreasonably withheld” clause can take care of many of them. Once again, it’s worth emphasizing that this provision, like all of the others above, would apply both to Google and to any of its competitors who come in under the modified settlement.

Thus, the fifth and final principle, that of requiring accountability and transparency, also has three recommendations that implement it:

R13: Require that Google inform the public when it excludes a book for editorial reasons.
R14: Tighten up the definition of “non-editorial reasons” for excluding a book.
R15: Allow any institution ready, willing, and able to participate in scanning books to do so.

Conclusion

The starting point for all of this analysis has been that Google and the copyright owners are asking a federal court to put the United States’s judicial power behind a document they have presented to it. The court’s consent should not be given lightly; the settlement should be approved only when the court is satisfied that it really will serve the interests of all parties, including the public. I have tried to offer general principles to think through what the public interest requires, along with specific, realistic recommendations to implement those principles.

At the same time, this is not a sentencing hearing or a legislative chamber. The court is not in a position to rewire Google and the book industry to right all wrongs therein, nor should it try. Google’s other ventures are not on the table, nor are the many other problems bedeviling copyright law. I have tried to offer recommendations tailored to the specific question the court is facing: should it use its power to bind absent class members and approve this settlement?

Thus, I hope that my principles and recommendations all have two things in common. Each takes off from some issue specifically raised by the proposed settlement, some way in which approving the settlement could cause trouble down the line. Each then offers a change to head off that trouble, a change more or less narrowly tailored to the issue it confronts.

How do we get there from here? These concerns need to be placed before the court. Different recommendations can appropriately be raised by different parties. The Department of Justice and at least one of Google’s potential competitors in book scanning (Microsoft, which has previously had a book-scanning project, is a natural choice) should move to intervene to raise the antitrust concerns. The FTC should move to intervene to raise the consumer-protection issues. And some unnamed members of the author class should object to the settlement unless it is modified to take account of the remaining issues. I believe that all of the recommendations I make can legitimately be suggested to the court on one or more of these bases, and that the court’s mandate to consider the public interest in a settlement gives it the power to condition its approval of the settlement on any or all of them.

My goals here are pragmatic. I am not proposing to take public control of the Book Search project. In comparison with the institutional reconfiguration of book copyright law that the settlement would enact, these tweaks are all quite minor. Nor am I proposing to leave Book Search entirely alone; the parties gave up on that possibility when they asked the court to approve this sweeping class-action settlement.

I hope that these recommendations will prove equally appealing to those who think that Google can do no evil and those who think it does only evil. Perhaps they will prove equally frustrating. The settlement is good as it stands, but it could stand to be better. These recommendations will not make the settlement perfect, but I believe they will not make it worse.

Summary of principles and recommendations

P0: The settlement should be approved
- R0: Approve the settlement.
P1: The Registry poses an antitrust problem
- R1: Put library and reader representatives on the Registry’s board.
- R2: Require the Registry to sign an antitrust consent decree.
- R3: Give future authors and publishers the same deal as current ones.
P2: If it didn’t already, Google poses an antitrust problem
- R4: Strike the most-favored-nations clause.
- R5: Allow Google’s competitors to offer the same services the settlement allows Google to offer, with the same obligations.
- R6: Authorize the Registry to negotiate on copyright owners’ behalf with Google’s competitors.
P3: Enforce reasonable consumer-protection standards
- R7: Prohibit Google from price discriminating in individual book sales.
- R8: Insert strict guarantees of reader privacy.
- R9: Protect readers from being asked to waive their rights as a condition of access.
P4: Make the public goods generated by the project truly public
- R10: Require that Google’s database of in-print/out-of-print information be made public.
- R11: Require that the Registry’s database of copyright owner information be made public.
- R12: Require the use of standard APIs, open data formats, and (for metadata) unrestricted access.
P5: Require accountability and transparency
- R13: Require that Google inform the public when it excludes a book for editorial reasons.
- R14: Tighten up the definition of “non-editorial reasons” for excluding a book.
- R15: Allow any institution ready, willing, and able to participate in scanning books to do so.

November 9, 2008 at 1:02 PM

Tim

I hope you’ll be submitting this, in some format, as an amicus brief? (I have no previous knowledge of whether courts tend to allow amicus filings for settlement approvals, but it can’t hurt to submit it.)

November 22, 2008 at 11:18 AM

Adam Corson-Finnerty

This is an excellent analysis, for which many thanks. I am working on a similar piece, for the academic library community. One question: do you think that scholars will be able to do “consumptive” research on their institutions’ subscription to the Google db? It looks like the answer would be yes, since the sub will allow the faculty to see everything. But I suspect that datamining may be in some way blocked or not enabled through the sub. What do you think?

November 22, 2008 at 12:26 PM

James Grimmelmann

At “Fully Participating Libraries,” those that offer books for scanning and get back digital copies, users can engage in “non-consumptive research” (e.g. large-scale data mining). 7.2(b)(vi). I don’t read the Institutional Subscription (the paid-access version) to allow those large-scale uses. 4.1(d). Either version will allow individual scholarship where scholars read books to understand what they say, but the “Fully Participating Library” version will restrict them to five pages per book.

November 25, 2008 at 4:28 PM

John Gilmore

You missed one antitrust problem. You say the Registry isn’t quite a monopoly, because individual authors or publishers can opt out of it, and negotiate different terms with Google (or others) if they wish. But that argument falls apart if Google refuses to negotiate different terms with authors or publishers. And Google is free to do so, without any antitrust or other penalty. Thus Google’s power to decline to negotiate empowers the monopoly of the Registry. The deal is very likely to become: “If you opt out of the Registry, your books are simply not available from Google’s repository of ‘all’ books by ‘all’ publishers and authors.”

November 28, 2008 at 10:18 AM

James Grimmelmann

John, I think your prediction that Google won’t want to strike one-off deals with authors except through the registry is right. I’m struggling with how to characterize this fact. Perhaps it’s an indication that Google will have monopoly power under the settlement, and that Google is likely to use that power in a way that helps the Registry become a monopoly. I’ll think about the extent to which this behavior by Google could be challenged as a refusal to deal or as an attempt to leverage one monopoly into a related market.

December 2, 2008 at 5:22 PM

Colette Vogele

James, this is extremely useful for us practitioners out there who don’t have the time to read every last page of that 300+ page settlement proposal. Thank you for the contribution and especially the detail. Really great.

December 10, 2008 at 7:15 AM

Adam Corson-Finnerty

Thanks for your excellent and fast analysis of the GBS. So far, you are the only person to suggest that the public be represented on the BRR, which I think is an excellent suggestion. For one thing, if Google decides to drop the whole project as too money-wasting, the BRR inherits the effort, at least as I read the settlement draft. I have been trying to look at the implications for libraries, and would be interested in your, and your readers’, take on these questions:

GBS Questions for Librarians

I believe that, broadly speaking, the Google Book Settlement is a good thing. However, I have some very specific questions that I have not seen addressed by the Library community.

The Book Rights Registry is a non-profit entity that plays a critical role in administering the agreement. The BRR is controlled by four author representatives and four publisher representatives, with five votes needed for decisions. Why aren’t there any library representatives on this board? Or ‘public’ representatives?
This question is made more important by my reading of what happens if Google decides that the book-scanning business is a money sinkhole. We saw Microsoft bail out of its Live Search Books project, so this is not moot. If Google bails, then the BRR takes over the whole operation. All the more reason to have some ‘public’ directors.
Here is something cool to think about. In-print books and public domain books appear to be the tip of the iceberg. The greatest number of titles are out-of-print but still in copyright. I have seen estimates of 20-30 million titles. In most cases, the rights to such works will have reverted to the author. Google is going to have a green light to scan these titles for inclusion in its database, and for selective display, and commercial use, unless the author formally objects. The author will get a cut of any revenue, through the distributive mechanism of the BRR. All well and good. But this also opens an interesting opportunity. Allowing your book to be in the GBS is non-exclusive. Therefore, authors could also give publication rights to a non-profit entity, perhaps their university library, perhaps to a coalition of libraries. Authors could sign a ‘creative commons’ license for their out-of-print titles, thus adding immeasurably to the Open Access corpus. Shouldn’t we get organized and go after this opportunity?
I am really freaked by what is said about ‘mining’ the GBS database. Only ‘non-consumptive’ research will be permitted. That appears to mean that you can count words and analyze patterns, but you cannot see the words or phrases in context. This seems so outrageous that I hope that I am misinterpreting. A simple example will suffice: suppose you wanted to study how widely the term ‘fulsome praise’ has shifted from having a negative connotation to having a positive one. You would have to see the phrase in context. Google will allow the establishment of three research bases, all of which are restricted to ‘non-consumptive’ research. OK, but will the ‘institutional subscription’ then allow datamining with context? If not, this is a scandal and academic librarians should be shouting from the rooftops.
A different sort of question is this: If Google is successful, then virtually every book ever printed in English, and millions of titles in other languages, will be available to read, print out, and purchase through print-on-demand. So can most academic research libraries get out of the book storage business? You can see what I mean: save a few preservation master copies, and a dozen circulating copies for those who want to study the book as artifact, and dump the rest. For most of our patrons, if they want to read the book on paper, a printed facsimile should do just fine.
And yet, I have heard that Google’s scans are certainly not preservation quality, and perhaps not even print-on-demand quality. Does anyone know?
Finally, the Google Agreement is between the company and authors and publishers. Artists, photographers, and illustrators are not included. I have heard that this will mean the images in an in-copyright book will be blanked out. Is this true? Has anyone heard of Google pursuing a deal with these groups?

December 10, 2008 at 11:09 AM

James Grimmelmann

Adam, good questions. Some tentative answers:

Yes.
Yes.
This would be a very good thing, whether or not the Book Search project goes forward.
My understanding is that the Institutional Subscription would be supplied from Google’s servers and wouldn’t necessarily be offered in a way that would make datamining possible. You could conduct searches and code the results yourself by hand, but you wouldn’t be able to automate the process the way you could with the Research Corpus. If I’m wrong about that, I hope someone else will speak up and correct me.
I expect that many libraries will get out of the storage business. Nicholson Baker will be outraged. The deaccessioning will take place on a large scale. Some materials that should have been preserved will be lost. Now would also be a good time to think about how to minimize that damage.
I don’t know, but would love to be informed by someone who does.
The images in in-copyright books won’t be blanked out where the publisher has the right to let Google display them and consents. I don’t know Google’s plans as to artist-controlled rights, but I would expect that they include blanking-out. It would be logical for Google to pursue a deal — perhaps through another class-action lawsuit — but I haven’t heard anything to suggest that they actually are.

February 5, 2009 at 1:01 PM

Jerome M. Garchik, Attorney

I am a civil rights attorney in S.F. and have been studying this too in some detail.

No doubt there will be some objections filed by the May 5 deadline.

Antitrust issues can be lodged via the consumer contact sites of the US Dept of Justice and the European Community Competition Commission.

Overall, the matters involved in the settlement (license mechanisms, the registry, e-publishing fees and conditions) should be dealt with in a public forum at the federal agency or Congressional level, and not here in closed-door negotiations by private interests in settlement of a lawsuit.

This deal has worldwide implications and, more than iTunes, involves the cultural heritage of mankind, and who can profit off it.

Call me if you wish to discuss this further. J. Garchik, 703 Market St. SF 94103, 415 495 8527, e-mail: jchikesq@gmail.com

June 17, 2009 at 5:35 AM

Krzysztof Czyzewski

Hi,

I am an IP lawyer from Poland. I am head of the IP practice at the LSW law firm.

I got interested in the settlement from the “foreign perspective”.

For foreign authors and publishers, the main problem is explaining to them how the settlement applies to their books. The authors of the settlement want it to apply to books all over the world, but they have made no effort to explain the concepts of “American copyright” and “American interest.” Of course, this may be done by lawyers in each jurisdiction; however, it would be better to put the details in the settlement itself instead of giving readers a bare reference to the USC. (For example, as far as Poland and Polish books are concerned, the 16th of February 1927 is crucial.)

Secondly, the settlement should state expressly what “use on the territory of the US” means. This is especially important given the arguments of Google’s lawyers in the SAIF case in France. Will Google use IP address limitations? As of today, Google is far from saying “yes”.

Thirdly, the idea of “commercial availability” should also take the foreign perspective into account. What does it mean to be available to consumers on the territory of the US? What is the difference between offering books on amazon.com and on foreign sites, e.g. empik.com or [polish publisher].pl?

It seems to me that the authors of the settlement want it to cover all foreign books without however showing any respect to any alternative copyright systems and foreign interests.

Best, Krzysztof

July 17, 2009 at 5:06 AM

Esther Hoorn

Dear Krzysztof,

I am a copyright officer at the University of Groningen in the Netherlands, tasked with promoting Open Access. You will be interested to know that the European Commission is organising a hearing to arrive at a European perspective on the settlement. See: http://arstechnica.com/tech-policy/news/2009/06/eu-may-flex-regulatory-muscles-against-google-book-deal.ars

The Dutch Wikimedia organisation coördinated a European contribution on behalf of Wikipedia to the Green Paper on Copyright in the Knowledge Economy. Therefore Wikimedia is also invited to the hearing. It would be interesting to elaborate on P4 in the article above, about public re-use of metadata databases.

On commercial availability of foreign books see: http://digital-scholarship.org/digitalkoans/2009/06/22/google-book-search-settlement-interview-with-michael-healy-expected-executive-director-of-the-book-rights-registry/

Best regards, Esther Hoorn

September 19, 2009 at 12:25 AM

Edward Hasbrouck

You start from the premise that “Authors and publishers will receive the majority of that money.” There are at least three major problems with this:

First, the assumption that there will be “new” revenues rests on the erroneous presumption (rooted in Google’s arrogant presumption of the necessity for its role as intermediary and the absence of any possible alternative), that rights to “out-of-print” books are not currently being commercially exploited, when in fact authors are often exploiting those rights through online self-publication, such as publication of content with advertising on the author’s Web site, or direct sale of PDF e-books through the author’s Web site, or even through placement in Google’s partner program at more favorable terms than are offered through the settlement.

Second, in many cases the author (or less often the print publisher, or in some cases both), while holding non-exclusive (or otherwise divided or conditional) rights under current contracts, will not be determined to be a “Rightsholder” with respect to a particular work or particular uses of it under the different definitions and procedures of the settlement, and thus will receive nothing from the settlement or from those uses.

Third, much (perhaps most, at least at first) of the new revenue will come from general search, not search within a specific book. Even where all the responses to a specific search query come from a particular book, Google will keep all of this revenue. This would include, inter alia, searches by users trying to locate the source, exact text, or citation for a quotation or reference, and is likely to amount to billions of dollars a year.