Last month, Google announced a partnership with major research libraries to scan 20 million books for inclusion in Google's search database. For those works in the public domain, the full text will be available. For those works still possibly under copyright, only snippets will be seen. The potential of this project is only beginning to be understood -- it is likely to bring about the most dramatic changes in the nature of research and the spread of culture since the birth of Google itself.
But the excitement around Google's extraordinary plan has obscured a dirty little secret: It is not at all clear that Google and these libraries have the legal right to do what is proposed. For work in the public domain, the right is clear enough. But for work not in the public domain, Google's right to scan -- to copy -- whole texts to index is uncertain at best, even if it ultimately makes only snippets available. When permission has been given by the copyright holder, again there's no problem. But when permission has not been secured, the law is unsettled. If lawsuits were filed, and if Google and its partner libraries were found to have violated the law, their legal exposure could reach into the billions.
Google, to its credit, has decided to accept these risks. It can afford to fight the lawsuits, and the benefit to society and to Google from such access apparently outweighs the potential costs.
But not everyone is Google. Not every library could afford the risks that Google can. And so before we accept a world where only a Google can build valuable, network-based digital libraries, we should ask whether the system that produces these profound uncertainties is a system that we should change.
The basic problem is simple. A copyright is a property right. Yet our particular system of copyrights is insanely inefficient. Rights get created easily enough -- a copyright is automatic; you need do nothing to secure it. But tracking who owns the rights created is astonishingly difficult. There is no Google for determining what works are protected by copyright; there is no Google for tracking down current copyright holders. The law creates a property right, but leaves it practically impossible to respect that property right for older, out-of-print works.
For example, in 1930, 10,027 books were published in the United States. In 2001, 174 of those books were still in print. That means 9,853 books were out of print, but still presumably protected by copyright. "Presumably" because, in the U.S., the protection of copyright reaches back to 1923. But only presumably because, for works created before 1978, a copyright had to be registered to be secured and then renewed for the author to enjoy a full term of copyright protection. At least half of all works published historically never took the first step, registration; almost 90% never took the second, renewal.
The vast majority of creative work published in 1930, therefore, is in the public domain. But it is extremely costly to know which works in particular are in that category. And for those works that remain under copyright, unless new editions containing the latest copyright information become available -- a reprint of an old book, say, or a DVD of an old movie -- tracking down the current owners can require hours of detective work that may come up empty.
The solution is obvious enough: Clean up the copyright system. As with every other federal intellectual property regime, all copyrights should be registered. And as was the American tradition for almost two centuries, there should be simple techniques for filtering out works that have no continuing need for copyright protection. No doubt, the law should protect creative work when protection does some good, but that protection should end when it serves no purpose.
How would it work? One proposal calls for copyrights to be renewed every five years -- a process that today could be made technically quite simple and that would create an accessible database as well as quickly clear away unneeded copyrights.
Clarifying the system, however, has been universally opposed by the content industry -- Hollywood, book publishers and the like. It fears that any reform would weaken Congress' resolve to strongly protect intellectual property. So while it insists upon increased regulation to protect commercially valuable work, it works to block reform that would enable a wide range of creative work to be efficiently built on by others.
Google's gamble shows that it is time for Congress to listen to both the content industry and the digital entrepreneurs. Our culture should be available for anyone -- not just a deep-pocketed Google -- to build on and spread, consistent with the purpose of copyright law. The law's inefficiencies should not block that opportunity.
Reforms designed to clarify copyrights would allow Google to do more with our cultural and intellectual past without legal worries. They would also allow others, at very low cost, to do the very same.