Project Gutenberg
Logo
Established 12 January 1971
(First document posted)[1]
Collection
Size Over 47,000 documents
Website gutenberg.org

Project Gutenberg (PG) is a volunteer effort to digitize and archive cultural works, to "encourage the creation and distribution of eBooks".[2] It was founded in 1971 by Michael S. Hart and is the oldest digital library.[3] Most of the items in its collection are the full texts of public domain books. The project tries to make these as free as possible, in long-lasting, open formats that can be used on almost any computer. As of March 2014, Project Gutenberg claimed over 47,000 items in its collection.

The releases are available in plain text but, wherever possible, other formats are included, such as HTML, PDF, EPUB, MOBI, and Plucker. Most releases are in English, but many non-English works are also available. Multiple affiliated projects provide additional content, including regional and language-specific works. Project Gutenberg is also closely affiliated with Distributed Proofreaders, an Internet-based community for proofreading scanned texts.

History

Michael Hart (left) and Gregory Newby (right) of Project Gutenberg, 2006

Project Gutenberg was started by Michael Hart in 1971 with the digitization of the United States Declaration of Independence.[4] Hart, a student at the University of Illinois, obtained access to a Xerox Sigma V mainframe computer in the university's Materials Research Lab. Through friendly operators, he received an account with a virtually unlimited amount of computer time; its value at that time has since been variously estimated at $100,000 or $100,000,000.[4] Hart has said he wanted to "give back" this gift by doing something that could be considered to be of great value. His initial goal was to make the 10,000 most consulted books available to the public at little or no charge, and to do so by the end of the 20th century.[5]

This particular computer was one of the 15 nodes on ARPANET, the computer network that would become the Internet. Hart believed that computers would one day be accessible to the general public and decided to make works of literature available in electronic form for free. He had a copy of the United States Declaration of Independence in his backpack, and this became the first Project Gutenberg e-text. He named the project after Johannes Gutenberg, the fifteenth-century German printer who propelled the movable type printing press revolution.

By the mid-1990s, Hart was running Project Gutenberg from Illinois Benedictine College, and more volunteers had joined the effort. All of the text was entered manually until 1989, when image scanners and optical character recognition software improved and became more widely available, making book scanning more feasible.[6] Hart later came to an arrangement with Carnegie Mellon University, which agreed to administer Project Gutenberg's finances. As the volume of e-texts increased, volunteers began to take over the day-to-day operations that Hart had run.

Starting in 2004, an improved online catalog made Project Gutenberg content easier to browse, access and hyperlink. Project Gutenberg is now hosted by ibiblio at the University of North Carolina at Chapel Hill.

Italian volunteer Pietro Di Miceli developed and administered the first Project Gutenberg website and started the development of the project's online catalog. In his ten years in this role (1994–2004), the project's web pages won a number of awards and were often featured in "best of the Web" listings, contributing to the project's popularity.[7]

Project Gutenberg's founder, Michael Hart, died on 6 September 2011 at his home in Urbana, Illinois, at the age of 64.[8]

Affiliated organizations

In 2000, a non-profit corporation, the Project Gutenberg Literary Archive Foundation, Inc., was chartered in Mississippi to handle the project's legal needs. Donations to it are tax-deductible. Long-time Project Gutenberg volunteer Gregory Newby became the foundation's first CEO.[9]

Also in 2000, Charles Franks founded Distributed Proofreaders (DP), which allowed the proofreading of scanned texts to be distributed among many volunteers over the Internet. This effort greatly increased the number and variety of texts being added to Project Gutenberg, and made it easier for new volunteers to start contributing. DP became officially affiliated with Project Gutenberg in 2002.[10] As of 2007, the 10,000+ DP-contributed books comprised almost half of the nearly 20,000 books in Project Gutenberg.

CD and DVD project

In August 2003, Project Gutenberg created a CD containing approximately 600 of the "best" e-books from the collection. The CD is available for download as an ISO image. When users are unable to download the CD, they can request to have a copy sent to them, free of charge.

In December 2003, a DVD was created containing nearly 10,000 items. At the time, this represented almost the entire collection. In early 2004, the DVD also became available by mail.

In July 2007, a new edition of the DVD was released containing over 17,000 books, and in April 2010, a dual-layer DVD was released, containing nearly 30,000 items.

The majority of the DVDs and all of the CDs mailed by the project were recorded on recordable media by volunteers. However, the new dual-layer DVDs were manufactured, as this proved more economical than having volunteers burn them. As of October 2010, the project had mailed approximately 40,000 discs.[11]

Scope of collection

Growth of Project Gutenberg publications from 1994 until 2008.

As of May 2014, Project Gutenberg claimed over 47,000 items in its collection, with an average of over fifty new e-books being added each week.[12] These are primarily works of literature from the Western cultural tradition. In addition to literature such as novels, poetry, short stories and drama, Project Gutenberg also has cookbooks, reference works and issues of periodicals.[13] The Project Gutenberg collection also has a few non-text items such as audio files and music notation files.[14]

Most releases are in English, but there are also significant numbers in many other languages. As of February 2013, the most represented non-English languages were French, German, Finnish, Dutch, Portuguese, and Chinese.[3]

Whenever possible, Gutenberg releases are available in plain text, mainly using US-ASCII character encoding but frequently extended to ISO-8859-1 (needed to represent accented characters in French and the scharfes S, ß, in German, for example). Besides freedom from copyright, a plain-text version of each release in a Latin character set has been a requirement of Michael Hart's since the founding of Project Gutenberg, as he believed this is the format most likely to be readable in the extended future.[15] Out of necessity, this criterion has had to be relaxed for the sizable collection of texts in East Asian languages such as Chinese and Japanese, where UTF-8 is used instead.
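
The relationship between these three encodings can be illustrated with a short Python sketch. It is not part of Project Gutenberg's tooling; the function name narrowest_encoding and the sample strings are hypothetical, chosen only to show that each encoding in the sequence accepts a strictly larger set of characters than the one before it.

```python
def narrowest_encoding(text: str) -> str:
    """Return the first of US-ASCII, ISO-8859-1 and UTF-8 that can encode text.

    Hypothetical helper for illustration only; the order mirrors the
    fallback described above (ASCII, then Latin-1, then UTF-8).
    """
    for name in ("ascii", "iso-8859-1", "utf-8"):
        try:
            text.encode(name)
        except UnicodeEncodeError:
            # This encoding has no representation for some character.
            continue
        return name
    raise ValueError("text cannot be encoded by any candidate encoding")


if __name__ == "__main__":
    samples = [
        "When in the Course of human events",  # plain English
        "Stra\u00dfe",                          # German word with scharfes S
        "caf\u00e9",                            # French accented character
        "\u9053\u5fb7\u7d93",                   # Chinese title
    ]
    for sample in samples:
        print(f"{sample!r}: {narrowest_encoding(sample)}")
```

Under these assumptions, the English sample fits in US-ASCII, the German and French samples need ISO-8859-1, and the Chinese sample can only be stored as UTF-8.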

Other formats may be released as well when submitted by volunteers. The most common non-ASCII format is HTML, which allows markup and illustrations to be included. Some project members and users have requested more advanced formats, believing them to be much easier to read. However, formats that are not easily editable, such as PDF, are generally not considered to fit the goals of Project Gutenberg. Project Gutenberg also accepts two master formats, from which all other files are generated: customized versions of the Text Encoding Initiative standard (since 2005)[16] and reStructuredText (since 2011).[17]

Beginning in 2009, the Project Gutenberg catalog began offering auto-generated alternate file formats, including HTML (when not already provided), EPUB and Plucker.[18]
