The World of Peer-to-Peer (P2P)/All Chapters



Foreword

Guide to Readers

This is a wikibook (en.wikibooks.org); as such, you should learn a bit about what a wikibook is and how it does its magic.

The book is organized into different parts, but as this is a work that is always evolving, things may be missing or just not where they should be. You are free to become a writer and contribute to fix things up...

This book intends to explain the overall use of P2P (Peer-to-Peer) technologies in today's world. It goes into as many implementations as it can and compares their benefits and problems, and even their legal implications and the changes they bring to social behaviors and economic infrastructures. We explain the technology and how it works in detail, and try to give you a vision of what to expect in the future.

Reader Comments

If you have comments about the technical accuracy, content, or organization of this document, please tell us (e.g. by using the "discussion" pages or by email). Be sure to include the section or part title of the document with your comments and the date of your copy of the book. If you are really convinced of your point, information or correction, then become a writer (at Wikibooks) and make the change yourself; it can always be rolled back if someone disagrees...

Guide to Writers

Authors/Contributors (at Wikibooks) should register if they intend to make non-anonymous contributions to the book (this will give more value and relevance to your opinions and views on the evolution of the work and enable others to talk to you) and should try to follow the structure. If you have major ideas or big changes, use the discussion area; as a rule, just go with the flow...

Conventions 
A set of conventions has been adopted for the creation of this book; please read about them on the book's talk page before you contribute any content.

Authors

The following people are authors of this book:
Panic

There are many other contributors/editors to the book; a verifiable list of all contributions exists as History Logs at Wikibooks (http://en.wikibooks.org/).

Acknowledgment is given for the use of some content from other works like Wikipedia, theinfobox:Peer to Peer and Internet Technologies.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License."

What is P2P?

This is a diagram of a Peer-to-Peer computer network.
A diagram of a server-based computer network.

Generally, a peer-to-peer (or P2P) computer network refers to any network that does not have fixed clients and servers, but instead has a number of autonomous peer nodes that function as both clients and servers to the other nodes on the network. This arrangement contrasts with the client-server model in that any node is able to initiate or complete any supported transaction. Peer nodes may differ in local configuration, processing speed, network bandwidth, and storage capacity.

Although the term has been applied to Usenet and IRC in all their incarnations, and is even applicable to the network of IP hosts known as the Internet, it is most often restricted to the peer networks developed from the late 1990s onward, which are characterized by the transmission of data at the receiver's request instead of the sender's. Early networks of this kind included Gnutella, FastTrack, and the now-defunct Napster, all of which provided facilities for free (and somewhat anonymous) file transfer between personal computers that are connected to the network in a dynamic and unreliable way and work collectively towards a shared objective.

Even those early networks did not all work around the same concept or implementation. In some networks, such as Napster, OpenNap or IRC, the client-server structure is used for some tasks (e.g. searching) and a peer-to-peer structure for others, and even that split is not consistent across them. Networks such as Gnutella or Freenet use a peer-to-peer structure for all purposes and are sometimes referred to as true peer-to-peer networks, even though some of their latest developments are turning them into a hybrid approach where not every peer is equal in its functions.
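To make the contrast concrete: in a hybrid network such as Napster, search is a lookup in a central index, while a pure network such as Gnutella has no index and instead floods each query to its neighbors, which forward it on. The Python sketch below models that flooding under simplifying assumptions: the overlay is an in-memory dictionary, each peer ignores copies of a query it has already seen, and a hop limit (TTL) bounds how far the query travels. The function name `flood_search` and its parameters are illustrative, not taken from any Gnutella implementation, and real clients route hits back along the query path rather than collecting them in one place.

```python
from collections import deque


def flood_search(graph, start, wanted, has_file, ttl=7):
    """Flood a query from `start` across an overlay graph (hypothetical sketch).

    graph:    dict mapping each peer to a list of its neighbor peers
    has_file: dict mapping each peer to the set of file names it shares
    ttl:      hop limit; each forward decrements it and forwarding stops at 0
    Returns the set of peers (other than `start`) that reported a hit.
    """
    hits = set()
    seen = {start}  # a peer drops duplicate copies of a query it already saw
    queue = deque([(start, ttl)])
    while queue:
        peer, remaining = queue.popleft()
        if peer != start and wanted in has_file.get(peer, set()):
            hits.add(peer)
        if remaining == 0:
            continue  # TTL exhausted: this copy of the query is not forwarded
        for neighbor in graph.get(peer, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, remaining - 1))
    return hits
```

On a chain of peers a, b, c, d where only d shares the file, a TTL of 2 lets the query die at c, while a TTL of 3 reaches d. This is the basic trade-off of flooding: a larger TTL reaches more peers at the cost of more network traffic.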

When the term peer-to-peer was used to describe the Napster network, it implied that the peer protocol nature was important, but in reality the great achievement of Napster was the empowerment of the peers (i.e., the fringes of the network). The peer protocol was just a common way to achieve this.

So the best approach is to define peer-to-peer not as a set of strict definitions but, more broadly, as a technical/social/cultural movement that attempts to provide a decentralized, dynamic and self-regulated structure (in direct opposition to the old model of central control, the client-server model) with the objective of providing content and services. In this way, a computer program or protocol that attempts to escape the need for central servers/repositories, and aims to empower or provide a similar level of service/access to a collection of similar computers, can be referred to as a P2P implementation, and it will in fact enable everyone to be a creator/provider, not only a consumer.
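The defining property described above, that every node acts as both client and server, can be sketched in a few lines of Python. This is a hypothetical in-memory model, not a network implementation: the `Peer` class, its `serve` (server role) and `fetch` (client role) methods, and the neighbor list are all illustrative assumptions, and a real peer would exchange messages over sockets using some P2P protocol.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Peer:
    """Hypothetical peer node that plays both the server and the client role."""
    name: str
    files: dict[str, bytes] = field(default_factory=dict)
    neighbors: list[Peer] = field(default_factory=list)

    def serve(self, filename: str) -> Optional[bytes]:
        # Server role: answer another peer's request from local storage.
        return self.files.get(filename)

    def fetch(self, filename: str) -> Optional[bytes]:
        # Client role: ask each known neighbor until one of them has the file.
        for peer in self.neighbors:
            data = peer.serve(filename)
            if data is not None:
                # Keep a copy, so this peer can now serve the file to others.
                self.files[filename] = data
                return data
        return None
```

For example, if `alice` shares a file and `bob` knows only `alice`, then once `bob.fetch(...)` succeeds, `carol` (who knows only `bob`) can fetch the same file from `bob`: a consumer has become a provider.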

From a Computer Science Perspective

Technically, a true peer-to-peer application must implement only peering protocols that do not recognize the concepts of "server" and "client". Such pure peer applications and networks are rare. Most networks and applications described as peer-to-peer actually contain or rely on some non-peer elements, such as DNS. Also, real world applications often use multiple protocols and act as client, server, and peer simultaneously, or over time.

From a computer science perspective, P2P opens new and interesting fields of research, not so much because of the (not so recent) switch of roles among network components, but because of the unforeseen benefits and resource optimizations it enables in network efficiency and stability.

Peer-to-peer systems and applications have attracted a great deal of attention from computer science research; some prominent research projects include the Chord lookup service, the PAST storage utility, and the CoopNet content distribution system (see below for external links related to these projects).
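At the core of the Chord lookup service is consistent hashing: node addresses and keys are hashed onto the same circular identifier space, and each key is stored at its successor, the first node whose identifier is equal to or follows the key's identifier clockwise around the circle. The sketch below shows only that mapping, assuming a small 16-bit identifier space for readability (Chord uses SHA-1 identifiers) and a global view of all node IDs. Real Chord nodes know only a few other nodes through finger tables and resolve lookups in O(log N) hops, which is not modeled here; the helper names are hypothetical.

```python
from __future__ import annotations

import hashlib
from bisect import bisect_left

M = 16  # identifier bits for this sketch; Chord itself uses 160-bit SHA-1 IDs


def chord_id(name: str, m: int = M) -> int:
    """Hash a node address or a key onto the identifier circle [0, 2**m)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)


def successor(node_ids: list[int], key_id: int) -> int:
    """Return the first node ID equal to or clockwise after key_id, wrapping past zero."""
    ring = sorted(node_ids)
    i = bisect_left(ring, key_id)
    return ring[i % len(ring)]


def responsible_node(nodes: list[str], key: str) -> str:
    """Map a key to the node (by address) that would store it; assumes no ID collisions."""
    by_id = {chord_id(n): n for n in nodes}
    return by_id[successor(list(by_id), chord_id(key))]
```

Because only the successor relation matters, a node joining or leaving the ring takes over or hands off just the keys between itself and its predecessor, rather than forcing every key to move.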

Distributed Systems

Ganglia

Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids. It is based on a hierarchical design targeted at federations of clusters. It leverages widely used technologies such as XML for data representation, XDR for compact, portable data transport, and RRDtool for data storage and visualization. It uses carefully engineered data structures and algorithms to achieve very low per-node overheads and high concurrency.
Ganglia has been ported to an extensive set of operating systems and processor architectures, and is currently in use on thousands of clusters around the world. It has been used to link clusters across university campuses and around the world and can scale to handle clusters with 2000 nodes. ( http://ganglia.info/ )

Distributed Computation

The basic premise behind distributed computation is to spread computational tasks among several machines distributed in space. Most of the new projects focus on harnessing the idle processing power of "personal" distributed machines: the normal home user's PC. This current trend is an exciting technology area that relates to a subset of distributed systems (client/server communication, protocols, server design, databases, and testing).

This new implementation of an old concept has its roots in the realization that there is now a staggering number of computers in our homes that are vastly underutilized. Nor is it only home computers: few businesses utilize their computers the full 24 hours of any day. In fact, seemingly active computers may be using only a small part of their processing power, since word processing, email, and web browsing require very few CPU resources. So the "new" concept is to tap into this underutilized resource (CPU cycles), which in aggregate can surpass several supercomputers at substantially lower cost, since the machines are individually owned and operated by the general public.
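The approach described above usually comes down to splitting one large job into many independent work units that idle machines can process in any order, then combining the results. The following Python sketch illustrates the idea under stated assumptions: the job (counting primes below a bound) is a stand-in chosen for simplicity, and a thread pool plays the role of the volunteer PCs, whereas in a real project the units would travel over the Internet to a client such as BOINC. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def make_work_units(start, stop, size):
    """Split the range [start, stop) into independent work units of at most `size` numbers."""
    return [(lo, min(lo + size, stop)) for lo in range(start, stop, size)]


def count_primes(unit):
    """The per-volunteer computation: count the primes in one work unit by trial division."""
    lo, hi = unit
    count = 0
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_project(start, stop, unit_size, volunteers=4):
    """Hand out the work units, collect the partial results and combine them."""
    units = make_work_units(start, stop, unit_size)
    # Each worker thread stands in for an idle home PC that takes a unit,
    # computes it and reports the result back to the project server.
    with ThreadPoolExecutor(max_workers=volunteers) as pool:
        results = list(pool.map(count_primes, units))
    return sum(results)
```

`run_project(0, 100, 10)` splits the job into ten units and returns 25, the number of primes below 100. Because the units are independent, the project server can hand them out in any order and to any number of volunteers.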

SETI@Home

One of the most famous distributed computation projects is SETI@home, hosted by the Space Sciences Laboratory at the University of California, Berkeley, in the United States. SETI is an acronym for the Search for Extra-Terrestrial Intelligence. SETI@home was released to the public on May 17, 1999.

On average it used hundreds of thousands of Internet-connected home computers in the search for extraterrestrial intelligence. The whole point of the program is to use your CPU cycles when they would otherwise be idle. The original standalone project has since been deprecated and folded into BOINC.

BOINC

BOINC has been developed by a team based at the Space Sciences Laboratory at the University of California, Berkeley led by David Anderson, who also leads SETI@home.

BOINC stands for Berkeley Open Infrastructure for Network Computing. It is a non-commercial, free/open source middleware system for volunteer computing, released under the LGPL. It was originally developed to support the SETI@home project and is still hosted at ( http://boinc.berkeley.edu/ ), but it is intended to be useful for other applications in areas as diverse as mathematics, medicine, molecular biology, climatology, and astrophysics. As an open-source software platform for computing with volunteered resources, it extends the original concept and lets you donate computing power to other scientific research projects such as:

  • Climateprediction.net: study climate change.
  • Einstein@home: search for gravitational signals emitted by pulsars.
  • LHC@home: improve the design of the CERN LHC particle accelerator.
  • Predictor@home: investigate protein-related diseases.
  • Rosetta@home: help researchers develop cures for human diseases.
  • SETI@home: look for radio evidence of extraterrestrial life.
  • Folding@Home ( http://www.stanford.edu/group/pandegroup/folding/ ): to understand protein folding, misfolding, and related diseases.
  • Cell Computing biomedical research. (Japanese; requires nonstandard client software)
  • World Community Grid: advance our knowledge of human disease. (Requires 5.2.1 or greater)

As a "quasi-supercomputing" platform, BOINC has over 435,000 active computers (hosts) worldwide. BOINC is funded by the National Science Foundation through awards SCI/0221529, SCI/0438443, and SCI/0506411.

It also has commercial uses, as some private companies are beginning to use the platform to assist in their own research. The framework supports various operating systems: Windows (XP/2K/2003/NT/98/ME), Unix (GNU/Linux, FreeBSD) and Mac OS X.

World Community Grid (WCG)

Created by IBM, World Community Grid ( http://www.worldcommunitygrid.org/ ) is similar to the above systems. Fourteen IBM servers serve as "command central" for WCG. When they receive a research assignment from an organization, they will scour it for security bugs, parse it into data units, encrypt them, run them through a scheduler and dispatch them out in triplicate to the army of volunteer PCs.
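Sending each data unit out in triplicate lets the servers cross-check the volunteers' answers, since a single home PC may be faulty or even malicious. The text above does not say how WCG compares the copies; the sketch below assumes a simple majority vote (a quorum of matching results), in the spirit of the redundancy-based validation used in volunteer computing. The `dispatch` and `validate` names and the volunteer callables are hypothetical.

```python
from collections import Counter
from typing import Callable, Hashable, Optional, Sequence


def dispatch(unit, volunteers: Sequence[Callable], copies: int = 3) -> list:
    """Send the same work unit to `copies` different volunteers and collect their results."""
    if len(volunteers) < copies:
        raise ValueError("need at least as many volunteers as copies")
    return [compute(unit) for compute in volunteers[:copies]]


def validate(results: list, quorum: int = 2) -> Optional[Hashable]:
    """Accept a result only if at least `quorum` replicas report the same value."""
    if not results:
        return None
    value, votes = Counter(results).most_common(1)[0]
    return value if votes >= quorum else None
```

With two honest volunteers and one faulty one, the two matching results form a quorum and the faulty result is discarded; if all three disagree, no result is accepted and the unit would have to be reissued.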

To be a volunteer one only needs to download a free, small software agent (similar to a screensaver).

Projects are selected based on their potential to benefit from WCG technology and to address humanitarian concerns, and are chosen by an independent, external board of philanthropists, scientists and officials.

The software is open source (LGPL), written in C/C++ with wxWidgets, and is available for Windows, Mac and Linux.

Grid Networks

Grids first emerged from the use of supercomputers in the U.S., as scientists and engineers sought access to scarce high-performance computing resources that were concentrated at a few sites.

Open Science Grid

The Open Science Grid ( http://www.opensciencegrid.org/ ) was built and is operated by the OSG Consortium. It is a U.S. grid computing infrastructure that supports scientific computing via an open collaboration of science researchers and software developers from universities and national laboratories, together with storage and network providers.

Globus Alliance

The Globus Alliance ( http://www.globus.org/ ) is a community of organizations and individuals developing fundamental technologies behind the "Grid," which lets people share computing power, databases, instruments, and other on-line tools securely across corporate, institutional, and geographic boundaries without sacrificing local autonomy.

The Globus Alliance also provides the Globus Toolkit, an open source software toolkit used for building robust, secure, grid systems (peer-to-peer distributed computing on supercomputers, clusters, and other high-performance systems) and applications. A Wiki is available to the Globus developer community ( http://dev.globus.org/wiki/Welcome ).

High Throughput Computing (HTC)

While some scientists try to extract more floating point operations per second (FLOPS) from their computing environment, environments we refer to as High Performance Computing (HPC) environments, others concentrate on the same goal over larger time scales, like months or years; these are High Throughput Computing (HTC) environments.

The term HTC was coined in a seminar at the NASA Goddard Space Flight Center in July 1996 to distinguish High Throughput Computing (HTC) from High Performance Computing (HPC).

HTC focuses on processing power rather than on the network, but HTC systems can also be built over a network and so be seen as a grid network optimized for processing power.

Condor Project

The goal of the Condor Project ( http://www.cs.wisc.edu/condor/ ) is to develop, implement, deploy, and evaluate mechanisms and policies that support High Throughput Computing (HTC) on large collections of distributively owned computing resources. Guided by both the technological and sociological challenges of such a computing environment, the Condor Team has been building software tools that enable scientists and engineers to increase their computing throughput.

From an Economics Perspective

For a P2P system to be viable there must be a one-to-one share of work between peers; the goal should be a balance between the consumption and production of resources while maintaining a single class of participant on the network. Since most P2P systems have a hard time creating incentives for users to produce, most of them end up as a pyramidal scheme as users interact with them, and they depend on the network effect they create: the more users the system has, the more attractive (and the more valuable) it becomes. As with any system that depends on the network effect, its success rests on compatibility and conformity.

Content for money

The privatization of the production and distribution of Cultural goods.

Content is virtual, made only of information. This information can be any type of non material object that is made from ideas (text, multimedia). In this way content is also the myriad ways those ideas can be expressed. It may consist of music, movies, books or any one single aspect of each.

Music

In today's interconnected world the distribution channels are so diversified that creating artificial control schemes will only degrade the level of consumer satisfaction, without increasing product value, while increasing the costs for the authorized distributors. If customers are faced with a product with DRM, unauthorized copies, if made publicly available, will create a competing product without limitations, that is, a better product with a better price tag. In fact, the use of DRM promotes the creation of a parallel market (if one can call it that, since most offerings are gratis), which results from consumers' wishes not being satisfied by the primary offer.

Video

Movies
TV

Recently some television networks have been rethinking their approach to audiences, as a result of the level of acceptance of, and interest in, DVD collections of shows and of several online attempts to improve distribution. Since anyone can now easily download their favorite shows illegally, the television industry faces a fragmentation of distribution channels similar to the one seen in the music recording industry with the rise of alternative delivery technologies, and it will see a similar result if it fails to adapt and meet audiences' expectations of quick and easy access to fresh new content.

TODO
extend, address the rise of independent productions, compare the p2p meme of decentralization with the static settings of the industry, we are all producers now, real time interaction

ISPs

From a Sociological Perspective

From person to person, or user to user, a new world is being born in which everyone is at the same time a producer and a consumer. Information will be free, since the costs of distribution will continue to fall and the power of creative participation is in everyone's hands.

Is it morally wrong?

As discussed previously, there is no common ground for answering this question: views differ wildly, and even states disagree on the interpretation and legality of restricting/implementing intellectual property rights.

TODO
Disintermediation... Price Fixing... Copyright Extension... Radio Consolidation... DRM...

  • paedophiles and terrorists

DRM (Digital Rights Management)

In late 2005, market-based rationales influenced Sony BMG's deployment of DRM systems on millions of Compact Discs that threatened the security of its customers' computers and, more broadly, compromised the integrity of the information infrastructure. This became known as the Sony BMG rootkit debacle (see the paper by Deirdre Mulligan and Aaron K. Perzanowski, "The Magnificence of the Disaster: Reconstructing the Sony BMG Rootkit", for detailed information).

On February 6, 2007, Steve Jobs wrote an open letter addressing DRM, since it was impacting Apple's business on the iTunes/iPod store ( http://www.apple.com/hotnews/thoughtsonmusic/ ).

In a presentation at Arizona State University (2007), David Hughes, senior vice president of technology for the RIAA, called Steve Jobs, the spiritual leader of Apple, a "hypocrite" over his attitude to DRM on iTunes: "While Steve has been banging on about the music companies dropping DRM he has been unwilling to sell his Pixar movies through iTunes without DRM and DVDs without CSS encryption."

A danger for historical records

TODO
Libraries fear digital lockdown, by Ian Youngs, BBC News

P2P United

P2P United was a now-disbanded organization formed by six of the biggest P2P groups (those behind eDonkey, Grokster, Morpheus, Blubster, Limewire and BearShare), with Adam Eisgrau as executive director. It was started in mid-July 2003 to lobby on behalf of P2P before the U.S. Congress and WIPO (the UN organization that administers intellectual property treaties), since the file-sharing industry, as an industry, had no identifiable name and face in Washington or in the media.

This attempt was a bust, and since then most of the members of the group have lost court cases or have settled and closed operations.

TODO
Complete

Peer-to-Peer working group

The Peer-to-Peer WG (P2Pwg).

A great article by Tim O'Reilly (10/13/2000) about problems with the creation of the working group is available at www.openp2p.com (http://www.openp2p.com/pub/a/p2p/2000/10/13/working_grp.html).

For every action there is a reaction

It is evident today that there is a social movement against what is generally perceived as the corruption of copyright over public goods; that is, a minority is legally attempting to impose extensions of rights and reductions of liberties to defend the economic interests of mostly sizable international corporations, the vast majority of which aren't even the direct creators of the goods. In this particular case the goods are virtual, mostly digital, with a replication cost approaching zero, and they aren't eroded by time or use.

From a Legal Perspective

The most commonly shared files on such networks are MP3 files of popular music and DivX movie files. This has led many observers, including most media companies and some peer-to-peer advocates, to conclude that these networks pose grave threats to the business models of established media companies. Consequently, peer-to-peer networks have been targeted by industry trade organizations such as the RIAA and MPAA. The Napster service was shut down by an RIAA lawsuit, and both the RIAA and the MPAA spend large amounts of money lobbying lawmakers for legal restrictions. The most extreme manifestation of these efforts to date (as of January 2003) has been a bill introduced by California Representative Berman that would grant copyright holders the legal right to break into computer systems believed to be illegally distributing copyrighted material, and to subvert the operation of peer-to-peer networks. The bill was defeated in committee in 2002, but Rep. Berman has indicated that he will reintroduce it during the 2003 sessions.

As attacks from media companies expand, the networks seem to have adapted at a quick pace and have become technologically more difficult to dismantle. This has caused the users of such systems to become targets. Some have predicted that open networks may give way to closed, encrypted ones where the identity of the sharing party is not known by the requesting party. Another trend towards immunity from media companies seems to be wireless ad hoc networks, where each device is connected in a true peer-to-peer sense to those in its immediate vicinity.

While historically P2P file sharing has been used to illegally distribute copyrighted materials (like music, movies, and software), future P2P technologies will certainly evolve and be used to improve the legal distribution of materials.

TODO
..."IP addresses are an identifier used to locate a particular network interface on the Internet. Be this a router, a PC, Mac, PDA, mobile phone or otherwise (with modules capable of utilizing one ranging to the size of a finger nail). IP addresses are not proof that a particular TYPE (PC running Windows, Linux or other free software, PDA, mobile phone, etc.) of computer hardware was used in the transmission. Nothing about this hardware can be *assumed*, and also nothing about the users IF ANY, of this hardware. So, I define my second point, which is that these electronic devices (of the types I listed above) may be operated without regard to physical location or the actual OWNER of the IP address." in "Patricia Santangelo files Answer, Demands Trial by Jury"...

As should be obvious by now, the problem P2P technologies create for content owners, for the control of distribution channels and for the limitation of users' (consumers') rights is huge. The technology is making holes in the standard ideology that governs the relations between producers and consumers, and some new models have been proposed (see for example Towards solutions to “the p2p problem” - http://groups.sims.berkeley.edu/pam-p2p/ ).

Is it Illegal?

Peer-to-peer in itself is nothing particularly new. We can say that an FTP transfer or any other one-to-one transfer is P2P, like an IRC user sending a file to another via DCC, or even email. The only thing that can be illegal is the use one makes of a particular tool.

Legal uses of P2P include distributing open or public content, like movies, software distributions (Linux, updates) and even Wikipedia DVDs, all of which are found on P2P networks. P2P can also be used to bypass censorship (as in the way Michael Moore's new film 'Sicko' leaked via P2P), as a publicity machine to promote products and ideas, or even as a market analysts' tool.

However, trading copyrighted information without permission is illegal in most countries. You are free to distribute your favorite Linux distribution, videos or pictures you have taken yourself, MP3 files of a local band that gave you permission to post their songs online, or maybe even a copy of open source software or a book. The view of legality rests foremost on cultural and moral ground, and in a globally networked world there is no fixed line you should avoid crossing. One thing is certain: most people don't produce restricted content, and most view their creations as gifts to the global community, so it's mathematically evident that a minority is "protected" by the restrictions imposed on the use and free flow of ideas, concepts and culture in general.

P2P, as we will see, is not only about file sharing; more generally, it is about content/services distribution.

Sharing is not theft, and theft is not the same as piracy; this is true under any law.
Is sharing theft? And is theft piracy? Surely not...

Sharing content that you have no right to is not theft. It has never been theft anywhere in the world, and anyone who says it is theft is wrong. Sharing content that you don't own or have the rights to is copyright infringement.

The legal battles we are now accustomed to hearing about deal mostly with control and, to a lesser degree, with rights preservation: control over the way distribution is achieved (who gets what, and in what way). This is about money, as there is added value in controlling and restricting access by format, time and space.

TODO

  • the Copyright Term Extension Act of 1998—alternatively known as the Sonny Bono Copyright Term Extension Act or the Mickey Mouse Protection Act.
  • Encryption
  • paedophiles and terrorists

WIPO (World Intellectual Property Organization)

The World Intellectual Property Organization is one of the specialized agencies of the United Nations. WIPO was created in 1967 with the stated purpose "to encourage creative activity, [and] to promote the protection of intellectual property throughout the world". The convention establishing the World Intellectual Property Organization was signed at Stockholm on July 14, 1967.

TODO
Add more info on the WIPO and relevant treaties.

EU

An Italian poster reading "to share is not to steal", referring to the legal status of P2P in Italy.

In August 2007, the music industry was rebuffed in Europe on file-sharing identifications: a court in Offenburg, Germany, refused to order ISPs to identify subscribers whose accounts the music industry suspected were being used for copyright-infringing file-sharing. The refusal was based on the court's understanding that ordering the ISPs to hand over the details would be "disproportionate", since the music industry representatives had not adequately explained how the subscribers' actions would constitute "criminally relevant damage" that could justify a request for access to the data.

This was not an isolated incident in Germany: also in 2007, the Celle chief prosecutor's office refused a data request on the grounds that substantial damage had not been shown. This is in line with the view of a European Court of Justice (ECJ) Advocate-General, Juliane Kokott, who two weeks earlier had published an advisory opinion backing this stance, stating that countries whose law restricted the handing over of identifying data to criminal cases were compliant with EU Directives. The opinion concerned a Spanish case in which a copyright holders' group wanted subscriber details from the ISP Telefonica. The ECJ isn't obliged to follow an Advocate-General's opinion, but does so in over three-quarters of cases.

In most European countries, copyright infringement is only a criminal offense when conducted on a commercial scale (for profit).

TODO

  • EU Proposing to Make P2P Piracy A Criminal Offense
    "The EP excluded patent rights from the scope of the directive, and decided that criminal sanctions should apply only to infringements deliberately carried out to obtain a commercial advantage. Piracy committed by private users for personal, non-profit purposes is therefore also excluded."

France

On June 12, 2007 the Société des Producteurs de Phonogrammes en France (SPPF - http://www.sppf.com/ ), an entity that represents the legal interests of independent French audio creations and collects copyright revenue on their behalf, publicly announced that it had launched a civil action in the Paris Court of First Instance requesting a court order to terminate the distribution and operation of Morpheus (published by Streamcast) and Azureus, and demanding compensation for monetary losses. On 18 September 2007 a similar action was brought against Shareaza, and on 20 December 2007 the SPPF announced a new action, this time against Limewire. All of these legal actions seem to be based on an amendment to the national copyright law stipulating that civil action can be taken against software creators/publishers that do not take steps to prevent users from accessing illegal content.

USA

Under US law, the "Betamax decision" (Sony Corp. of America v. Universal City Studios, Inc.) holds that copying "technologies" are not inherently illegal if substantial non-infringing use can be made of them. This decision, which predates the widespread use of the Internet, applies to most data networks, including peer-to-peer networks, since legal distribution of some files can be performed. These non-infringing uses include sending open source software, Creative Commons works and works in the public domain. Other jurisdictions tend to view the situation in somewhat similar ways.

The US is also a signatory of the WIPO treaties, treaties that were partially responsible for the creation and adoption of the Digital Millennium Copyright Act (DMCA).

As stated in US copyright law, one must keep in mind the provisions for fair use, licensing, copyright misuse and the statute of limitations.

MGM v. Grokster

TODO
http://www.eff.org/IP/P2P/MGM_v_Grokster/

RIAA

The RIAA and the labels took an aggressive stance as soon as online music file sharing became popular. They won an early victory in 2001 by shutting down the seminal music-sharing service Napster.

The site was an easy target because Napster physically maintained the computer servers where illegal music files, typically in high-fidelity, compressed, download-friendly MP3 format, were stored. [With P2P networks, the files are stored on individual user computers; special software lets consumers "see" the files and download them onto their own hard drives.]
—Daphne Eviatar, "Record industry, music fans out of tune," The Recorder, August 20, 2003

The Recording Industry Association of America (RIAA) ( http://www.riaa.com/ ) is the trade group that represents the U.S. recording industry. The RIAA receives funding from the four major music groups (EMI, Warner, Sony BMG and Universal) and from hundreds of small independent labels.

Motion Picture Association of America

TODO
Warner Bros. to Try File Sharing in Germany

MPAA sues newsgroup, P2P search sites By John Borland, Published on ZDNet News, February 23, 2006

Canada

Canada has a levy on blank audio recording media, created on March 19, 1998, by the adoption of new federal copyright legislation. Canada introduced this levy to address the private copying of sound recordings; other states with a similar copyright regime include most of the G-7 and European Union members. In-depth information regarding the levy may be found in the Canadian copyright levy on blank audio recording media FAQ ( http://neil.eton.ca/copylevy.shtml ).

Despite its shared border and close ties with its neighbor, Canada has historically been less inclined to serve corporate interests, and its policies contrast in their social aspects with those of any other country in the Americas. The reality, however, is that Canada has been highly influenced and even pressured (economically and politically) by its strongest neighbor, the USA, to follow its legal, social and economic evolution. Recently (November 2007) the government of Canada attempted to push for the adoption of a DMCA-modeled copyright law, so as to comply with the WIPO treaties the country signed in 1997, in a move similar to that of the USA. This has resulted in a popular outcry against the legislation and will probably result in its alteration. The visibility of this latest attempt was due to the efforts of Dr. Michael Geist, a law professor at the University of Ottawa considered an expert in copyright and the Internet, who feared that the law would copy the worst aspects of the U.S. Digital Millennium Copyright Act.

Shadow Play

Some actions are not intended to see the light of day. This section is dedicated to bringing some of these subjects and actions to light, to help the reader fully appreciate less publicized information that has some bearing on the evolution of P2P.

OS

Since P2P (and P2P-related technologies) started to appear, the security of the user's OS has been placed above user freedom. Probably because most people are technologically challenged, some organizations (or groups with vested interests) remain free to think for the masses instead of just making the information available. There is an organized attempt to hide this from the public: it is telling that, once these security enhancements are made, they tend to be hidden so as not to cause confusion (read: foment rebellion) among users.

Not all is lost, however: some people refuse to accept this state of affairs, so some information can be found and some of these actions can be reversed.

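About MS Windows
  • TCPIP.SYS - [fix], [info] for Windows XP, concerning the limit on concurrent half-open outbound TCP connections introduced with Service Pack 2.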

ISPs

TODO
Add missing information about ISPs interactions with P2P technologies.

Internet service providers (ISPs) are not pleased with P2P technologies because of the load they bring to their networks. Although ISPs sell their Internet connections as unlimited, if people actually take them up on the offer, ISPs will eventually be unable to cope with the demand at the same price/profit level. This has made clients increasingly worried about some ISPs' actions, ranging from traffic shaping (protocol/packet prioritization) to traffic tampering.

The Electronic Frontier Foundation (EFF), a San Francisco-based digital rights group, has verified this type of effort by Internet providers to disrupt some uses of their services. The evidence seems to indicate that it is an increasing trend, as other reports have reached the EFF and have been confirmed by an investigation by The Associated Press.

The EFF has released a report on interference with Internet traffic on Comcast ( http://www.eff.org/wp/packet-forgery-isps-report-comcast-affair ); other information about this subject is available on the EFF site.

Traffic shaping

TODO
Add missing information.

Traffic tampering

Traffic tampering is more worrying than traffic shaping and harder to notice or verify. It can also be described as spoofing: the injection of adulterated or fake information into a communication by gaming a given protocol. It is like the post office assuming the identity of one of your friends and sending you mail in their name.

Pcapdiff ( http://www.eff.org/testyourisp/pcapdiff/ ) is a free Python tool developed by the EFF to compare two packet captures and identify potentially forged, dropped, or mangled packets.
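The idea behind a comparison tool like Pcapdiff can be sketched without any capture library: record the packets seen at the sending end and at the receiving end, match them on fields that should not change in transit, and classify what is left over. The sketch below is not Pcapdiff's code; it assumes packets have already been parsed into simple records and uses a hypothetical matching key of source, destination and IP identification field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """A simplified, already-parsed packet (hypothetical record format)."""
    src: str
    dst: str
    ip_id: int      # IP identification field, assumed unchanged in transit
    payload: bytes


def compare_captures(sent, received):
    """Compare a sender-side capture with a receiver-side capture.

    Returns three lists:
      forged  - seen by the receiver but never sent by the sender
      dropped - sent by the sender but never seen by the receiver
      mangled - (sent, received) pairs that match but differ in payload
    """
    def key(p):
        return (p.src, p.dst, p.ip_id)

    sent_by_key = {key(p): p for p in sent}
    recv_by_key = {key(p): p for p in received}

    forged = [p for k, p in recv_by_key.items() if k not in sent_by_key]
    dropped = [p for k, p in sent_by_key.items() if k not in recv_by_key]
    mangled = [
        (sent_by_key[k], p)
        for k, p in recv_by_key.items()
        if k in sent_by_key and sent_by_key[k].payload != p.payload
    ]
    return forged, dropped, mangled
```

In a real capture, NAT, retransmissions and fragmentation make matching harder, which is why a dedicated tool is useful; a forged TCP reset injected by a middlebox would show up here in the `forged` list.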

Localization/Acceleration (Cache)

Network neutrality

Network neutrality deals with the need to prevent ISPs from "double dipping", that is, charging both the clients who pay for their broadband connections and the websites/organizations that would also have to pay to have their traffic prioritized according to its origin, destination or protocol.

P2P Networks and Protocols

This chapter provides an overview of what Peer-to-Peer is, its historical evolution, technologies and uses.

P2P and the Internet: A "bit" of History

P2P is not a new technology; it is almost as old as the Internet. It started with email, and the next generation was called "metacomputing" or classed as "middleware". The concept took the Internet by storm only because of a general decentralization of P2P protocols. Decentralization is the key word: not only does it give power to the ordinary user, who is now on a level playing field thanks to easy access to powerful machines and infrastructures, but it also saves information distribution resources, a different approach from the old centralized concept. This can be a problem for the security or control of the shared information; in other words, it brings a "democratization" of information (the well-known use of P2P for downloading copies of MP3s, programs, and even movies from file sharing networks). Because of this decentralized nature, traffic patterns are hard to predict, so providing infrastructure to support them is a major problem that most ISPs are now aware of.

P2P has also been heralded as the solution for indexing the deep Web. Most implementations of P2P technologies are based on and oriented to wired networks running TCP/IP, but some are even being transferred to wireless uses (sensors, phones and robotic applications); you have probably already heard of military implementations such as intelligent mines or robotic insect swarms.

eMail

Peer2Mail

Peer to Mail ( http://www.peer2mail.com/ ) is a freeware application for Windows that lets you store and share files on any web-mail account; you can use web-mail providers such as Gmail (Google Mail), Walla!, Yahoo and others. It splits the shared files into segments, which are compressed and encrypted, and then sends the file segments one by one to an account you have administrative access to. To download the files, the process is reversed.
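The split-and-reassemble process described above can be sketched as follows. This is a hypothetical illustration, not Peer2Mail's actual format: the segment size is an arbitrary assumption, each segment is compressed with zlib, and the encryption step is deliberately left out because it would need a third-party cryptography library. A SHA-256 digest of the whole file is kept so the reassembled result can be checked.

```python
import hashlib
import zlib

SEGMENT_SIZE = 64 * 1024  # assumed segment size; the real tool may differ


def split_for_mail(data: bytes, segment_size: int = SEGMENT_SIZE):
    """Split data into numbered, zlib-compressed segments (one per e-mail).

    Encryption is omitted in this sketch; a real tool would encrypt each
    compressed segment before sending it.
    """
    segments = []
    for index, start in enumerate(range(0, len(data), segment_size)):
        chunk = data[start:start + segment_size]
        segments.append((index, zlib.compress(chunk)))
    digest = hashlib.sha256(data).hexdigest()
    return segments, digest


def reassemble(segments, expected_digest: str) -> bytes:
    """Reverse the process: order by index, decompress, join and verify."""
    ordered = sorted(segments, key=lambda item: item[0])
    data = b"".join(zlib.decompress(blob) for _, blob in ordered)
    if hashlib.sha256(data).hexdigest() != expected_digest:
        raise ValueError("reassembled file does not match the original digest")
    return data
```

Numbering the segments matters because mail messages may be fetched in any order.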

Security

The encryption was broken in Peer2Mail v1.4 (prior versions are also affected); see the Peer2Mail Encrypt PassDumper Exploit.

Usenet

Usenet is the original peer to peer file-sharing application. It was originally developed to make use of UUCP (Unix to Unix Copy) to synchronize two computers' message queues. Usenet stores each article in an individual file and each newsgroup in its own directory. Synchronizing two peers is as simple as synchronizing selected directories in two disparate filesystems.

Usenet was created with the assumption that everyone would receive, store and forward the same news. This assumption greatly simplified development to the point where a peer was able to connect to any other peer in order to get news. The fragmentation of Usenet into myriad newsgroups allowed it to scale while preserving its basic architecture. 'Every node stores all news' became 'every node stores all news in newsgroups it subscribes to'.

Of all other peer-to-peer protocols, Usenet is closest to Freenet, since all nodes are absolutely equal and no subset of nodes keeps a global map of the network. Unlike Freenet, which works by recursively pulling a requested object along a linear chain of peers, Usenet works by recursively pushing all news to each node's immediate neighbors, spreading it through a tree.
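Usenet's push model can be sketched as a flood: when a server receives an article it has not seen before (identified by its Message-ID) in a newsgroup it carries, it stores the article and pushes it on to each of its neighbors, which do the same. The sketch below is a simplified in-memory model with hypothetical class and function names; real servers exchange articles over transports such as UUCP or NNTP.

```python
from collections import deque


class NewsServer:
    """A toy Usenet peer that stores articles and pushes them to neighbors."""

    def __init__(self, name, groups):
        self.name = name
        self.groups = set(groups)      # newsgroups this server carries
        self.neighbors = []            # directly connected peers
        self.articles = {}             # Message-ID -> (newsgroup, body)

    def connect(self, other):
        self.neighbors.append(other)
        other.neighbors.append(self)


def post(origin, message_id, group, body):
    """Inject an article at one server and push it to every interested peer.

    Each server accepts an article only once (duplicate Message-IDs are
    ignored) and only if it carries the article's newsgroup; it then pushes
    the article on to its own neighbors.
    """
    queue = deque([origin])
    while queue:
        server = queue.popleft()
        if message_id in server.articles or group not in server.groups:
            continue
        server.articles[message_id] = (group, body)
        queue.extend(server.neighbors)
```

Note how a server that does not carry a newsgroup also stops the article from reaching peers behind it, which mirrors the "every node stores all news in newsgroups it subscribes to" rule above.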

FidoNet

FTP

The File Transfer Protocol (FTP) can be seen as a primordial P2P protocol. Even though it depends on a client/server structure, the limitation lies only in the type of application (client or server) one runs, since any machine can take either role.

File eXchange Protocol (FXP)

Instant Messaging

Instant messaging is the act of instantly communicating between two or more people over a network (LAN or WAN). It requires a client program, so that when a message is sent, a notification is shown shortly afterwards in the destination application, enabling its user to reply to the original message. Instant messaging allows users to send quick notes or reminders to other users in almost real time. IM may or may not include other P2P services such as file sharing, VoIP or video conferencing; the broad definition is that IM is the almost instantaneous exchange of messages, whatever form it takes.

TODO
Describe what it is, the future and what it has to do with P2P...

Security Risks

TODO
Messages can easily be intercepted, "spoofed" or modified...

Internet Relay Chat (IRC)

Internet Relay Chat, commonly abbreviated IRC, is a real-time, text-based, multi-user communication protocol specification and implementation; it relays messages between users on the network. IRC was born sometime in 1988, according to Efnet.org ( http://efnet.org/ ). According to IRChelp.org ( http://www.irchelp.org/irchelp/rfc/ ), the official specification for IRC was written in 1993 in the RFC format. The protocol is defined in "RFC 1459: Internet Relay Chat Protocol", which is an excellent source both for an introduction to and for detailed information about the IRC protocol. Today IRC has a very wide range of users, and anyone can find a place to participate in chat.

IRC's largest unit of architecture is the IRC network. There are perhaps hundreds of IRC networks in the world each one running parallel and disjoint from the others. A client logged into one network can communicate only with other clients on the same network, not with clients on other networks. Each network is composed of one or more IRC servers. An IRC client is a program that connects to a given IRC server in order to have the server relay communications to and from other clients on the same network but not necessarily the same server.

Messages on IRC are sent as blocks. That is, other IRC clients will not see one typing and editing as one does so. One creates a message block (often just a sentence) and transmits that block all at once. The server receives it and, based on the addressing, either delivers it to the appropriate client or relays it to other servers so that it may be delivered or relayed again, and so on.
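On the wire, each message block is a single line of text ending in CR LF: an optional prefix (introduced by a colon), a command, and parameters, the last of which may contain spaces if it is introduced by " :". RFC 1459 and RFC 2812 limit a line to 512 characters including the CR LF. A minimal parser and builder for that line format might look like this; it is a simplified sketch that skips some edge cases of the RFC grammar.

```python
def parse_irc_line(line: str):
    """Split one IRC protocol line into (prefix, command, params).

    Format: [":" prefix " "] command {" " param} [" :" trailing] CRLF
    """
    line = line.rstrip("\r\n")
    prefix = None
    if line.startswith(":"):
        prefix, line = line[1:].split(" ", 1)
    if " :" in line:
        line, trailing = line.split(" :", 1)
        params = line.split()
        params.append(trailing)   # the trailing parameter may contain spaces
    else:
        params = line.split()
    command = params.pop(0)
    return prefix, command, params


def build_irc_line(command: str, *params: str) -> str:
    """Build a protocol line; the last parameter is sent as a trailing one."""
    parts = [command, *params[:-1]]
    if params:
        parts.append(":" + params[-1])
    line = " ".join(parts) + "\r\n"
    # The RFCs allow 512 characters including CRLF; measured here in bytes.
    if len(line.encode("utf-8")) > 512:
        raise ValueError("IRC lines are limited to 512 characters including CRLF")
    return line
```

For example, `build_irc_line("PRIVMSG", "#wikibooks", "hello world!")` produces the line a client sends when a user types "hello world!" in the #wikibooks channel.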

Once connected to a server, addressing of other clients is achieved through IRC nicknames. A nickname is simply a unique string of ASCII characters identifying a particular client. Although implementations vary, restrictions on nicknames usually dictate that they be composed only of characters a-z, A-Z, 0-9, underscore, and dash.
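The restriction described above can be expressed as a simple check. The sketch follows the rule stated here (letters, digits, underscore and dash) and adds two assumptions taken from RFC 2812: the first character may not be a digit or a dash, and the default maximum length is 9 characters. Many modern servers allow longer nicknames and extra special characters, so treat `max_length` as adjustable.

```python
import re


def is_valid_nickname(nick: str, max_length: int = 9) -> bool:
    """Check a nickname against the common restriction described above.

    Allowed characters: a-z, A-Z, 0-9, underscore and dash. The first
    character may not be a digit or a dash (as in RFC 2812).
    """
    pattern = rf"[A-Za-z_][A-Za-z0-9_-]{{0,{max_length - 1}}}"
    return re.fullmatch(pattern, nick) is not None
```

Using `re.fullmatch` ensures the whole nickname is checked, not just a prefix of it.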

Another form of addressing on IRC, and arguably one of its defining features, is the IRC channel. IRC channels are often compared to CB Radio (Citizen's Band Radio) channels. While with CB one is said to be "listening" to a channel, in IRC one's client is said to be "joined" to the channel. Any communication sent to that channel is then "heard" or seen by the client. On the other hand, other clients on the same network or even on the same server, but not on the same channel will not see any messages sent to that channel.

Updated information on IRC, including the move to support IPv6 and the new technical papers, can be obtained at IRC.org. The most current technical documents (April 2000, authored by C. Kalt) were published as RFCs through the IETF (Internet Engineering Task Force):

     RFC 2810 : IRC Architecture
     RFC 2811 : IRC Channel-Management
     RFC 2812 : IRC Client-Protocol
     RFC 2813 : IRC Server-Protocol

These documents are already available on IRC.org's official FTP-server, reachable at ftp://ftp.irc.org/irc/server

While IRC is by definition not a P2P protocol, it does have some extensions that support text and file transmission directly from client to client without any relay at all. The best known is DCC (Direct Client-to-Client), whose connections are negotiated through CTCP (Client-To-Client Protocol) messages; CTCP messages themselves are still relayed by the servers. For CTCP, clients like mIRC implement commands such as "ctcp nickname version" or "ctcp nickname ping" to get some interesting information about other users.
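A CTCP request is an ordinary PRIVMSG whose text is wrapped in the 0x01 (Ctrl-A) control character, which is why it still travels through the servers; replies are sent as NOTICE messages, and the "/me" action is sent the same way as an ACTION request. A minimal sketch of building and recognizing such messages (the low-level quoting rules of the original CTCP specification are ignored):

```python
CTCP_DELIM = "\x01"


def ctcp_request(target: str, command: str, argument: str = "") -> str:
    """Build the raw IRC line for a CTCP request such as VERSION or PING."""
    body = command.upper() + (" " + argument if argument else "")
    return f"PRIVMSG {target} :{CTCP_DELIM}{body}{CTCP_DELIM}\r\n"


def ctcp_reply(target: str, command: str, argument: str = "") -> str:
    """CTCP replies use NOTICE, which clients must never answer automatically."""
    body = command.upper() + (" " + argument if argument else "")
    return f"NOTICE {target} :{CTCP_DELIM}{body}{CTCP_DELIM}\r\n"


def parse_ctcp(text: str):
    """Return (command, argument) if a message text is a CTCP message, else None."""
    if len(text) >= 2 and text.startswith(CTCP_DELIM) and text.endswith(CTCP_DELIM):
        command, _, argument = text[1:-1].partition(" ")
        return command, argument
    return None
```

So "ctcp JohnDoe version" in a client becomes `ctcp_request("JohnDoe", "version")`, and JohnDoe's client answers with a NOTICE carrying its version string.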

Ident Protocol

The Ident Protocol, specified in RFC 1413, is an Internet protocol that helps identify the user of a particular TCP connection, distinguishing them from other users of the same machine.

The Ident Protocol is designed to run as a server daemon on a user's computer, where it receives requests on a specified port, generally TCP port 113. The daemon then sends a specially formatted response that identifies the username of the owner of the queried connection.
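RFC 1413 queries are a single line naming a port pair, for example "6191 , 6667", and the answer repeats the pair followed by either a USERID or an ERROR field. The sketch below answers one such query; the `connections` lookup table is a hypothetical stand-in for the operating-system information a real daemon would consult, and socket handling on port 113 is left out.

```python
def ident_response(query: str, connections: dict, os_name: str = "UNIX") -> str:
    """Answer one RFC 1413 query line.

    `query` looks like "6191 , 6667": the first number is the port on the
    machine running this Ident server, the second the port on the machine
    asking (for example an IRC server). `connections` maps
    (local_port, remote_port) to the owning username (hypothetical table).
    """
    try:
        local_str, remote_str = query.strip().split(",")
        local_port, remote_port = int(local_str), int(remote_str)
    except ValueError:
        return f"{query.strip()} : ERROR : INVALID-PORT\r\n"
    if not (1 <= local_port <= 65535 and 1 <= remote_port <= 65535):
        return f"{local_port} , {remote_port} : ERROR : INVALID-PORT\r\n"
    user = connections.get((local_port, remote_port))
    if user is None:
        return f"{local_port} , {remote_port} : ERROR : NO-USER\r\n"
    return f"{local_port} , {remote_port} : USERID : {os_name} : {user}\r\n"
```

An IRC server that receives a connection from port 6191 on its own port 6667 would send "6191 , 6667" to port 113 of the connecting machine and read the username from the reply.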

Most standalone Windows machines do not have an Ident service running or present by default; in that case you may need to run your own Ident server (there are several stand-alone servers available). Unix/Linux machines, on the other hand, have traditionally provided the service. Some Windows IRC clients also have an Ident server built into them.

The reason for running an Ident server is that some IRC servers go so far as to block clients without an Ident response. Their main reason is that this makes it much harder to connect via an "open proxy" or through a system where one has compromised a single account of some form but does not have root access.

DCC Protocol

CTCP Protocol

Bots or Robots

IRC systems also support (ro)bots. These are not real users but either a collection of commands loaded from a script (text) file into an IRC client, or a stand-alone program that connects to an IRC channel. They serve to ease human interaction with the system, provide some kind of automation, or test or implement some AI.
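A stand-alone bot is essentially a loop that reads protocol lines and reacts to some of them. The sketch below shows only the decision logic, with a hypothetical "!hello" trigger and bot name: it answers server PING messages with PONG (servers use these to check that a client is still connected) and replies to one channel command. Network I/O is left out so the logic can be tested on its own.

```python
def bot_reply(line: str, nick: str = "WikiBot"):
    """Return the raw line a simple bot would send in response, or None.

    Handles two cases: a server PING (answered with PONG) and a
    hypothetical "!hello" command typed in a channel.
    """
    line = line.rstrip("\r\n")
    if line.startswith("PING"):
        # Echo the server's token back so the connection is not dropped.
        return "PONG" + line[4:] + "\r\n"
    if not line.startswith(":") or " :" not in line:
        return None
    head, text = line[1:].split(" :", 1)
    parts = head.split()
    if len(parts) < 3 or parts[1] != "PRIVMSG":
        return None
    sender = parts[0].split("!", 1)[0]   # "JohnDoe!john@host" -> "JohnDoe"
    target = parts[2]
    if text.strip() == "!hello" and target.startswith("#"):
        return f"PRIVMSG {target} :Hello, {sender}! I am {nick}.\r\n"
    return None
```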

Basic Commands

Here are some basic commands for IRC:

  • /attach or /server: sign on to a server. Examples: /attach irc.freenode.net or /server irc.freenode.net
  • /nick: set your nickname. Example: /nick YourName
  • /join: join a channel. Example: /join #wikibooks
  • /msg: send a message, either to the entire channel or privately to a user. Message the channel: /msg #wikibooks hello world! Send a private message: /msg JohnDoe Hi john.
  • /whois: display information about a user on the server. Example: /whois JohnDoe
  • /clear: clear a channel's text. Example: /clear
  • /clearall: clear the text of all open channels. Example: /clearall
  • /away: set an away message; type /away again to return from away. Example: /away I'm away because...
  • /me: send an action to the channel. Example: if JohnDoe types "/me loves pie." the chat shows "JohnDoe loves pie."
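Most of these commands are conveniences of the client, which translates them into raw protocol lines before sending them to the server. The mapping below is a sketch for a few of the commands listed above: NICK, JOIN, PRIVMSG, WHOIS and AWAY are standard protocol commands, and /me becomes a CTCP ACTION. /server, /attach, /clear and /clearall are handled locally by the client. The function name and its `current_channel` parameter are hypothetical.

```python
def translate_command(text: str, current_channel: str):
    """Turn a typed client command into a raw IRC line, or None if local/unknown."""
    if not text.startswith("/"):
        # Plain text typed in a channel window is sent to that channel.
        return f"PRIVMSG {current_channel} :{text}\r\n"
    command, _, rest = text[1:].partition(" ")
    command, rest = command.lower(), rest.strip()
    if command == "nick" and rest:
        return f"NICK {rest.split()[0]}\r\n"
    if command == "join" and rest:
        return f"JOIN {rest.split()[0]}\r\n"
    if command == "whois" and rest:
        return f"WHOIS {rest.split()[0]}\r\n"
    if command == "msg" and " " in rest:
        target, message = rest.split(" ", 1)
        return f"PRIVMSG {target} :{message}\r\n"
    if command == "me" and rest:
        return f"PRIVMSG {current_channel} :\x01ACTION {rest}\x01\r\n"
    if command == "away":
        # "/away" with no text clears the away status.
        return f"AWAY :{rest}\r\n" if rest else "AWAY\r\n"
    return None  # e.g. /server, /clear: handled locally by the client
```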

Privileged User Commands

Commands for half-operators, channel operators, channel owners, and admins:

  • /kick: kick (boot) a user from the channel; you must be a half-operator or greater to do this. Example, kicking a user with a reason: /kick JohnDoe I kicked you because...
  • /ban: ban a user from the channel; you must be a channel operator or greater to do this. Example: /ban JohnDoe
  • /unban: unban a user from the channel; you must be a channel operator or greater to do this. Example: /unban JohnDoe

IRC Networks

Software Implementations

  • KVIrc ( http://www.kvirc.net/ ) an open source (GPL) portable IRC client based on the Qt GUI toolkit and coded in C++.
  • Bersirc ( http://bersirc.free2code.net/index.php/home/ ), an open source IRC client (LGPL), coded in C, that runs on Windows (Linux and Mac OS X ports under development) by utilizing the Claro GUI Toolkit.
  • XChat ( http://www.xchat.org/ ), an open source (GPL) IRC client for Windows and UNIX (Linux/BSD) operating systems, coded in C. XChat runs on most BSD and POSIX-compliant operating systems.
  • Irssi ( http://irssi.org/ ), an IRC client program originally written by Timo Sirainen, and released under the terms of the GNU General Public License. It is written in the C programming language and in normal operation uses a text-mode user interface.
  • mIRC ( http://www.mirc.co.uk/ ), a shareware Internet Relay Chat client for Windows, created in 1995 and developed by Khaled Mardam-Bey. Originally used only for IRC, it has evolved into a highly configurable tool that can be used for many purposes thanks to its integrated scripting language.

You can also check Wikipedia's List of IRC clients and Comparison of Internet Relay Chat clients (which may not be up to date).

Invisible IRC Project

The Invisible IRC Project is a technological advancement over normal IRC networks, created by invisibleNET, a research and development driven organization whose main focus is the innovation of intelligent network technology. Its goal is to provide the highest standards of security and privacy on the widely used, yet notoriously insecure, Internet.

Invisible IRC Project ( http://www.invisiblenet.net/ ) is a three-tier, peer distributed network designed to be a secure and private transport medium for high speed, low volume, dynamic content. Features:

  • Perfect Forward Secrecy using the Diffie-Hellman Key Exchange Protocol
  • Constant session key rotation
  • 128 bit Blowfish node-to-node encryption
  • 160 bit Blowfish end-to-end encryption
  • Chaffed traffic to thwart traffic analysis
  • Secure dynamic routing using cryptographically signed namespaces for node identification
  • Node level flood control
  • Seamless use of standard IRC clients
  • Graphical user interface
  • Peer distributed topology for protecting the identity of users
  • Completely modular in design, all protocols are plug-in capable

The IIP software is released under the GPL license and is available for Windows 98/ME/NT/2000/XP, *nix/BSD and Mac OSX, coded in C.
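Diffie-Hellman key exchange, the first feature in the list above, lets two nodes agree on a shared session key over an open channel without ever transmitting the key itself. The toy example below uses deliberately tiny, insecure parameters (p = 23, g = 5, a common textbook choice) purely to show the arithmetic; it says nothing about the parameters IIP actually uses, and a real implementation would rely on a vetted cryptographic library and large groups. Generating fresh secrets for every session, which this sketch does, is what gives forward secrecy: a later compromise of one node's keys does not reveal earlier session keys.

```python
import secrets

# Toy public parameters: far too small to be secure, illustration only.
P = 23  # prime modulus
G = 5   # generator


def make_keypair():
    """Pick a fresh private exponent and compute the public value G^x mod P."""
    private = secrets.randbelow(P - 2) + 1  # 1 .. P-2
    public = pow(G, private, P)
    return private, public


def shared_secret(own_private: int, other_public: int) -> int:
    """Both sides obtain the same value: (G^b)^a = (G^a)^b mod P."""
    return pow(other_public, own_private, P)


# Each node generates new secrets per session and exchanges only public values.
a_priv, a_pub = make_keypair()
b_priv, b_pub = make_keypair()
assert shared_secret(a_priv, b_pub) == shared_secret(b_priv, a_pub)
```

With private values 4 and 3, the public values are 4 and 10, and both sides compute the shared secret 18.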

Other IM Networks

TODO
AIM, ICQ, MSN, Yahoo!, IRC, Jabber, Gadu-Gadu, SILC, GroupWise Messenger, and Zephyr

Software Clients

  • Gaim/Pidgin ( http://pidgin.im/pidgin/home/ ), an open source (GPL) instant messaging client supporting Windows, GNU, BSD, and many Unix derivatives, compatible with the AIM, ICQ, MSN, Yahoo!, IRC, Jabber, Gadu-Gadu, SILC, GroupWise Messenger, and Zephyr networks.
  • Trillian ( http://www.ceruleanstudios.com/ ), a skinnable chat client that supports AIM, ICQ, MSN, Yahoo!, and IRC; it also includes many features not found in those networks' own chat programs.

VoIP

Voice over IP can also be seen as an extension of IM in which text is replaced by live audio or video. The technological challenges are very similar, apart from the type of data that needs to be transferred and specific timing considerations. It is not uncommon for IM applications to also support VoIP or video conferencing.
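One of the timing considerations mentioned above is jitter, the variation in packet delay that forces VoIP software to buffer audio before playing it. RTP (RFC 3550, section 6.4.1) defines a running estimate of interarrival jitter smoothed with a gain of 1/16; the sketch below applies that formula to lists of send and arrival times, which are assumed to be in the same (arbitrary) time units.

```python
def interarrival_jitter(send_times, arrival_times):
    """Estimate jitter as defined in RFC 3550, section 6.4.1.

    For consecutive packets i-1 and i, the difference in transit time is
        D = (R_i - R_{i-1}) - (S_i - S_{i-1})
    and the running estimate is updated as J += (|D| - J) / 16.
    """
    jitter = 0.0
    for i in range(1, len(send_times)):
        d = (arrival_times[i] - arrival_times[i - 1]) - (send_times[i] - send_times[i - 1])
        jitter += (abs(d) - jitter) / 16
    return jitter
```

A constant network delay yields zero jitter, however large the delay; only variation in the delay matters.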

Security on VoIP faces the same vulnerabilities and security threats as other P2P protocols and applications, including fuzzing, floods, spoofing, stealth attacks and VoIP spam.

BitTorrent

BitTorrent is a protocol ( Bittorrent Protocol Specification v1.0 ) created by Bram Cohen and designed primarily to distribute large computer files over the Internet; it can be used both to distribute legitimate content and to enable copyright infringement on a massive scale. It is peer-to-peer in nature: users connect to each other directly to send and receive portions of a large file from other peers who have also downloaded either the file or parts of it. These pieces are then reassembled into the full file. Since users download from each other and not from one central server, the bandwidth load of downloading large files is divided among the many sources the user is downloading from. This decreases the bandwidth cost for people hosting large files and increases download speeds for the people downloading them, because the protocol uses the upstream bandwidth of every downloader to increase the effectiveness of the distribution as a whole, which in turn benefits each downloader.

There is, however, a central server (called a tracker) which coordinates the actions of all such peers. The tracker only manages connections; it has no knowledge of the contents of the files being distributed, and therefore a large number of users can be supported with relatively limited tracker bandwidth. The key philosophy of BitTorrent is that users should upload (transmit outbound) at the same time they are downloading (receiving inbound). In this manner, network bandwidth is utilized as efficiently as possible. In contrast to other file transfer protocols, BitTorrent is designed to work better as the number of people interested in a certain file increases.
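The division of a file into pieces, and the way a downloader can check each piece it receives from an untrusted peer, can be sketched as below. In BitTorrent v1 the .torrent metainfo stores the SHA-1 hash of every fixed-size piece, so data from any peer can be verified before it is accepted; the piece length used here is only an example value, and the function names are hypothetical.

```python
import hashlib

PIECE_LENGTH = 256 * 1024  # example piece size; chosen by the torrent creator


def piece_hashes(data: bytes, piece_length: int = PIECE_LENGTH):
    """Return the SHA-1 digest of each piece (the last piece may be shorter)."""
    return [
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    ]


def verify_piece(index: int, piece: bytes, hashes) -> bool:
    """Check a piece received from a peer against the hash from the metainfo."""
    return hashlib.sha1(piece).digest() == hashes[index]


def assemble_file(pieces: dict, hashes) -> bytes:
    """Join verified pieces that may have arrived in any order from any peers."""
    if set(pieces) != set(range(len(hashes))):
        raise ValueError("download incomplete")
    for index, piece in pieces.items():
        if not verify_piece(index, piece, hashes):
            raise ValueError(f"piece {index} failed its hash check")
    return b"".join(pieces[i] for i in range(len(hashes)))
```

Because every piece can be checked independently, a client can fetch different pieces from different peers at the same time and immediately re-share each piece once it has been verified.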

BitTorrent is redefining the way people share and search for content and is becoming very popular for downloading movies, TV shows, full music albums and applications (where it outperforms the alternatives). Since it is very file-specific, it benefits from the "new" factor of P2P content: more users means more speed. However, it is not the optimal solution for rare files or for distributing content that is not in high demand.

To download files hosted using BitTorrent, users must have a BitTorrent client; to publish a file, one must run a tracker.

In November 2004, BitTorrent accounted for an astounding 35 percent of all the traffic on the Internet, and in 2006 its share rose to over 60 percent of all Internet traffic, according to the British Web analysis firm CacheLogic. Because of this, some ISPs are doing traffic shaping, also known as bandwidth throttling: they reduce the protocol's priority inside their networks, lowering its overall performance. This has resulted in two kinds of responses: some ISPs are investing in upgrading their networks and providing local caches for the protocol, while implementors of the protocol are starting to battle ISPs that refuse to adapt by encrypting and randomizing its traffic. This pressure to adapt, together with the protocol's increasing popularity for uses that deviate from its creator's vision, is placing its evolution more and more in the hands of independent developers.
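Bandwidth throttling of this kind is commonly implemented with a token bucket applied to the traffic an ISP has classified as P2P: tokens accumulate at the permitted rate up to a burst size, and a packet may pass only if enough tokens are available. The sketch below is a generic illustration with hypothetical rate and burst values, not a description of any particular ISP's equipment; it takes explicit timestamps so that it can be tested deterministically.

```python
class TokenBucket:
    """Allow on average `rate` bytes per second, with bursts up to `burst` bytes."""

    def __init__(self, rate: float, burst: float, now: float = 0.0):
        self.rate = rate
        self.burst = burst
        self.tokens = burst       # start with a full bucket
        self.last = now

    def allow(self, packet_size: int, now: float) -> bool:
        """Refill according to elapsed time, then spend tokens if possible."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if packet_size <= self.tokens:
            self.tokens -= packet_size
            return True
        return False  # a shaper would delay this packet, a policer would drop it
```

Encrypting and randomizing BitTorrent traffic is aimed at the classification step that decides which packets go through such a bucket, not at the bucket itself.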

BitTorrent ( http://www.bittorrent.com/ ) is also the name of the original implementation of the protocol, which evolved from a Python application ( source code and old versions ) into a full-featured commercial enterprise. BitTorrent.com is now a destination for downloading entertainment content using the BitTorrent protocol. The site provides fast, on-demand access to a licensed catalog of thousands of movies, TV shows, music and games, which it describes as the most comprehensive available. It also provides content creators with a publishing platform to list their works in high quality alongside the most recognizable titles from major movie studios, TV networks, and record labels.

Content Indexers

Relevant Sites

Legal Torrents ( http://www.legaltorrents.com ), a collection of Creative Commons-licensed, legally downloadable, freely distributable creator-approved files, from electronic/indie music to movies and books, which have been made available via BitTorrent. Everyone that grabs the BitTorrent client and downloads helps contribute more bandwidth, because BitTorrent utilizes your unused upload bandwidth. Again, please note that all of the current torrents are made available under a Creative Commons license with the full permission of the rights holder.

bt.etree.org ( http://bt.etree.org/ ), a site provided by the etree.org community for sharing the live concert recordings of trade friendly artists.

Other:

Protocol

BitTorrent is a protocol for distributing files. It identifies content by URL and is designed to integrate seamlessly with the web. Its advantage over plain HTTP is that when multiple downloads of the same file happen concurrently, the downloaders upload to each other, making it possible for the file source to support very large numbers of downloaders with only a modest increase in its load. ( http://www.bittorrent.com/protocol.html )
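The .torrent metainfo files and tracker responses are encoded with bencoding, a simple format defined in the BitTorrent protocol specification: integers as i<number>e, byte strings as <length>:<bytes>, lists as l...e and dictionaries as d...e with keys sorted as raw byte strings. A compact encoder and decoder might look like this (function names are hypothetical; decoded strings are left as bytes because torrents may contain binary data such as piece hashes):

```python
def bencode(value) -> bytes:
    """Encode ints, bytes/str, lists and dicts using bencoding."""
    if isinstance(value, bool):
        raise TypeError("booleans have no bencoding")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v) for k, v in value.items()),
            key=lambda kv: kv[0],  # keys must appear in raw byte order
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def bdecode(data: bytes):
    """Decode one bencoded value; strings are returned as bytes."""
    def parse(i):
        c = data[i:i + 1]
        if c == b"i":
            end = data.index(b"e", i)
            return int(data[i + 1:end]), end + 1
        if c == b"l":
            i, items = i + 1, []
            while data[i:i + 1] != b"e":
                item, i = parse(i)
                items.append(item)
            return items, i + 1
        if c == b"d":
            i, result = i + 1, {}
            while data[i:i + 1] != b"e":
                key, i = parse(i)
                result[key], i = parse(i)
            return result, i + 1
        if c.isdigit():
            colon = data.index(b":", i)
            start = colon + 1
            length = int(data[i:colon])
            return data[start:start + length], start + length
        raise ValueError(f"invalid bencoding at offset {i}")

    value, end = parse(0)
    if end != len(data):
        raise ValueError("trailing or truncated data")
    return value
```

The sorted-key rule matters in practice: clients identify a torrent by hashing its encoded "info" dictionary, so every encoder must produce exactly the same bytes.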

TODO
New Protocol To Boost BitTorrent Speeds called the “Cache Discovery Protocol” or CDP, which supposedly will act like DHCP for peer to peer networks.

Software Implementations

Wikipedia provides an article with a Comparison of BitTorrent software.

  • BitTorrent.
  • BitTorrent Queue Manager ( http://btqueue.sourceforge.net/ ), a console-based BitTorrent client with a built-in scheduler for handling multiple sessions. It is designed to manage queued sessions easily without a heavyweight GUI. An external module can search trackers for new torrents and submit them automatically. Open source (Python Software Foundation License) project, written in Python.
  • Azureus ( http://azureus.sourceforge.net or http://www.getazureus.com/ ), an open source BitTorrent client in Java, probably the most advanced peer for the network (multiple torrent downloads, queuing/priority systems, start/stop seeding options, embedded tracker, Mainline DHT and a lot more) but a known resource hog, consuming large quantities of memory and CPU power.
  • µTorrent ( http://utorrent.com ), a closed source, freeware BitTorrent client in C++, a very complete peer (includes bandwidth prioritization, scheduling, RSS auto-downloading and Mainline DHT and more) with a very low system footprint.
  • BitTornado ( http://bittornado.com ), an open source BitTorrent client in Python.
  • BitComet ( http://www.bitcomet.com/ ), a closed source, freeware BitTorrent client for the MS Windows OS only, it also supports HTTP/FTP download management.
  • ABC [Yet Another BitTorrent Client] ( http://pingpong-abc.sourceforge.net ), an open source BitTorrent client, based on BitTornado.
  • Transmission ( http://transmission.m0k.org/ ), an open source lightweight BitTorrent client with a simple graphic user interface on top of a cross-platform back-end. Transmission runs on Mac OS X with a Cocoa interface, Linux/NetBSD/FreeBSD/OpenBSD with a GTK+ interface, and BeOS with a native interface. Released under the MIT/X Consortium License.
  • Warez ( http://www.warezclient.com/ ), a closed source, MS Windows only BitTorrent client from Neoteric Ltd. (previously supporting the Ares Network Warez P2P client).

Napster

The Napster network was created at the application level using a client-server protocol over point-to-point TCP. The server in this case was a centralized directory that held an index of all files offered (MP3/WMA). Clients would connect to the server, identify themselves (users had an account on the server) and send it the list of MP3/WMA files they were sharing, enabling other clients to search that central repository for any file on the network and then request it from any available source.
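The centralized-directory design can be sketched as a small in-memory index: clients log in, register the list of files they share, and search; the server only answers who has a file, and the transfer itself then happens directly between the two clients. Class and method names are hypothetical and do not follow the real Napster message format.

```python
from collections import defaultdict


class CentralDirectory:
    """A toy Napster-style index: the server knows who shares what, not the files."""

    def __init__(self):
        self.shares = defaultdict(set)   # filename -> set of (user, address)
        self.online = {}                 # user -> address

    def login(self, user: str, address: str, files):
        """A client identifies itself and uploads its list of shared files."""
        self.online[user] = address
        for name in files:
            self.shares[name].add((user, address))

    def logout(self, user: str):
        """Remove a departed client from every entry in the index."""
        address = self.online.pop(user, None)
        for sources in self.shares.values():
            sources.discard((user, address))

    def search(self, term: str):
        """Return (filename, user, address) for every source whose filename matches."""
        term = term.lower()
        return sorted(
            (name, user, address)
            for name, sources in self.shares.items()
            if term in name.lower()
            for user, address in sources
        )
```

This single index is what made searches fast and complete, and also what made the network easy to shut down.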

Napster protocol specifications

Opennap

Another Napster-based peer-to-peer network, created as open source, which extends the Napster protocol to allow sharing of any media type and adds the ability to link servers together.

Direct Connect

Direct Connect is a peer-to-peer file sharing protocol/network, but it uses a central server. This reliance on a central point can also be seen in the old Napster network, in that each server builds an independent network (not a hybrid as, for instance, with eMule). Note that some clients are now also implementing DHTs, which will result in unifying the networks in use. The Direct Connect protocol was originally developed by Jonathan Hess for use in Neo-Modus' Direct Connect (NMDC) v1, released in September 2001, and partially in NMDC v2, released in July 2003.

Direct Connect refers to its servers as hubs. Clients connect to a central hub, and that hub features a list of the clients or users connected to it. Users can then search for files to download, or chat with other users present on that hub.

Direct Connect also implements Tiger tree hashing (TTH) for file transfers.
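Tiger tree hashing builds a Merkle hash tree over the file, as described in the THEX (Tree Hash EXchange) format: the data is cut into 1024-byte leaf segments, each leaf is hashed with a 0x00 prefix byte, each pair of child hashes is hashed with a 0x01 prefix byte, and an unpaired node is promoted unchanged to the next level. The root identifies the whole file, while the intermediate hashes let a client verify parts of a download. Python's standard library does not guarantee a Tiger implementation, so this sketch substitutes SHA-256; the roots it produces are therefore not real TTH values.

```python
import hashlib

LEAF_SIZE = 1024  # THEX leaf segment size


def _hash(data: bytes) -> bytes:
    # Stand-in for the Tiger hash used by real TTH.
    return hashlib.sha256(data).digest()


def merkle_root(data: bytes) -> bytes:
    """Compute a THEX-style tree hash root (with SHA-256 instead of Tiger)."""
    leaves = [
        _hash(b"\x00" + data[i:i + LEAF_SIZE])
        for i in range(0, len(data), LEAF_SIZE)
    ] or [_hash(b"\x00")]  # an empty file still has one (empty) leaf
    level = leaves
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                next_level.append(_hash(b"\x01" + level[i] + level[i + 1]))
            else:
                next_level.append(level[i])  # odd node is promoted unchanged
        level = next_level
    return level[0]
```

The different prefixes for leaves and internal nodes prevent a leaf from being passed off as an internal node, and vice versa.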

NMDC Protocol

Created by Jon Hess at Neo-Modus; a mirror of the protocol documentation is available at ( http://www.teamfair.info/wiki/index.php )

ADC Protocol

The ADC protocol ( http://dcplusplus.sourceforge.net/ADC.html ) is similar to the Neo-Modus Direct Connect (NMDC) protocol. It is a text protocol for a client-server network, designed with the goal of being simple, yet extensible.
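Because ADC is a line-based text protocol with space-separated parameters, any space, newline or backslash inside a parameter must be escaped (as "\s", "\n" and "\\" respectively). The sketch below shows that escaping and how a line could be assembled; the "BMSG" command and the "AAAB" session ID in the usage example are illustrative values, and the function names are hypothetical.

```python
def adc_escape(text: str) -> str:
    """Escape a parameter: backslash first, then space and newline."""
    return text.replace("\\", "\\\\").replace(" ", "\\s").replace("\n", "\\n")


def adc_unescape(text: str) -> str:
    """Reverse adc_escape, scanning left to right so escaped backslashes are kept."""
    out, i = [], 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            # Unknown escapes are invalid in ADC; this sketch just keeps the char.
            out.append({"s": " ", "n": "\n", "\\": "\\"}.get(text[i + 1], text[i + 1]))
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def adc_message(command: str, *params: str) -> str:
    """Build one ADC line: escaped, space-separated parameters ending in a newline."""
    return " ".join([command, *(adc_escape(p) for p in params)]) + "\n"
```

For example, `adc_message("BMSG", "AAAB", "hello world")` yields a single line in which the space inside the chat text is sent as "\s".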

Jon Hess contributed to the creation of this protocol with the original Direct Connect idea, through the Neo-Modus Direct Connect client and hub. Another major contribution was Jan Vidar Krey's DCTNG draft, which led to subsequent work by Dustin Brody, Walter Doekes, Timmo Stange, Fredrik Ullner, Fredrik Stenberg and others.

HUB Software Implementations

Client Software Implementations

KaZaa

Software (FastTrack) Implementations

  • Kazaa
  • Kazaa Lite
  • Diet Kaza
  • giFT
  • Grokster
  • iMesh

JXTA

JXTA™ technology, created by Sun™ ( http://www.jxta.org ), is a set of open protocols that allow any connected device on the network, ranging from cell phones and wireless PDAs to PCs and servers, to communicate and collaborate in a P2P manner. JXTA peers create a virtual network where any peer may interact with other peers and their resources directly, even when some of the peers and resources are behind firewalls and NATs or are on different network transports. The project goals are interoperability across different peer-to-peer systems and communities; platform independence across multiple and diverse languages, systems, and networks; and ubiquity: every device with a digital heartbeat. The technology is licensed under the Apache Software License (similar to the BSD license).

Most of the implementation is done in Java (with some minor examples in C).

iFolder

iFolder ( http://www.ifolder.com ) is an open source application, still in early development, developed by Novell, Inc. and intended to allow cross-platform file sharing across computer networks using the Mono/.NET framework.

iFolder operates on the concept of shared folders, where a folder is marked as shared and the contents of the folder are then synchronized to other computers over a network, either directly between computers in a peer-to-peer fashion or through a server. This is intended to allow a single user to synchronize their files between different computers (for example between a work computer and a home computer) or share files with other users (for example a group of people who are collaborating on a project).

The core of the iFolder is actually a project called Simias. It is Simias which actually monitors files for changes, synchronizes these changes and controls the access permissions on folders. The actual iFolder clients (including a graphical desktop client and a web client) are developed as separate programs that communicate with the Simias back-end.
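The change-monitoring part of such a synchronizer can be approximated by comparing snapshots of a folder. The sketch below is not Simias code; it assumes a hypothetical approach of hashing every file and comparing the snapshot taken at the last synchronization with the current one, which tells a client which files to send, fetch or delete.

```python
import hashlib
from pathlib import Path


def snapshot(folder: Path) -> dict:
    """Map each file's path (relative to the folder) to a SHA-256 of its contents."""
    return {
        path.relative_to(folder).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(folder.rglob("*"))
        if path.is_file()
    }


def diff_snapshots(old: dict, new: dict):
    """Return (added, modified, deleted) file lists between two snapshots."""
    added = sorted(set(new) - set(old))
    deleted = sorted(set(old) - set(new))
    modified = sorted(p for p in set(old) & set(new) if old[p] != new[p])
    return added, modified, deleted
```

Hashing every file is simple but slow for large folders; a real synchronizer would more likely react to file-system change notifications and hash only the files that changed.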

The iFolder client runs in two operating modes, enterprise sharing (with a server) and workgroup sharing (peer-to-peer, or without a server).

Gnutella

While Gnutella is described as a fully distributed information-sharing technology, later versions of the protocol mix centralized and distributed networking, with "servers" (ultrapeers or superpeers) and "clients" (leaves or nodes).

Gnutella client software is basically a mini search engine (offering an alternative to web search engines) and a file-serving system in one.
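The "search engine" part of the original Gnutella design works by flooding: a query carries a TTL (commonly 7) that is decremented at every hop, and a node drops any query whose unique ID (GUID) it has already seen. The simulation below sketches that behaviour over a hypothetical in-memory network; since it models a single query, the GUID check is reduced to a `seen` set, and query hits are simply collected instead of being routed back along the reverse path.

```python
from collections import deque


def flood_query(network, shared, origin, keyword, ttl=7):
    """Simulate a Gnutella-style query flood.

    `network` maps each node to its neighbors and `shared` maps each node
    to the file names it shares (hypothetical test data). Returns
    {node: [matching files]} for the nodes that would send a query hit.
    """
    hits = {}
    seen = {origin}                  # stands in for the GUID duplicate check
    queue = deque([(origin, ttl)])
    while queue:
        node, remaining = queue.popleft()
        matches = [f for f in shared.get(node, []) if keyword.lower() in f.lower()]
        if matches and node != origin:
            hits[node] = matches
        if remaining == 0:
            continue                 # TTL exhausted: do not forward any further
        for neighbor in network.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, remaining - 1))
    return hits
```

The TTL is what keeps a flood from reaching the whole network, and it is also why a plain Gnutella search only covers a "horizon" of nearby nodes; ultrapeers were introduced partly to reduce this flooding cost.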

TODO
RFC-Gnutella 0.6 ( http://rfc-gnutella.sourceforge.net/developer/testing/index.html )

Newer Gnutella implementations also support Tiger tree hashing (TTH) for file transfers.

Gnutella2 ( Mike's Protocol,G2 )

Gnutella2 is the result of a fork of the Gnutella protocol, caused by the failure of the developer community to reach a consensus on the protocol's evolution.

Gnutella2 is also called Mike's Protocol, since the first changes and implementation came from a single developer, Michael Stokes. In November 2002, Michael Stokes formally and unilaterally announced the creation of the Gnutella2 protocol to the Gnutella Developers Forum. This caused a schism among the developers and led to the modifications not being supported in several Gnutella applications, since the original proposal conflicted with other vendors' concepts (specifically those of LimeWire and BearShare).

The resulting protocol drops all of the old Gnutella protocol except for the connection handshake and adopts an entirely new search algorithm. Gnutella2 is often abbreviated as G2.

TODO
Complete info

Software Implementations

  • Deepnet Explorer ( http://www.deepnetexplorer.com/ ) a browser with RSS news reader, P2P client integration (Gnutella) and phishing alarm, closed source, Windows only, freeware.
  • Shareaza ( http://shareaza.sourceforge.net/ ), Open Source (GPL), coded in C++, MFC and ATL. Multi-network peer-to-peer file-sharing client supporting Gnutella2 (G2), Gnutella, eDonkey2000/eMule, BitTorrent, FTP and HTTP protocols.
  • LimeWire, a peer-to-peer file sharing client for the Java Platform, open source (GPL), which uses the Gnutella network to locate and transfer files. It also encourages the user to pay a fee, which gives the user access to LimeWire Pro.
  • Phex
  • XoloX
  • Gnucleus - Gnutella, Gnutella2 (G2).
  • gtk-gnutella
  • FrostWire ( http://sourceforge.net/projects/frostwire/ ), a peer-to-peer (P2P) information sharing client for the Gnutella network. This project is not affiliated with LimeWire LLC. FrostWire's source code (Java) is licensed under the GNU GPL open source license.
  • Hydranode ( multi-protocol, referenced on the eDonkey2000/eMule section )

Ares Network

Ares (the software implementation) was developed in mid-2002, originally using the Gnutella network. After six months of operation, it switched to its own network based on a leaves-and-supernodes P2P architecture. Having a protocol that can be difficult to identify has at times made Ares the only P2P client that could function on restricted networks, such as some university campuses.

Software Implementations

  • Ares ( http://aresgalaxy.sourceforge.net/ ), a chat/file sharing P2P implementation in Delphi/Kylix. It is based on a network organized into leaves and supernodes, in a topology featuring broadcast-type searches. Ares can deliver a broader search horizon by means of DHT technology, using a MIME filter to the DHT engine. Ares users can also join chat rooms or host a channel. It is for 32-bit MS Windows operating systems (NT/2000/XP) and is open source under the GPL (GNU General Public License). From version 1.9.0, data sharing was enabled between two peers behind firewalls. From version 1.9.4, Ares included support for the BitTorrent protocol. From version 1.9.9, Ares Galaxy has experimental support for SHOUTcast internet radio.
Discontinued Implementations
  • Warez P2P was a proprietary P2P file-sharing service that used the Ares network and offered a service similar to that of Kazaa. Up to version 1.6, Warez P2P was a clone of Ares Galaxy, created by Italian developer Alberto Trevisan; after that it was developed independently by Neoteric Ltd until it was discontinued.

Freenet

TODO
Complete

  • Frost
  • Espra

GNUnet

GNUnet ( http://gnunet.org/ ) was started in late 2001 as a framework for secure peer-to-peer networking that does not use any centralized or otherwise trusted services. A service implemented on top of the networking layer allows anonymous, censorship-resistant file-sharing. GNUnet uses a simple, excess-based economic model to allocate resources. Peers in GNUnet monitor each other's behavior with respect to resource usage; peers that contribute to the network are rewarded with better service.

GNUnet is part of the GNU project; its official GNU website can be found at ( http://www.gnu.org/software/gnunet/ ). There is only one existing client, open source (GPL) and written in C, which shares the same name as the network. GNUnet can be downloaded from this site or from the GNU mirrors.

eDonkey

eDonkey, the original client for the eDonkey network (also known as the eDonkey2000 network or eD2k), was created and managed by MetaMachines (Sam Yagan and Jed McCaleb), based in New York City. It had a stable P2P community, and the protocol is older than BitTorrent: it was released in 2000 and competed with the FastTrack network. In June 2005, the entertainment industry won a victory in the US Supreme Court, which ruled that file-sharing developers could be sued for copyright infringement if they induced such behavior. In September 2005, the Recording Industry Association of America (RIAA) sent cease-and-desist letters to several commercial P2P developers, including MetaMachines, and, without the funds to fight the interpretation of the Supreme Court decision, Sam Yagan conceded defeat as he testified before the United States Senate Judiciary Committee.

On September 11, 2006, users could no longer get the eDonkey2000 client software. On September 12, 2006, MetaMachines settled for US$30 million, and the agreement closes any avenue MetaMachines had for dealing with P2P technology in the future.

The eDonkey network is centralized (it depends on servers) but provides decentralized sharing of content (the content is not stored on the servers). Many software implementations still support the network; the most popular is eMule.

Kademlia

The network started as the Overnet project by Jed McCaleb, the creator of eDonkey2000, to overcome the need for servers. Overnet implemented the Kademlia algorithm. In late 2006, Overnet and all Overnet-owned resources were taken down as a result of legal actions from the RIAA and others. However, since the core of Overnet is decentralized, Overnet clients are still able to function with limited functionality.

The KadC library ( http://kadc.sourceforge.net/ ) provides an open source C library for publishing and retrieving records in Kademlia-based distributed hash tables.

The somewhat old paper Kademlia: A Peer-to-peer Information System Based on the XOR Metric, by Petar Maymounkov and David Mazières, can also be a source of information about the protocol.

The network is now known as Kademlia and is supported by many of the implementations of the old eDonkey/Overnet clients, especially by the eMule project. Kademlia is a research effort to implement a full-featured peer-to-peer system based on the XOR metric routing. Of special interest are the objectives for efficient data storage and query; anonymity; network, content and user security and authentication.
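The XOR metric can be illustrated with a short Python sketch. It is not taken from eMule or KadC; it assumes plain integer node IDs and uses k = 20, the bucket size suggested in the Maymounkov and Mazières paper. The distance between two IDs is their bitwise XOR read as an integer, the k-bucket a contact belongs in is given by the highest bit in which the two IDs differ, and a lookup repeatedly asks the k closest known nodes for nodes that are closer still.

```python
import heapq
from typing import Iterable, List

K = 20  # bucket size / lookup width suggested in the Kademlia paper

def xor_distance(a: int, b: int) -> int:
    """Kademlia distance: bitwise XOR of two IDs, read as an unsigned integer."""
    return a ^ b

def bucket_index(own_id: int, other_id: int) -> int:
    """k-bucket for other_id: position of the highest bit where the IDs differ."""
    d = xor_distance(own_id, other_id)
    if d == 0:
        raise ValueError("a node never stores itself in its routing table")
    return d.bit_length() - 1

def k_closest(target: int, known_ids: Iterable[int], k: int = K) -> List[int]:
    """The k known IDs closest to target under the XOR metric, nearest first."""
    return heapq.nsmallest(k, known_ids, key=lambda n: xor_distance(n, target))
```

For example, with 4-bit IDs, `k_closest(0b0000, [0b1000, 0b0001, 0b0011, 0b0100], k=2)` returns `[1, 3]`, because the XOR distances to the target are 8, 1, 3 and 4.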

eMule content database



The eMule content database ( http://content.emule-project.net/ ) is a service provided by the eMule project team for eDonkey2000 and Kad network users, to make free content available for download and easy to find. The content database has been online since around New Year 2004.

Software Implementations

  • eMule ( http://www.emule-project.net/ ), a file-sharing implementation based on the eDonkey2000 network that offers more features than the standard client; open source C++/MFC, Windows only, licensed under the GPL ( http://sourceforge.net/projects/emule/ ).
  • Xmod ( http://savannah.nongnu.org/projects/x-mod/ ), a project based on the eMule client, open source under the GPL.
  • xMule ( http://www.xmule.ws/ ), the X11 Mule, intended to bring a clone of eMule to virtually all the major Unix platforms, with a particular emphasis on Linux. C++ using wxWidgets for the GUI released as OpenSource under the GPL.
  • MLdonkey ( http://mldonkey.sourceforge.net ) is a multi-platform, multi-network P2P implementation. It supports several large networks such as eDonkey, Overnet, Kademlia, BitTorrent, Gnutella (BearShare, LimeWire, etc.), Gnutella2 (Shareaza) and FastTrack (Kazaa, iMesh, Grokster). Networks can be enabled or disabled. Searches are performed in parallel on all enabled networks. For some networks, each file can be downloaded from multiple clients concurrently.
  • AMule ( http://www.amule.org/wiki/ ), a project based on the eMule client, open source under the GPL; it currently supports Linux, FreeBSD, OpenBSD, Windows, Mac OS X and Xbox on both 32- and 64-bit computers.
  • eMule Bowlfish ( http://pwp.netcabo.pt/DeepSea/ ), another eMule-based project that aims to provide a restricted-network solution.
  • Hydranode ( http://hydranode.com/ ), a modular, plugin-driven peer-to-peer client framework designed with true multi-network downloads in mind (support for the eDonkey2000 and BitTorrent networks). Open source under the GPL, it supports Linux and Windows.
  • Shareaza ( multi-protocol, referenced on the Gnutella section )

SoulSeek

Mute File Sharing

MUTE File Sharing ( http://mute-net.sourceforge.net ) is an anonymous, decentralized search-and-download file sharing system. MUTE uses algorithms inspired by ant behavior to route all messages, including file transfers, through a mesh network of neighbor connections.
Author: Jason Rohrer (jcr13 (at) cornell (dot) edu). MUTE was created using C++ and the Crypto++ library and supports multiple OSs; there is a Windows front end created with MFC. MUTE is open source, released under the GPL.

BitCoop

BitCoop ( http://bitcoop.sourceforge.net/ ), created by Philippe Marchesseault, is a console (text-based) peer-to-peer backup system that enables the storage of files on remote computers, with crypto and compression support. The amount you can store depends on the quantity of storage you are willing to share with the other peers. It is intended for server farms that wish to back up data among themselves. It supports various operating systems, including Windows, Linux and Mac OS X, and is implemented in Java (open source under the GPL).

CSpace

CSpace ( http://cspace.in/ ) provides a platform for secure, decentralized, user-to-user communication over the Internet. The driving idea behind the CSpace platform is to provide a connect(user,service) primitive, similar to the sockets API connect(ip,port). Applications built on top of CSpace can simply invoke connect(user,service) to establish a connection. The CSpace platform takes care of locating the user and creating a secure, NAT/firewall-friendly connection. Application developers are thus relieved of the burden of connection establishment and can focus on the application-level logic. CSpace is developed in Python. It uses OpenSSL for crypto, and Qt for the GUI. CSpace is licensed under the GPL.

I2P

I2P is a generic anonymous and secure peer to peer communication layer. It is a network that sits on top of another network (in this case, it sits on top of the Internet). It is responsible for delivering a message anonymously and securely to another location.


Other P2P Software Implementations

  • XNap ( http://xnap.sourceforge.net/ ), open source (GPL), written in Java. The client features a modern Swing-based user interface and console support. It is able to work with several P2P networks: OpenNap, Gnutella, Overnet and OpenFT (and other networks supported by giFT, like FastTrack). It also supports ICQ and IRC, viewers for MP3 tags, images, PDF and ZIP files, and text-to-speech.
  • Napster network
    • WinMX
    • Napigator
    • FileNavigator
  • WPNP network
    • WinMX
  • MANOLITO network
    • Blubster
    • Piolet
  • other networks
    • MojoNation
    • Carracho
    • Hotwire
    • Chord
    • Dexter
    • Swarmcast
    • Alpine
    • Scribe
    • Groove
    • Squid
    • Akamai
    • Evernet
    • Overnet network
    • Audiogalaxy network
    • SongSpy network
    • FileTopia
    • The Circle
    • OpenFT
  • Acquisition
  • Cabos
  • Swapper

Building a P2P System

Developer/Vendor

Selecting the Programming Language

TODO

RAD (prototype) vs resource use optimization

Selecting the License

Open vs Closed Source

How can P2P generate revenue

TODO

Pushing content
Marketing(ads,...)
Monitoring and Control (know what your customers are searching for,control what they see)
Premium User list

Donations

Donations are a model open to all types of software, open source or closed source; the objective is to let users freely contribute to a project they like. Most probably you will not get a fixed income from this revenue source, but it need not be the only way you get a payoff or even a profit from the project. If you use this method, try to be clear about how the donations will be used (further development, etc.) and about any immediate needs you may have (hosting, services and equipment to develop and test).

Depending on where your project is located and how it is structured and focused, there are ways to maximize the revenue; a common way to run a donation-based project is to set up a non-profit corporation (you can then even issue receipts for tax purposes).

Some people may not like it, but providing a donors/supporters list page does encourage participation, and may even show users that small amounts help. If you decide to list donors, offer an option to be excluded from the list.

For an in-depth look at most donation problems, you may read the article When Do Users Donate? Experiments with Donationware: Ethical Software, Work Equalization, Temporary Licenses, Collective Bargaining, and Microdonations ( http://www.donationcoder.com/Articles/One/index.html ). You may even try to join their project or support similar ones, such as microPledge ( http://micropledge.com/ ), or create a similar offering around your own product.

Examples:

Money
TODO
micro-payment - PayPal, Amazon Gift Certificate.

Hardware

It is also common for programmers or projects to accept hardware donations, on request or as an incentive for the project to add support for exclusive features or specific setups. If you decide to accept them, provide and maintain a list of the hardware that is wanted and how it would help you.

Shareware / for Pay

This is the most problematic setup, due to the legal hot water it can get you into and the formalities and obligations you need to comply with.

Also, restricting participation in the network intentionally reduces its usefulness; this is why most P2P services are free or at least support some level of free access.

Variations

There are several models that are variations of the simple donation/pay model; they set specific goals for the users or for the project in relation to the amounts collected.

Ransom
Put features or the code of the application up for a ransom payment: if people contribute enough to meet that goal, you commit to fulfilling your proposal (e.g. opening the source code).
Pay for features
In this particular case you should be extra careful to inform users about what they are paying for, and about the legality of what you are providing for that payment. Extra features may be better services or even better quality for the existing ones.
Paid support
Paid support includes giving users access to a paid, prioritized technical support service; this is very commonly used in open source projects. You should refrain from overcomplicating the software so you can profit from it, as the users are the network. One solution is to provide a simplified default version for public consumption that still enables a very high degree of tweaking of the software, protocol or network, and then attempt to profit from that.

License new technology

If you come up with a new technology, or a new way to interconnect existing ones, that too can be made into a revenue source.

TODO
Complete & Examples

Venture Capital

TODO
Add a list of known VC firms that support P2P development

Level of Control

The Peer (user)

The peer, or the user running it, is the cornerstone of all P2P systems; without peers you will not be able to create the network. This seems obvious, but it is very common to disregard the users' needs and focus on the final objective, the network itself: a case of seeing the forest but not the trees.

TODO
...personal computer...

A user oriented GUI

As you start to design your P2P application, remember that the GUI is what users will interact with to use your creation. You should attempt to define not only which OSs you will support but also the framework: you can design the application to be used from a web browser, or select a portable framework so you can port it to other systems.

Aside from the technical decisions, the functionality you offer should also be considered. The best approach is to be consistent and offer options similar to existing implementations, other applications, or the way things are normally done in the environment/OS you are using. There are several guidelines you may opt to follow; for instance, Apple provides guidelines for OS X ( http://developer.apple.com/documentation/UserExperience/Conceptual/OSXHIGuidelines/ ).

Overwhelming a user with options is always a bad idea and will only appeal to highly experienced users; even if you design it around what you like, keep in mind that you aren't creating it for your own use.

TODO
Complete

Topology

The topology of a P2P network can be very diverse: it depends on the medium it runs on (hardware), the size of the network (LAN, WAN) or even on the software/protocol, which can impose or enable a specific network organization to emerge.

Below we can see the most commonly used topologies (there can be mixed topologies, or several layered on the same network).


Ring topology - Mesh topology - Star topology - Total Mesh or Full Mesh - Line topology - Tree topology - Bus topology - Hybrid topology

The resulting topology of a P2P system may depend on the protocol, the infrastructure (medium) or be the result of the interaction of peers. When performing studies of P2P networks the resulting topology (structural properties) is of primary importance, there are several papers on the optimization or characteristics of P2P networks.

The paper Effective networks for real-time distributed processing ( http://arxiv.org/abs/physics/0612134 ) by Gonzalo Travieso and Luciano da Fontoura Costa seems to indicate that a uniformly random interconnectivity scheme, specifically the Erdős-Rényi (ER) random network model with a fixed number of edges, is considerably more efficient than its scale-free counterpart, the Barabási-Albert (BA) scale-free model.
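To make the two models concrete, the following Python sketch (an illustration, not code from the paper) builds an ER graph with a fixed number of edges and a BA graph of the same size. The BA generator uses the common preferential attachment construction, in which each new node links to m existing nodes chosen with probability proportional to their degree; the sizes (1000 nodes, m = 2) and the random seed are arbitrary assumptions. Comparing the largest node degree shows the hubs that characterize scale-free networks, which an ER graph with the same number of edges lacks.

```python
import random
from itertools import combinations

def erdos_renyi_gnm(n, m, rng):
    """ER G(n, M) model: pick exactly m distinct edges uniformly at random."""
    return set(rng.sample(list(combinations(range(n), 2)), m))

def barabasi_albert(n, m, rng):
    """BA model: each new node attaches to m existing nodes, chosen with
    probability proportional to their current degree."""
    edges = set()
    targets = list(range(m))   # the first new node links to the m seed nodes
    weighted = []              # node i appears once per edge touching it
    for new in range(m, n):
        edges.update((t, new) for t in targets)
        weighted.extend(targets)
        weighted.extend([new] * m)
        chosen = set()
        while len(chosen) < m:             # m distinct, degree-weighted targets
            chosen.add(rng.choice(weighted))
        targets = list(chosen)
    return edges

def max_degree(edges):
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return max(degree.values())

rng = random.Random(42)
ba = barabasi_albert(1000, 2, rng)
er = erdos_renyi_gnm(1000, len(ba), rng)   # same node count and edge count
print(len(ba), len(er), max_degree(ba), max_degree(er))
```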

Bootstrap

Most P2P systems don't have (or need) a central server, but they do need to know an entry point into the network. This is what is called bootstrapping the P2P application: being able to connect to the network even without a concrete idea of who and what is where.
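A common bootstrapping pattern is to try peers remembered from earlier sessions first and to fall back on a short list of well-known entry points only when none of them answer. The Python sketch below assumes that pattern; the seed host names, port 6346 and the `try_connect` callback are hypothetical placeholders, and `tcp_reachable` only checks that a TCP connection can be opened, not that the remote side speaks the protocol.

```python
import socket
from typing import Callable, Iterable, Optional, Tuple

Address = Tuple[str, int]

def bootstrap(cached_peers: Iterable[Address],
              seed_nodes: Iterable[Address],
              try_connect: Callable[[Address], bool]) -> Optional[Address]:
    """Return the first reachable entry point, or None if all of them fail.

    Peers remembered from earlier sessions are tried first, so the
    well-known seed nodes are only contacted as a last resort."""
    tried = set()
    for addr in list(cached_peers) + list(seed_nodes):
        if addr in tried:          # skip duplicates between the two lists
            continue
        tried.add(addr)
        if try_connect(addr):
            return addr
    return None

def tcp_reachable(addr: Address, timeout: float = 3.0) -> bool:
    """Hypothetical connectivity check: can a TCP connection be opened?"""
    try:
        with socket.create_connection(addr, timeout=timeout):
            return True
    except OSError:
        return False
```

A client might call `bootstrap(saved_peers, [("seed1.example.net", 6346)], tcp_reachable)`, where `saved_peers` is whatever peer list it stored when it last shut down.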

TODO
Complete... Avahi ( http://avahi.org/ )

Hybrid vs real-Peer systems

One of the main objectives of a P2P system is to make sure no single part of it is critical to the collective objective. By introducing any type of centralization into a peer-to-peer network, one creates points of failure; as some peers will be more important than others, this can even lead to security or stability problems, as with the old client-server model, where a single user could crash the server and deny its use to others.

TODO
Complete

Availability

Integrity

TODO
Complete

Due to the open nature of peer-to-peer networks, most are under constant attack by people with a variety of motives. Most attacks can be defeated or controlled by careful design of the peer-to-peer network and through the use of encryption. P2P network defense is in fact closely related to the "Byzantine Generals Problem". However, almost any network will fail when the majority of the peers are trying to damage it, and many protocols may be rendered impotent by far fewer.
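The classic result behind the Byzantine Generals Problem (Lamport, Shostak and Pease, 1982) is that agreement with unsigned messages is only possible when fewer than a third of the participants are faulty, i.e. n >= 3f + 1. The Python helpers below are a minimal illustration of that bound and of a related rule of thumb: a value reported by at least f + 1 peers is backed by at least one honest peer when at most f peers lie. The function names are hypothetical, and this is not a full agreement protocol.

```python
from collections import Counter
from typing import Hashable, Iterable, Optional

def max_tolerable_faults(n: int) -> int:
    """Largest f with n >= 3f + 1 (Byzantine agreement, unsigned messages)."""
    return max((n - 1) // 3, 0)

def vouched_value(replies: Iterable[Hashable], f: int) -> Optional[Hashable]:
    """Accept the most common reply only if at least f + 1 peers sent it,
    so that at least one honest peer stands behind it when at most f lie."""
    counts = Counter(replies)
    if not counts:
        return None
    value, votes = counts.most_common(1)[0]
    return value if votes >= f + 1 else None
```

With 10 peers, for instance, at most 3 may misbehave, so a reply would have to come from at least 4 of them to be trusted under this rule.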

Clustering

Computer science defines a computer cluster in general terms as a group of tightly coupled computers that work together closely so that in many respects they can be viewed as though they are a single computer.

The components of a cluster are commonly connected to each other through fast networks and are usually deployed to improve performance and/or availability over that provided by a single computer, while typically being much more cost-effective than single computers of comparable speed or availability.

As we have seen before, when this concept is applied to distributed networks or WANs (instead of LANs), it yields distributed computation, grids and other systems. All of those applications are part of the P2P concept.

We loosely define a cluster as a physical, social or even economic/statistical event, defined by the aggregation of entities that share a common property; that property may be a shared purpose, a characteristic, or any other commonality.

As we look at the topologies generated by P2P networks, we can observe that most protocols generate some kind of clustering around network structures and resources, and clusters can even emerge as a result of network conditions. Clustering is then an unsupervised learning problem: an automatically emerging event that results in the creation of ad hoc collections of unlabeled objects (data/items or events). For more information on clusters you may check A Tutorial on Clustering Algorithms ( http://home.dei.polimi.it//matteucc/Clustering/tutorial_html/ ).

A cluster is therefore a collection of objects which are “similar” between them and are “dissimilar” to the objects belonging to other clusters using the same set of characteristics.

This concept is very important not only for the form of P2P networks, but also for the social structures/relations that can be built upon the use of P2P applications.

TODO

Efficient Algorithms for K-Means Clustering, by Tapas Kanungo, David M. Mount, Nathan S. Netanyahu, Christine D. Piatko, Ruth Silverman and Angela Y. Wu
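The referenced paper presents a filtering (kd-tree based) variant of k-means; the Python sketch below is instead the plain Lloyd iteration, shown only to illustrate what k-means clustering does. It alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points, until the assignment stops changing. Clustering peers by measured features such as latency or bandwidth is an illustrative assumption, not something the text prescribes.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def kmeans(points: Sequence[Point], k: int, iterations: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm; returns (centroids, cluster label for each point)."""
    rng = random.Random(seed)
    centroids = rng.sample(list(points), k)   # start from k distinct input points
    labels: List[int] = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for every point.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break                              # converged: nothing moved
        labels = new_labels
        # Update step: each centroid moves to the mean of its members
        # (a cluster that lost all members keeps its previous centroid).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return centroids, labels
```

For example, `kmeans([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], 2)` puts the first three points in one cluster and the last three in the other.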

Flashcrowds

A flash crowd is a behavioral model in which participants tend to aggregate/crowd around an event; in P2P terms this can be a scarce resource. For instance, as we will see later, BitTorrent favors big files over small ones and new files over old ones. This is a result of connecting to the network on a per-item basis: speed on BitTorrent depends above all on generating flash crowds around files (more peers mean more speed, which snowballs into more seeders).

This can also result in a DoS (Denial of Service) or flood; for instance, if a P2P system is poorly designed, a significant number of peers attempting to connect to the network may disrupt the bootstrap method used.
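One common way to keep a large group of peers from hammering the bootstrap entry points at the same moment (a mitigation assumed here, not one described in the text) is randomized exponential backoff: after each failed attempt a peer waits a random time whose upper bound doubles, up to a cap. A minimal Python sketch, with hypothetical default values for the base delay, cap and number of tries:

```python
import random
import time
from typing import Callable, Optional

def connect_with_backoff(attempt: Callable[[], bool], max_tries: int = 8,
                         base: float = 1.0, cap: float = 300.0,
                         rng: Optional[random.Random] = None,
                         sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call attempt() until it returns True or max_tries is reached."""
    rng = rng or random.Random()
    for n in range(max_tries):
        if attempt():
            return True
        if n < max_tries - 1:
            # "Full jitter": wait a random time in [0, min(cap, base * 2**n)],
            # so peers that failed at the same moment spread their retries out.
            sleep(rng.uniform(0, min(cap, base * 2 ** n)))
    return False
```

The `sleep` parameter is only there so the function can be tested without real waiting.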

You should educate users: more connections don't equate to more speed. At most they will result in more responses to queries, depending on how the protocol is structured, and enabling them will cost bandwidth. On the other hand, more peers in the network means more resources being shared, which results in overlapping shares and therefore better speeds: the bigger a P2P network gets, the better it can serve its users.

Optimizations

There are simple optimizations that should be done in any P2P protocol, which can benefit both peers and the network in general, based on system metrics or profile such as IP (ISP or range), content, physical location, share history, ratios, searches and many other variables.
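As an illustration of the "IP (ISP or range)" metric, the following Python sketch ranks candidate peers by the number of leading address bits they share with the local peer. Treating a long common prefix as "probably the same ISP or range" is a rough heuristic assumed for this example; real ISP boundaries would require routing data.

```python
import ipaddress
from typing import List

def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits two IPv4 addresses have in common."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()

def rank_by_locality(own_ip: str, candidates: List[str]) -> List[str]:
    """Candidates sorted so that those sharing the longest prefix come first."""
    return sorted(candidates, key=lambda ip: -common_prefix_len(own_ip, ip))
```

For example, `rank_by_locality("192.0.2.10", ["198.51.100.7", "192.0.2.200", "192.0.3.1"])` returns `["192.0.2.200", "192.0.3.1", "198.51.100.7"]` (shared prefixes of 24, 23 and 5 bits).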

Most of the logic/characteristics of P2P networks and topologies (in a WAN environment) will result in aggregation of peers, so these clusters will share the properties of distributed behavioral models such as flocks, herds and schools. This results in an environment that is easier to study, in which it is easier to establish correlations about peer relations and to extrapolate ways to improve efficiency.

  • Alignment
To get information/characteristics from the local system/environment in order to optimize the peer's "location" on the network, by selecting and tuning the separation and cohesion functions to improve the local neighborhood.
  • Separation
To implement a way to avoid crowding with other peers based on unwanted alignment.
  • Cohesion
To select peers based on their own alignment.

P2P Networks Traffic

Due to the nature of the protocols and topologies used, P2P traffic on the Internet can sometimes only be measured by estimation, based on perceived use (users online, number of downloads of a given implementation), or by doing spot checks on the networks themselves. The information can also be obtained directly if the protocol or application was implemented with this objective in mind; several Gnutella implementations, for example, have that option, and BearShare even reported some of the users' system parameters, such as the type of firewall.

An example of a service that provides such traffic information about P2P networks is Cachelogic ( http://www.cachelogic.com/research/2005_slide16.php# ).

TODO
Complete...

Communication

One of your most important considerations is how you design the way peers will communicate. Even if we discard the use of central servers, such as SuperNodes, and rely only on multiple distributed clients as peers/nodes, there will be several questions to consider:

  • Will communication need to go across firewalls and proxy servers?
  • Is the network transmission speed important? Can it be configured by the user?
  • Will communications be synchronous or asynchronous?
  • Will it need/use only a single port? What port should we use?
  • What resources will you need to support? Are there size limits? Should compression be used?
  • Will the data have to be encrypted?
  • etc...

You must carefully consider your project's specific goals and requirements; this will help you evaluate the use of toolkits and frameworks. Try not to reinvent the wheel if you can't come up with a better solution or don't have the time, capacity or disposition to. You can also use open standards (e.g. the HTTP protocol), but you are also free to explore other approaches.
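When designing your own wire format over a single TCP port, one simple approach (assumed here for illustration, not a format used by any particular network) is length-prefixed framing with a flags byte for optional compression and an explicit size limit. The Python sketch below encodes each message as a 1-byte flags field, a 4-byte big-endian length and the body; the 1 MiB limit is an arbitrary example value.

```python
import struct
import zlib
from typing import Optional, Tuple

HEADER = struct.Struct("!BI")   # flags (1 byte) + body length (4 bytes), network order
FLAG_COMPRESSED = 0x01
MAX_PAYLOAD = 1 << 20           # hypothetical 1 MiB limit per message

def encode_frame(payload: bytes, compress: bool = False) -> bytes:
    flags = 0
    if compress:
        payload = zlib.compress(payload)
        flags |= FLAG_COMPRESSED
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("message too large")
    return HEADER.pack(flags, len(payload)) + payload

def decode_frame(buf: bytes) -> Optional[Tuple[bytes, bytes]]:
    """Return (payload, remaining bytes), or None if buf holds no full frame yet."""
    if len(buf) < HEADER.size:
        return None
    flags, length = HEADER.unpack_from(buf)
    if length > MAX_PAYLOAD:    # refuse before buffering an absurd announcement
        raise ValueError("peer announced an oversized message")
    end = HEADER.size + length
    if len(buf) < end:
        return None
    body = buf[HEADER.size:end]
    if flags & FLAG_COMPRESSED:
        # Note: decompression can still expand past MAX_PAYLOAD; a production
        # decoder should bound the decompressed size as well.
        body = zlib.decompress(body)
    return body, buf[end:]
```

A receiver appends incoming bytes to a buffer and calls `decode_frame` in a loop until it returns `None`, which keeps message boundaries intact however TCP splits the stream.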

Security Considerations

By using a P2P system, users broadcast their existence to others, in contrast to a centralized service, where they may interact with others while their anonymity can be protected.

This can result in identity attacks (e.g. tracking down the users of the network and harassing or legally attacking them), DoS, spamming, eavesdropping and other threats or abuses. All these actions are generally targeted at a single user, and some may even be automated. There are several actions the creator can take to make them more difficult, but ultimately they can't be stopped and should be expected and dealt with. One of the first steps is to provide information to users so they can locally implement hardware or software measures, and even social behaviors, to counteract this abuse.

DoS (denial of service), Spamming

Since each user is a "server", they are also prone to denial of service attacks (attacks that may, if optimized, make the network run very slowly or break completely); the result may depend on the attacker's resources and on how decentralized the P2P protocol is. On the other hand, being the target of spam (e.g. unsolicited information sent across the network, not necessarily as a denial of service attack) depends only on how visible and contactable you are, for instance whether other users can send messages to you. Most P2P applications support some kind of chat system, and this type of abuse is common on such systems; they can address the problem but will not completely solve it. This can lead to social engineering attacks, where users are led to perform actions that compromise them or their system; against this last threat, only giving users information that makes them aware of the risk will work.
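A common defense against chat spam and message floods (assumed here, not prescribed by the text) is per-peer rate limiting. The Python sketch below keeps a token bucket for each remote peer; the default rate of one message every two seconds and the burst of five messages are hypothetical values.

```python
import time
from typing import Callable, Dict

class TokenBucket:
    """Allows bursts of up to `capacity` messages, refilled at `rate` per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to the time elapsed, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

class ChatFilter:
    """One bucket per remote peer, so one noisy peer cannot drown out the rest."""

    def __init__(self, rate: float = 0.5, capacity: float = 5,
                 clock: Callable[[], float] = time.monotonic):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.buckets: Dict[str, TokenBucket] = {}

    def accept(self, peer_id: str) -> bool:
        if peer_id not in self.buckets:
            self.buckets[peer_id] = TokenBucket(self.rate, self.capacity, self.clock)
        return self.buckets[peer_id].allow()
```

Messages for which `accept` returns `False` would simply be dropped (or queued) instead of being shown to the user.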

Eavesdropping

Hardware traffic control

TODO

ISPs and Net Neutrality, filtering (network operators may attempt to prevent peer-to-peer network data from being carried)

Software traffic control

Since most network applications, and P2P tools in particular, are prone to be a source of security problems (they bypass some of the default security measures from the inside), when using or creating such a tool one must take care to configure the system, or give users the means to configure it, to be as safe as possible.

tools for security

There are several tools and options that can be used for this purpose, be it configuring a firewall, adding an IP blocker or making sure some restrictions are turned on by default as you deploy your application.

  • PeerGuardian 2 ( http://phoenixlabs.org/pg2/ ), an open source tool produced by Phoenix Labs, consisting of an IP blocker for Windows that supports multiple lists, list editing, automatic updates, and blocking all of IPv4 (TCP, UDP, ICMP, etc.)
  • PeerGuardian Lite ( http://phoenixlabs.org/pglite/ ) a version of the PeerGuardian 2 that is aimed at having a low system footprint.
blocklists

A blocklist is a text file containing the IP addresses of organizations opposed to and actively working against file-sharing (such as the RIAA), and of any enterprise that mines the networks or attempts to use resources without participating in the actual sharing of files. It is basically a filter like the spam filters that exist for email systems.
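The sketch below loads a blocklist and checks addresses against it. It assumes the plain-text format used by PeerGuardian-style lists, with one `description:first-last` IPv4 range per line and `#` comments, and it assumes the ranges do not overlap (overlapping ranges would need to be merged first). The class name is hypothetical.

```python
import bisect
import ipaddress
from typing import Iterable, List, Optional, Tuple

class Blocklist:
    def __init__(self, lines: Iterable[str]):
        ranges: List[Tuple[int, int, str]] = []
        for line in lines:
            line = line.strip()
            if not line or line.startswith("#"):
                continue                             # skip blank lines and comments
            desc, _, span = line.rpartition(":")     # descriptions may contain ':'
            first, _, last = span.partition("-")
            ranges.append((int(ipaddress.IPv4Address(first.strip())),
                           int(ipaddress.IPv4Address(last.strip())), desc))
        ranges.sort()
        self.ranges = ranges
        self.starts = [start for start, _, _ in ranges]

    def lookup(self, ip: str) -> Optional[str]:
        """Description of the blocked range containing ip, or None."""
        x = int(ipaddress.IPv4Address(ip))
        i = bisect.bisect_right(self.starts, x) - 1  # last range starting <= x
        if i >= 0 and x <= self.ranges[i][1]:
            return self.ranges[i][2]
        return None
```

Binary search keeps each lookup fast even for lists with hundreds of thousands of ranges, which matters because every incoming connection has to be checked.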

TODO

Add examples and resources

Firewalls

As computers attempt to be more secure for the user, today's OSs provide by default some form of restriction on external communication that lets the user define different levels of trust; this is called a firewall. A firewall may have a hardware or software implementation and is configured to permit, deny or proxy data through a computer network. Most recent OSs come with a software implementation running, since Internet connections are becoming common, and the lack, or even the default configuration, of the firewall can cause some difficulties for the use of P2P applications.

TODO
Complete

on Windows

In its latest OS releases, Microsoft has taken the chance to provide some security by default to its users by including and enabling a simple firewall solution.

Routers

NAT

NAT (network address translation)

TODO
ICS (Internet Connection Sharing) in Windows 2000+

NAT Traversal

Users behind NAT should be able to connect with each other; there are some solutions available that try to enable this.

STUN

STUN (Simple Traversal of UDP over NATs) is a network protocol that helps many types of software and hardware receive UDP data properly through home broadband routers that use NAT.

Quoted from its standard document, RFC 3489:

"Simple Traversal of User Datagram Protocol (UDP) Through Network Address Translators (NATs) (STUN) is a lightweight protocol that allows applications to discover the presence and types of NATs and firewalls between them and the public Internet.
It also provides the ability for applications to determine the public IP address allocated to them by the NAT.
STUN works with many existing NATs, and does not require any special behavior from them. As a result, it allows a wide variety of applications to work through existing NAT infrastructure."

As the STUN RFC states, this protocol is not a cure-all for the problems associated with NAT, but it is particularly helpful for getting voice over IP working through home routers. VoIP systems built on signaling protocols like SIP use UDP packets (RTP) for the transfer of sound data over the Internet, but these UDP packets often have trouble getting through NATs in home routers.

STUN is a client-server protocol. A VoIP phone or software package may include a STUN client, which will send a request to a STUN server. The server then reports back to the STUN client what the public IP address of the NAT router is, and what port was opened by the NAT to allow incoming traffic back in to the network.
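The exchange described above can be sketched in Python using the RFC 3489 message layout: a 20-byte header holding the message type (0x0001 for a Binding Request, 0x0101 for a Binding Response), the body length and a 128-bit transaction ID, followed in the response by attributes such as MAPPED-ADDRESS (type 0x0001). This is a minimal sketch: it sends a single request without retransmissions and ignores every attribute except MAPPED-ADDRESS. The later RFC 5389 revision adds a magic cookie and the XOR-MAPPED-ADDRESS attribute, which this code does not handle.

```python
import os
import socket
import struct
from typing import Optional, Tuple

BINDING_REQUEST = 0x0001
BINDING_RESPONSE = 0x0101
MAPPED_ADDRESS = 0x0001

def binding_request(transaction_id: bytes) -> bytes:
    """Header only: type, body length 0 (no attributes), 16-byte transaction ID."""
    if len(transaction_id) != 16:
        raise ValueError("RFC 3489 transaction IDs are 128 bits")
    return struct.pack("!HH", BINDING_REQUEST, 0) + transaction_id

def parse_mapped_address(response: bytes,
                         transaction_id: bytes) -> Optional[Tuple[str, int]]:
    """Public (ip, port) from a Binding Response, or None if absent or invalid."""
    if len(response) < 20:
        return None
    msg_type, length = struct.unpack_from("!HH", response)
    if msg_type != BINDING_RESPONSE or response[4:20] != transaction_id:
        return None
    pos, end = 20, min(20 + length, len(response))
    while pos + 4 <= end:                     # walk the type-length-value attributes
        attr_type, attr_len = struct.unpack_from("!HH", response, pos)
        value = response[pos + 4:pos + 4 + attr_len]
        # MAPPED-ADDRESS value: 1 unused byte, family (0x01 = IPv4), port, address
        if attr_type == MAPPED_ADDRESS and attr_len >= 8 and value[1] == 0x01:
            port = struct.unpack_from("!H", value, 2)[0]
            return socket.inet_ntoa(value[4:8]), port
        pos += 4 + attr_len
    return None

def query_public_address(server: Tuple[str, int], timeout: float = 2.0):
    """Send one Binding Request over UDP and parse the reply."""
    txid = os.urandom(16)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(binding_request(txid), server)
        data, _ = s.recvfrom(2048)
    return parse_mapped_address(data, txid)
```

The address and port returned are what other peers must use to reach this client, as long as the NAT keeps the mapping open.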

The response also allows the STUN client to determine what type of NAT is in use, as different types of NATs handle incoming UDP packets differently. It works with three of the four main types: full cone NAT, restricted cone NAT, and port restricted cone NAT. It will not work with symmetric NAT (also known as bi-directional NAT), which is often found in the networks of large companies.
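RFC 3489 determines the NAT type with a short series of tests: Test I is a plain Binding Request, Test II asks the server to reply from a different IP address and port, and Test III asks it to reply from a different port only. The decision procedure can be sketched as follows; the `StunTests` record and its field names are hypothetical, and running the tests themselves is left out.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Addr = Tuple[str, int]

@dataclass
class StunTests:
    local: Addr                            # our own socket address
    test1: Optional[Addr]                  # mapped address from Test I (None: no reply)
    test2_reply: bool = False              # did Test II (change IP and port) get a reply?
    test1_changed: Optional[Addr] = None   # mapped address from Test I sent to the alternate server address
    test3_reply: bool = False              # did Test III (change port only) get a reply?

def classify(t: StunTests) -> str:
    """Decision procedure from RFC 3489, section 10.1."""
    if t.test1 is None:
        return "UDP blocked"
    if t.test1 == t.local:                 # mapped address is our own: no NAT
        return "open Internet" if t.test2_reply else "symmetric UDP firewall"
    if t.test2_reply:
        return "full cone NAT"
    if t.test1_changed is None:
        raise ValueError("Test I against the alternate address is needed here")
    if t.test1_changed != t.test1:         # a new mapping for each destination
        return "symmetric NAT"
    return "restricted cone NAT" if t.test3_reply else "port restricted cone NAT"
```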

Port Forwarding

TODO
Complete

UPnP

The Universal Plug and Play (UPnP) architecture consists of a set of open standards and technologies promulgated by the UPnP Forum ( http://www.upnp.org/ ), with the goal of extending the Plug and Play concept to support networks and peer-to-peer discovery, configuration and control, thereby enabling appliances, PCs and services to connect transparently.

As UPnP is offered in most modern routers and network devices, and has also been supported by Microsoft since Windows XP, a P2P application should support this architecture, so that users don't have to make the necessary changes themselves when UPnP is enabled (by default it should be disabled, due to security risks).
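Before a P2P application can ask a router to open a port through UPnP, it has to find the router. UPnP devices are discovered with SSDP: the client multicasts an HTTP-like `M-SEARCH` request to 239.255.255.250 port 1900, and each matching device answers with a `LOCATION` header pointing at its description document. The Python sketch below performs only this discovery step; the actual port mapping (the `AddPortMapping` action of the gateway's WANIPConnection service) is not shown, and the 3-second timeout is an arbitrary choice.

```python
import socket
from typing import Dict, List

SSDP_ADDR = ("239.255.255.250", 1900)
IGD = "urn:schemas-upnp-org:device:InternetGatewayDevice:1"

def msearch(search_target: str = IGD, mx: int = 2) -> bytes:
    """Build an SSDP discovery request for the given search target."""
    return ("M-SEARCH * HTTP/1.1\r\n"
            f"HOST: {SSDP_ADDR[0]}:{SSDP_ADDR[1]}\r\n"
            'MAN: "ssdp:discover"\r\n'
            f"MX: {mx}\r\n"
            f"ST: {search_target}\r\n"
            "\r\n").encode("ascii")

def parse_headers(response: bytes) -> Dict[str, str]:
    """Header names are case-insensitive, so normalize them to upper case."""
    headers = {}
    for line in response.decode("ascii", "replace").split("\r\n")[1:]:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().upper()] = value.strip()
    return headers

def discover_gateways(timeout: float = 3.0) -> List[str]:
    """Multicast one M-SEARCH and collect LOCATION URLs until the timeout."""
    locations: List[str] = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(msearch(), SSDP_ADDR)
        try:
            while True:
                data, _ = s.recvfrom(4096)
                loc = parse_headers(data).get("LOCATION")
                if loc and loc not in locations:
                    locations.append(loc)
        except socket.timeout:
            pass
    return locations
```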

Unique ID

Universally Unique Identifiers (UUID) / Globally Unique Identifier (GUID)

For a P2P protocol/application to be able to manage user identification and authentication, build a routing protocol, identify resources, etc., there is a need for (a set of) unique identifiers. While a true peer-to-peer protocol doesn't intend to establish a centralized service, it can frequently make use of an already established one. For instance, Usenet makes use of domain names to create globally unique identifiers for articles. If the protocol refrains from making use of even such a service, then uniqueness can only be based on random numbers and mathematical probabilities.
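How safe "random numbers and mathematical probabilities" are can be estimated with the birthday bound: for n identifiers drawn uniformly from 2^b possible values, the probability of at least one collision is approximately 1 - exp(-n(n - 1) / 2^(b+1)). The Python helper below computes this approximation; the default of 122 bits matches the random part of a version 4 UUID.

```python
import math

def collision_probability(n: int, bits: int = 122) -> float:
    """Birthday-bound estimate of P(at least one collision) among n random IDs."""
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny
    return -math.expm1(-n * (n - 1) / 2 ** (bits + 1))
```

With 32-bit random IDs, `collision_probability(100_000, bits=32)` is already about 0.69, while a billion version 4 UUIDs give a probability below 10^-15. This is why ID spaces for open P2P networks are usually 128 bits or more.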

The problem of generating unique IDs can be broken down into uniqueness over space and uniqueness over time, which, when combined, aim to produce a globally unique sequence. This leads to a problem observed on some P2P networks that use open protocols with implementations from multiple vendors: because different algorithms are used to generate the GUIDs, uniqueness over space is broken, leading to sporadic collisions.

UUIDs are officially defined as part of the ISO/IEC 11578 standard; other specifications also exist, such as RFC 4122 and ITU-T Rec. X.667.

Examples of uses

  1. Usenet article IDs.
  2. In Microsoft's Component Object Model (COM), an object-oriented programming model that incorporates MFC (Microsoft Foundation Classes), OLE (Object Linking and Embedding), ActiveX, ActiveMovie and other Microsoft technologies, a GUID is a 16-byte (128-bit) number used to uniquely identify objects, data formats and more.
  3. The identifiers in the Windows registry.
  4. The identifiers used in RPC (remote procedure calls).
  5. Within ActiveMovie, there are GUIDs for video formats, corresponding to the FOURCCs (Four Character Codes) used in Video for Windows. These are specified in the file uuids.h in the ActiveMovie Software Development Kit (SDK). ActiveMovie needs to pass around GUIDs that correspond to the FOURCC for the video in an AVI file.

Security

Version 1 (time and node based) UUIDs have a known weakness: they broadcast the generating node's ID (usually the MAC address of its network card) together with the creation time.
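Python's standard uuid module can illustrate this weakness: the node ID and the creation time can be read straight back out of a version 1 UUID. The MAC address below is a hypothetical value passed explicitly, so the example does not reveal a real one. `v1_fields` is a hypothetical helper name.

```python
import datetime
import uuid

# Hypothetical MAC address, passed explicitly so no real hardware ID is shown.
FAKE_MAC = 0x0012_3456_789A
# Version 1 timestamps count 100 ns intervals since 15 October 1582 (UTC).
GREGORIAN_EPOCH = datetime.datetime(1582, 10, 15, tzinfo=datetime.timezone.utc)


def v1_fields(u: uuid.UUID):
    """Return the node ID and creation time embedded in a version 1 UUID."""
    created = GREGORIAN_EPOCH + datetime.timedelta(microseconds=u.time // 10)
    return u.node, created


u = uuid.uuid1(node=FAKE_MAC)
node, created = v1_fields(u)
print(u.version)   # 1
print(hex(node))   # 0x123456789a
print(created)     # roughly the current time, in UTC
```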

Software implementation

Programmers who need to implement UUIDs could look at these examples:

  • OSSP uuid ( http://www.ossp.org/pkg/lib/uuid/ ) is an API for ISO C, ISO C++, Perl and PHP and a corresponding CLI for the generation of DCE 1.1, ISO/IEC 11578:1996, and RFC4122 compliant Universally Unique Identifiers (UUIDs). It supports DCE 1.1 variant UUIDs of version 1 (time and node based), version 3 (name based, MD5), version 4 (random number based), and version 5 (name based, SHA-1). UUIDs are 128-bit numbers that are intended to have a high likelihood of uniqueness over space and time and are computationally difficult to guess. They are globally unique identifiers that can be locally generated without contacting a global registration authority. It is open source, released under the MIT/X Consortium License.
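Python's standard library also includes a uuid module that generates the same RFC 4122 versions. The minimal sketch below contrasts a random version 4 identifier with a name-based version 5 identifier. Version 5 lets independent peers derive the same ID for the same resource name without any coordination. The URL is a hypothetical resource name.

```python
import uuid

# Version 4: 122 random bits, nothing about the generating node is revealed.
random_id = uuid.uuid4()

# Version 5: SHA-1 over a namespace UUID plus a name, so any peer hashing the
# same (hypothetical) resource name in the same namespace gets the same ID.
NAME = "http://example.org/files/song.ogg"
resource_id = uuid.uuid5(uuid.NAMESPACE_URL, NAME)

print(random_id.version, resource_id.version)                # 4 5
print(resource_id == uuid.uuid5(uuid.NAMESPACE_URL, NAME))   # True
```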

Hashes, Cryptography and Compression

Most P2P systems have to implement several algorithms for hashing, encryption/decryption and (de)compression. This section gives an overview of these operations as they relate to P2P. The issues themselves are addressed in more detail in other sections.

Detailed information on the subject can be found on the Cryptography Wikibook ( http://en.wikibooks.org/wiki/Cryptography ).

One way of creating structured P2P networks is by maintaining a Distributed Hash Table (DHT) that serves as a distributed index of the resources on the network.

Cryptography is also needed to protect the integrity of the distributed resources themselves. To let resources survive an attack, most P2P implementations use some kind of hash function (MD5, SHA-1). Some also implement a hash tree designed to detect corruption of the resource content, either as a whole or in the individual parts a user receives (for instance, using a Tiger Tree Hash).

Hash function

A hash function is a reproducible method of turning some kind of data into a (relatively) small number that may serve as a digital "fingerprint" of the data. The algorithm substitutes or transposes the data to create such fingerprints. The fingerprints are called hash sums, hash values, hash codes or simply hashes.
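Here is a minimal Python sketch of a hash used as a fingerprint. The same input always yields the same digest, while even a one-character change yields a different one. SHA-1 is used only because this chapter mentions it. The sample strings and the `fingerprint` helper are illustrative.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return a SHA-1 'fingerprint' of the data as a hex string."""
    return hashlib.sha1(data).hexdigest()


original = b"The World of Peer-to-Peer"
received = b"The World of Peer-to-Peer"
tampered = b"The World of Peer-to-peer"   # one character differs

print(fingerprint(original) == fingerprint(received))  # True: same input, same hash
print(fingerprint(original) == fingerprint(tampered))  # False: the change is detected
```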

NOTE:
The term "hashes" is also sometimes used for the hash functions themselves.

A typical hash function at work

Hash sums are commonly used as indices into hash tables or hash files. Cryptographic hash functions are used for various purposes in information security applications.

Choosing a good hash function

A good hash function is essential for good hash table performance. A poor choice of hash function is likely to lead to clustering, in which the probability of keys mapping to the same hash bucket (i.e. a collision) is significantly greater than would be expected from a random function. A nonzero collision probability is inevitable in any hash implementation. However, the number of operations required to resolve a collision usually scales linearly with the number of keys mapping to the same bucket, so excess collisions degrade performance significantly. In addition, some hash functions are computationally expensive, so the time (and, in some cases, memory) needed to compute the hash may be burdensome.

Simplicity and speed are readily measured objectively (by number of lines of code and CPU benchmarks, for example), but strength is a more slippery concept. A cryptographic hash function such as SHA-1 would certainly satisfy the relatively lax strength requirements of hash tables, but its slowness and complexity make it unappealing. However, cryptographic hash functions can protect against collision attacks when the hash table modulus and its factors can be kept secret from the attacker, or alternatively when a secret salt is applied. For these specialized cases, a universal hash function can also be used instead of a single static hash.
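The secret-salt idea can be sketched with a keyed hash. The bucket index depends on a random per-process key, so an attacker cannot precompute keys that collide. This sketch substitutes keyed BLAKE2b (available in Python's hashlib) for the SHA-1 mentioned above. `bucket_index` and `SECRET_KEY` are hypothetical names. CPython applies the same idea to its built-in str and bytes hashing, which is randomized per process.

```python
import hashlib
import os

# Secret key ("salt") chosen once per process; without it an attacker
# cannot predict which bucket a given key will land in.
SECRET_KEY = os.urandom(16)


def bucket_index(key: bytes, n_buckets: int, secret: bytes = SECRET_KEY) -> int:
    """Map a key to a bucket using keyed BLAKE2b (a cryptographic hash)."""
    digest = hashlib.blake2b(key, key=secret, digest_size=8).digest()
    return int.from_bytes(digest, "little") % n_buckets
```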

In the absence of a standard measure for hash function strength, the current state of the art is to employ a battery of statistical tests to measure whether the hash function can be readily distinguished from a random function. Arguably the most important test is to determine whether the hash function displays the avalanche effect, which essentially states that any single-bit change in the input key should affect on average half the bits in the output. Bret Mulvey advocates testing the strict avalanche condition in particular, which states that, for any single-bit change, each of the output bits should change with probability one-half, independent of the other bits in the key. Purely additive hash functions such as CRC fail this stronger condition miserably.
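A simple avalanche test can be sketched in Python. It flips each input bit in turn and measures the fraction of output bits that change; a good hash should average close to one half. The sketch compares SHA-1 against a deliberately weak byte-sum hash, which is a hypothetical example and not CRC. It measures only the average avalanche effect, not the stricter per-bit condition.

```python
import hashlib


def additive_hash(data: bytes) -> int:
    """A deliberately weak hash: the sum of the bytes, kept to 32 bits."""
    return sum(data) & 0xFFFFFFFF


def sha1_int(data: bytes) -> int:
    return int.from_bytes(hashlib.sha1(data).digest(), "big")


def avalanche_ratio(hash_fn, out_bits: int, data: bytes) -> float:
    """Average fraction of output bits that change when one input bit is flipped."""
    base = hash_fn(data)
    changed = 0
    trials = 0
    for byte_index in range(len(data)):
        for bit in range(8):
            mutated = bytearray(data)
            mutated[byte_index] ^= 1 << bit
            changed += bin(base ^ hash_fn(bytes(mutated))).count("1")
            trials += 1
    return changed / (trials * out_bits)


sample = bytes(range(64))
print(round(avalanche_ratio(sha1_int, 160, sample), 2))      # close to 0.5
print(round(avalanche_ratio(additive_hash, 32, sample), 2))  # far below 0.5
```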

NOTE:
CRC is often used to denote either the function or the function's output. A CRC can be used like a checksum to detect accidental alteration of data during transmission or storage. CRCs are popular because they are simple to implement in binary hardware, are easy to analyze mathematically, and are particularly good at detecting common errors caused by noise in transmission channels. Historically, CRCs have been widely used for error detection in telecommunications.
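Python's zlib module exposes the standard CRC-32, the same one used by Ethernet, zlib and PNG. The sketch checks the well-known check value for the ASCII string "123456789" and shows a single-bit error being detected. A CRC only protects against accidental errors. Because it is linear, an attacker can easily forge data with a matching CRC.

```python
import zlib

payload = b"123456789"
checksum = zlib.crc32(payload)
print(hex(checksum))  # 0xcbf43926, the standard CRC-32 check value

# Simulate a single-bit transmission error and detect it.
corrupted = bytearray(payload)
corrupted[0] ^= 0x01
print(zlib.crc32(bytes(corrupted)) == checksum)  # False: the error is detected
```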

Collision avoidance

TODO
Complete

Implementing a Hash algorithm

Most hash algorithms are highly complex and are designed for a specific target use, so they may not offer the same level of guarantees for every task. The algorithms or their descriptions are freely accessible, so you can implement your own version or choose an existing, tested implementation.

NOTE:
Some Hash functions may be subject to export restrictions.

  • Mhash ( http://mhash.sourceforge.net/ ) is an open source (GNU Lesser GPL) C library that provides a uniform interface to a large number of hash algorithms (SHA1, SHA160, SHA192, SHA224, SHA384, SHA512, HAVAL128, HAVAL160, HAVAL192, HAVAL224, HAVAL256, RIPEMD128, RIPEMD256, RIPEMD320, MD4, MD5, TIGER, TIGER128, TIGER160, ADLER32, CRC32, CRC32b, WHIRLPOOL, GOST, SNEFRU128, SNEFRU256). For Windows support you need to compile it with Cygwin. A Python interface exists.
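Python's standard hashlib module offers a similar uniform interface through `hashlib.new(name, data)`. It does not necessarily include Tiger, HAVAL or the other less common algorithms listed above. `hashlib.algorithms_available` reports what the local build supports. The `digest_all` helper below is a hypothetical convenience wrapper.

```python
import hashlib


def digest_all(data: bytes, names=("md5", "sha1", "sha256")) -> dict:
    """Hash the same data with several algorithms through one interface."""
    return {name: hashlib.new(name, data).hexdigest() for name in names}


for name, hexdigest in digest_all(b"").items():
    print(f"{name:>6}: {hexdigest}")
```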

Hash tree (Merkle trees)

In cryptography, hash trees (also known as Merkle trees, invented in 1979 by Ralph Merkle) are an extension of the simpler concept of a hash list, which in turn is an extension of the older concept of hashing. A hash tree has desirable properties for verifying the integrity of files and file subranges incrementally or out of order.

Hash trees where the underlying hash function is Tiger ( http://www.cs.technion.ac.il/~biham/Reports/Tiger/ ) are often called Tiger trees or Tiger tree hashes.

The main use of hash trees is to make sure that data blocks received from other peers in a peer-to-peer network arrive undamaged and unaltered, and even to check that other peers do not send adulterated blocks of data. This optimizes the use of the network and makes it possible to exclude adulterated content quickly. Instead of waiting for the whole file to download and then checking it against a single hash, a peer can download a partial or complete hash tree. It can then check the integrity of each branch immediately (since the branches consist of "hashed" blocks, the leaves of the hash tree), even though the whole tree/content is not yet available. This also makes it possible for the downloading peer to upload blocks of an unfinished file.

Usually, a cryptographic hash function such as SHA-1, Whirlpool, or Tiger is used for the hashing. If the hash tree only needs to protect against unintentional damage, the much less secure checksums such as CRCs can be used.

At the top of a hash tree there is a top hash (or root hash, or master hash). Before downloading a file on a P2P network, in most cases the top hash is acquired from a trusted source (a peer or a central server with an elevated trust ratio). Once the top hash is available, the hash tree can be received from any source. The received hash tree is then checked against the trusted top hash. If the hash tree is damaged or corrupted, another hash tree from another source is tried until the program finds one that matches the top hash.
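The verification step can be sketched in Python. A peer that holds only the trusted top hash can check a single block using the sibling hashes along its branch (an "audit path"), without the rest of the file. This is a minimal sketch under stated assumptions. SHA-256 stands in for Tiger, and leaves and internal nodes are distinguished by 0x00/0x01 prefixes (the THEX convention). An unpaired node is promoted unchanged, and all function names are hypothetical.

```python
import hashlib


def leaf_hash(block: bytes) -> bytes:
    # 0x00 prefix marks a leaf (SHA-256 stands in for Tiger)
    return hashlib.sha256(b"\x00" + block).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    # 0x01 prefix marks an internal node
    return hashlib.sha256(b"\x01" + left + right).digest()


def build_levels(blocks):
    """All tree levels, from the leaves up to the single top hash."""
    level = [leaf_hash(b) for b in (blocks or [b""])]
    levels = [level]
    while len(level) > 1:
        parents = [node_hash(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            parents.append(level[-1])   # unpaired node is promoted unchanged
        level = parents
        levels.append(level)
    return levels


def audit_path(levels, index):
    """Sibling hashes needed to recompute the top hash from leaf `index`."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling < len(level):        # no sibling means the node was promoted
            path.append((sibling < index, level[sibling]))
        index //= 2
    return path


def verify_block(block: bytes, path, trusted_top: bytes) -> bool:
    """Check one received block against the top hash from a trusted source."""
    current = leaf_hash(block)
    for sibling_is_left, sibling in path:
        current = node_hash(sibling, current) if sibling_is_left else node_hash(current, sibling)
    return current == trusted_top


blocks = [b"block-%d" % i for i in range(5)]
levels = build_levels(blocks)
top = levels[-1][0]               # in practice obtained from a trusted source
path = audit_path(levels, 3)      # may come from any, untrusted, peer
print(verify_block(blocks[3], path, top))   # True
print(verify_block(b"forged", path, top))   # False
```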

This requires several considerations:

  1. What counts as a trusted source for the root hash.
  2. A consistent implementation of the hashing algorithm (for example, the size of the blocks to be transferred must be known and constant for every file transfer).

Tiger Tree Hash (TTH)

The Tiger tree hash is one of the most widely used forms of hash tree on P2P networks. It uses a binary hash tree (two child nodes under each node), usually has a data block size of 1024 bytes, and uses the cryptographically secure Tiger hash.

Tiger is used because it is fast, and a tree requires computing a lot of hashes. With recent implementations and architectures, Tiger is about as fast as SHA-1, and on 64-bit processors it can be faster, even though it generates larger hash values (192 bits vs. 160 bits for SHA-1).
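A quick calculation shows why hashing speed matters. With 1024-byte leaves, a file split into L leaves needs L leaf hashes plus L - 1 internal-node hashes, because every internal node of a binary tree built this way has exactly two children. The helper below has a hypothetical name. For a 700 MiB file it gives 716,800 leaves, 716,799 internal nodes and a height of 20 levels above the leaves.

```python
SEGMENT_SIZE = 1024  # typical TTH leaf size in bytes


def tth_cost(file_size: int):
    """Return (leaves, internal nodes, height) for a file split into 1024-byte leaves."""
    leaves = max(1, -(-file_size // SEGMENT_SIZE))  # ceiling division; empty file = 1 leaf
    internal = leaves - 1                  # each internal node combines exactly two children
    height = (leaves - 1).bit_length()     # equals ceil(log2(leaves)) for leaves >= 1
    return leaves, internal, height


print(tth_cost(700 * 1024 * 1024))  # (716800, 716799, 20)
```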

Tiger tree hashes are used in the Gnutella, Gnutella2 and Direct Connect protocols, in many other P2P file sharing protocols, and in file sharing in general.

A step by step introduction to the TTH is available as part of the Tree Hash Exchange (THEX) format ( http://open-content.net/specs/draft-jchapweske-thex-02.html ) page.

Hash table

In computer science, a hash table, or a hash map, is a data structure that associates keys with values. The primary operation it supports efficiently is a lookup: given a key, find the corresponding value. It works by transforming the key using a hash function into a hash, a number that is used to index into an array to locate the desired location ("bucket") where the values should be.

A small phone book as a hash table.

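Hash tables support the efficient addition of new entries. The average time spent searching for the required data is independent of the number of items stored (i.e. O(1) expected time), provided the hash function spreads the keys well and the table is resized to keep its load factor bounded. In the worst case, when many keys collide, a lookup degrades to O(n).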
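To make the mechanism concrete, here is a minimal Python hash table using separate chaining: each bucket holds a list of key/value pairs, and the bucket array doubles when the load factor exceeds 0.75. The class name and the 0.75 threshold are illustrative assumptions. Real Python programs would simply use the built-in dict.

```python
class ChainedHashTable:
    """Minimal hash table: separate chaining, doubling when the load factor is exceeded."""

    def __init__(self, capacity: int = 8, max_load: float = 0.75):
        self._buckets = [[] for _ in range(capacity)]
        self._size = 0
        self._max_load = max_load

    def _bucket(self, key):
        # the hash selects the bucket; collisions share the same list
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)   # overwrite an existing key
                return
        bucket.append((key, value))
        self._size += 1
        if self._size / len(self._buckets) > self._max_load:
            self._resize(2 * len(self._buckets))

    def get(self, key, default=None):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default

    def _resize(self, new_capacity: int):
        old_items = [item for bucket in self._buckets for item in bucket]
        self._buckets = [[] for _ in range(new_capacity)]
        for k, v in old_items:
            self._bucket(k).append((k, v))

    def __len__(self):
        return self._size


phone_book = ChainedHashTable()
phone_book.put("Alice", "555-0100")   # hypothetical entries
phone_book.put("Bob", "555-0199")
print(phone_book.get("Alice"))        # 555-0100
```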

In P2P systems, hash tables are used locally in every client/server application to route data or index local files. The concept is taken further when the same structure is used in a distributed way, in which case distributed hash tables solve the problem.

Distributed Hash Table (DHT)

The distributed hash table (DHT) concept was made public in 2001, but very few robust implementations have been publicly released.

Protocols
  • Content addressable network (CAN)
  • Chord ( http://pdos.csail.mit.edu/chord/ ) - aims to build scalable, robust distributed systems using peer-to-peer ideas. It is completely decentralized and symmetric, and can find data using only O(log N) messages, where N is the number of nodes in the system. Chord's lookup mechanism is provably robust in the face of frequent node failures and re-joins. A single research implementation is available in C, but there are other implementations in C++, Java and Python.
  • Tulip
  • Tapestry
  • Pastry
    • Bamboo (http://bamboo-dht.org/) - based on Pastry, a re-engineering of the Pastry protocols written in Java and licensed under the BSD license.
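Chord's lookup can be sketched as a static simulation in Python. Node and key names are hashed onto an identifier circle, and each node keeps a finger table pointing to the successors of n + 2^i. A query is forwarded to the closest finger preceding the key. This is a simplification under stated assumptions: a 16-bit identifier space instead of 160 bits, finger tables built from global knowledge, and no joins, failures or stabilization protocol. All names are hypothetical.

```python
import hashlib
from bisect import bisect_left

M = 16              # identifier bits (assumption; Chord with SHA-1 uses 160)
RING = 2 ** M


def chord_id(name: str) -> int:
    """Hash a node address or key name onto the identifier circle."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big") % RING


def between(x: int, a: int, b: int) -> bool:
    """True if x lies in the ring interval (a, b]; a == b means the whole ring."""
    if a < b:
        return a < x <= b
    return x > a or x <= b


class ChordRing:
    """Static simulation: finger tables are computed from global knowledge."""

    def __init__(self, node_ids):
        self.nodes = sorted(set(node_ids))
        # finger i of node n is the first node that succeeds n + 2**i
        self.fingers = {n: [self.successor((n + 2 ** i) % RING) for i in range(M)]
                        for n in self.nodes}

    def successor(self, ident: int) -> int:
        """First node whose ID is >= ident, wrapping around the ring."""
        i = bisect_left(self.nodes, ident)
        return self.nodes[i % len(self.nodes)]

    def lookup(self, start: int, key: int):
        """Route from node `start`; return (node responsible for key, hops)."""
        node, hops = start, 0
        while not between(key, node, self.fingers[node][0]):
            # forward to the closest finger strictly between node and key
            for f in reversed(self.fingers[node]):
                if f != key and between(f, node, key):
                    node = f
                    break
            hops += 1
        return self.fingers[node][0], hops
```

The hop count returned by `lookup` can be used to observe how routing cost grows as nodes are added, since each hop at least halves the remaining distance on the ring.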
TODO
Tulip seems to have a C++ implementation, but I couldn't find information about it...

Encryption

Encryption is part of the security of any P2P network: it is needed to make sure that only the "allowed" parties have access to sensitive data. One example is encrypting data stored on a server in a server/client setup (even in P2P), so that clients can share data without fear of it being accessed on the server. A variant of this is a network that itself provides a serverless, distributed cache for transfers. Another is encrypting transfers to prevent man-in-the-middle attacks or monitoring of data (see Freenet). There are many other applications intended to protect privacy and extend the level of security of networks.

Several algorithms can be used to implement encryption. Those most used by P2P projects include Blowfish.

TODO
Extend and provide concrete examples

Compression

TODO
Complete

Resources (Content, other)

TODO
...digital assets...

the Database

TODO
Complete

Indexing

TODO
...type... ...location... ...route... ...tracking... ...scalable... ...flexible...

using a DHT

Metadata

Decentralized Metadata
Free Services
  • MusicBrainz ( http://musicbrainz.org/ ) is a community music metadatabase (a nonprofit service) that attempts to create a comprehensive music information site. You can use the MusicBrainz data either by browsing its web site or by accessing the data from a client program. For example, a CD player program can use MusicBrainz to identify CDs and provide information about the CD, the artist or related topics. MusicBrainz also supports MusicIP's Open Fingerprint™ Architecture, which identifies the sounds in an audio file regardless of variations in the digital-file details. It also provides a REST-style, XML-based web service.
Strategies
TODO
Extend

Searching

Services

TODO
...tracking...

Seeds

Leechers

In nature, cooperation is widespread, but so are leechers (cheats, mutants). In evolutionary terms, cheats should prosper, since they do not contribute to the collective good but simply reap the benefits of others' cooperative efforts; yet they do not. Both compete for the same goal using different strategies. Cooperation is the path of least cost for everyone (in the long run, even for leechers) and provides stability and predictability. Cheats, on the other hand, if not kept in some sort of equilibrium, degrade the system and can lead to its global failure.

In computer science and especially on the Internet, being a leech or leecher refers to the practice of benefiting, usually deliberately, from others' information or effort but not offering anything in return, or only token offerings in an attempt to avoid being called a leech. They are universally derided.

The name derives from the leech, an animal which sucks blood and then tries to leave unnoticed. Other terms are used, such as freeloader, but leech is the most common.

Examples

  • On peer-to-peer networks, a leecher shares nothing (or very little, of little worth) for upload. Many applications have options for dealing with leeches, such as uploading at reduced rates to those who share nothing, or not allowing uploads to them at all. Many warez Internet forums have an anti-leech policy to protect the download content. The policy requires users to expend more energy or patience than most leechers are willing to before they can access the "download area".
  • Most BitTorrent sites use "leeches" for clients that are downloading a file but cannot seed it because they do not have a complete copy of it. BitTorrent clients are by default configured to let a peer download more when it uploads more.
  • On a shared network (such as a school or office LAN), any deliberate overuse of bandwidth (to the point at which normal use of the network is noticeably degraded) can be called leeching.
  • In online computer games (especially role-playing games), leeching refers to a player joining a group for the explicit purpose of gaining rewards without contributing anything to the efforts necessary to acquire them. Sometimes this is allowed in an effort to powerlevel a player, but doing it without permission from the group is usually considered poor behavior. In first-person shooters, the term is used for a player who benefits by having teammates carry him to a win.
  • Direct linking is a form of bandwidth leeching that occurs when placing an unauthorized linked object, often an image, from one site in a web page belonging to a second site (the leech). This constitutes an unauthorized use of the host site's bandwidth and content.
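One way P2P clients deal with leechers can be sketched as a tit-for-tat upload policy, in the spirit of BitTorrent's choking algorithm. The peers that upload the most to us get regular upload slots, and one extra "optimistic" slot goes to a randomly chosen other peer so that newcomers get a chance. This is a simplified sketch. The function name and the default of three regular slots are assumptions, and real clients also re-evaluate the choice periodically.

```python
import random


def choose_unchoked(upload_rates: dict, interested: set, regular_slots: int = 3, rng=random):
    """Pick the peers we will upload to next (tit-for-tat plus one optimistic slot).

    upload_rates maps peer -> recent bytes/s that peer uploaded *to us*.
    Peers that share nothing rank last, so leechers usually get only the
    optimistic slot.
    """
    ranked = sorted(interested, key=lambda p: upload_rates.get(p, 0), reverse=True)
    unchoked = set(ranked[:regular_slots])
    remaining = ranked[regular_slots:]
    if remaining:
        # optimistic unchoke: give one random peer a chance to prove itself
        unchoked.add(rng.choice(remaining))
    return unchoked


rates = {"A": 900, "B": 50, "C": 0, "D": 400, "E": 700, "F": 0}  # hypothetical peers
print(sorted(choose_unchoked(rates, set(rates), rng=random.Random(1))))
```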

In some cases, leeching is used synonymously with freeloading rather than being restricted to computer contexts.

Possible solutions

TODO
Complete

Trust & Reputation
TODO
Complete

File sharing

Traditionally, file transfers involve two computers, often designated as a client and a server, and most operations consist of copying files from one machine to another.

Most Web and FTP servers are punished for being popular. Since all uploading is done from one central place, a popular site needs more resources (CPU and bandwidth) to cope. With P2P, the clients automatically mirror the files they download, easing the publisher's burden.

One limitation of most P2P protocols is that they do not provide complex file-system emulation or a user-rights system (permissions), so complex file operations like those provided by the NFS or FTP protocols are very rare. There is a reason for this. Since the network is decentralized, a system for authenticating users is hard to implement, and most such systems are easy to break.

Bandwidth use in P2P transfers is also not linear. Transfers depend on the availability of the resource, the load on the seeding peers, the size of the network, and the local user's connection and the load on its bandwidth.

As seen before, downloading files with a restrictive copyright or license, or files restricted by a given country's law, may increase the risk of being sued. Some of the files available on these networks may be copyrighted or protected under the law, so be aware that there is a risk involved.

Receiving

TODO
...authentication... ...digital signature...

from multiple sources (segmented downloading, swarm)

Multiple-source downloading (segmented or swarming download) can be a more efficient way of downloading files from many peers at once. A single file is downloaded in parallel from several distinct sources, or uploaders, of the file. This can help a group of users with asymmetric connections, such as ADSL, provide high total bandwidth to one downloader and handle peaks in download demand.

This technique cannot magically solve the problem of a group of users with insufficient upload bandwidth, where demand is higher than supply. It can, however, handle peaks very well, and to some degree it also lets uploaders upload "more often" to better utilize their connections. Naive implementations, though, can often result in file corruption, as there is no way of knowing whether all sources are actually uploading segments of the same file. This has led most programs that use segmented downloading to use some sort of checksum or hash algorithm to ensure file integrity.
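Here is a minimal Python sketch of segmented downloading with per-segment verification. Segments are fetched in parallel, and each is checked against a SHA-1 digest obtained in advance from a trusted source. A segment that does not match is retried from another source. The sources are hypothetical callables standing in for network peers, and the 4-byte segment size only keeps the example readable.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

SEGMENT = 4  # tiny segment size so the example stays readable


def download_segmented(size, expected_hashes, sources):
    """Fetch segments from several sources in parallel, verifying each one.

    sources: list of callables fetch(offset, length) -> bytes (hypothetical
    stand-ins for peers). expected_hashes: SHA-1 digest of each segment,
    obtained from a trusted source before the download starts.
    """
    n_segments = (size + SEGMENT - 1) // SEGMENT

    def fetch_segment(index):
        offset = index * SEGMENT
        length = min(SEGMENT, size - offset)
        # spread the load: start at a different source for each segment,
        # and fall back to the other sources if the data fails verification
        for attempt in range(len(sources)):
            source = sources[(index + attempt) % len(sources)]
            data = source(offset, length)
            if hashlib.sha1(data).digest() == expected_hashes[index]:
                return data
        raise OSError(f"no source delivered a valid copy of segment {index}")

    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        # map() returns results in segment order, whatever order they finish in
        return b"".join(pool.map(fetch_segment, range(n_segments)))


original = b"peer-to-peer swarming download!"
hashes = [hashlib.sha1(original[i:i + SEGMENT]).digest()
          for i in range(0, len(original), SEGMENT)]
good = lambda off, n: original[off:off + n]
bad = lambda off, n: b"X" * n          # a polluting peer
print(download_segmented(len(original), hashes, [bad, good, good]) == original)  # True
```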

Resuming

TODO
Complete

Preview while Downloading

It may help to let users preview files as early as possible, before the download finishes. This improves the quality of shares on the network, increasing user confidence and reducing wasted time and bandwidth.

Security

TODO
Complete

Poisoning and Pollution

Examples include:

  • poisoning attacks (e.g. providing files whose contents are different from the description)
  • polluting attacks (e.g. inserting "bad" chunks/packets into an otherwise valid file on the network)
  • insertion of:
    • viruses in carried data (e.g. downloaded or carried files may be infected with viruses or other malware)
    • malware in the peer-to-peer network software itself (e.g. distributed software may contain spyware)
TODO
Complete

Distributed Proxy

TODO
Complete (Tor, Squid)

VoIP

TODO
Complete

Distributed Streaming

TODO
Complete (Freecast)

Priority settings

Letting users set transfer priorities dynamically will not only keep users happy but also provide an easy way to boost transfers of highly sought-after content, increasing its speed of replication on the network.
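Dynamic transfer priorities can be sketched with a heap-based queue in Python. Changing a priority pushes a new entry, and outdated entries are skipped when they reach the top of the heap. This is an illustrative sketch. The class name, the file names, and the convention that a lower number means a more urgent transfer are assumptions.

```python
import heapq
import itertools


class TransferQueue:
    """Pick the next transfer by user priority; a lower number is more urgent.

    Priorities may change at any time: a new heap entry is pushed and the
    stale one is skipped lazily instead of being removed from the heap.
    """

    def __init__(self):
        self._heap = []
        self._current = {}                  # transfer name -> current priority
        self._counter = itertools.count()   # FIFO tie-breaker for equal priorities

    def set_priority(self, name: str, priority: int):
        self._current[name] = priority
        heapq.heappush(self._heap, (priority, next(self._counter), name))

    def pop_next(self):
        while self._heap:
            priority, _, name = heapq.heappop(self._heap)
            if self._current.get(name) == priority:   # ignore outdated entries
                del self._current[name]
                return name
        return None


queue = TransferQueue()
queue.set_priority("linux.iso", 5)
queue.set_priority("lecture.ogg", 3)
queue.set_priority("linux.iso", 1)   # the user boosts this transfer
print(queue.pop_next())              # linux.iso
```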

One can go a step further and allow per-resource configuration, so that access rights can be removed or configured for each resource, as on a file system. It could even enable a market for free trading of data by letting users set a specific ratio for each resource.

Bandwidth Scheduler

Managing local resources is important not only to the local user but to the global network. Letting users control how much bandwidth the application uses gives them an incentive to improve how they manage that resource (what it is used for, and how). If the application also treats bandwidth as a dynamic resource, this can benefit the global network by reducing waste.

Many current P2P applications give users such control over their bandwidth, but P2P is not the only beneficiary of this strategy. Today, with a significant share of computers connected to the Internet, managing this scarce resource is of utmost importance. One example is Microsoft's Background Intelligent Transfer Service (BITS), which lets system updates, or even the MS IM service, transfer data whenever there is bandwidth not being used by other applications. Other applications can also use BITS, since it is exposed through the Component Object Model (COM).
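Application-level bandwidth control is often implemented with a token bucket, sketched below in Python. This is a common rate-limiting technique, not a description of how BITS works (BITS instead uses idle bandwidth). The class name and parameters are hypothetical, and the clock is injectable so the behavior can be tested without waiting.

```python
import time


class TokenBucket:
    """Token-bucket limiter: bursts up to `capacity` bytes and a
    long-run average of `rate` bytes per second."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def _refill(self):
        # add tokens for the time elapsed, never beyond the bucket's capacity
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_consume(self, nbytes: int) -> bool:
        """Spend tokens for nbytes if available; otherwise refuse (the caller waits)."""
        self._refill()
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False
```

A sender would call `try_consume(len(chunk))` before each write and sleep briefly when it returns False. Lowering `rate` while the user is active is one way to keep the application from hogging the connection.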

"New" models

Fault-Tolerant Web Sites

Many people have speculated that peer-to-peer file sharing technology could be used to improve wiki and other kinds of Internet services.

High quality video or large files distribution

The Internet infrastructure was not designed to support broadcasting. P2P partially solves this infrastructural bottleneck by switching the server or content provider from a single point to a decentralized infrastructure, that depends not on the specific network limitations but on the protocol that optimizes the distribution and its popularity.

TODO
Complete, cover “multicasting”

In February 2008 the European Union announced its commitment to a four-year project that aims to create an open source, peer-to-peer, BitTorrent-like client called P2P-Next. The client is to be based on an improvement of Tribler, a Python project of the Delft University of Technology. The EU will contribute 14 million euros (£10.5 million, $22 million) to this project, and another 5 million euros (£3.7 million, $7.4 million) will be added by 21 other partners. These include the European Broadcasting Union, Lancaster University, BBC, Markenfilm, VTT Technical Research Center and Pioneer Digital Design Center Limited.

Real Time Video

Transmitting live events to millions of people over the current infrastructure limits the quality of the output and places high demands on hardware resources. This includes not only network resources but also the encoding and playback capabilities at each end of the transfer.

TODO
Complete, cover date streaming

Hardware

Traffic Shapers
Set Top Boxes

P2P technologies can also be used to distribute content at low cost in an automated way.

Using a peer to peer architecture directly connected to a broadband line, a set top box (a stripped down PC of sorts), with an operating software and some storage space can for instance provide a service similar to video on demand.

VUDU

VUDU ( http://www.vudulabs.com/ ) delivers thousands of movies directly to your TV. It does not require a PC and is independent of your cable or satellite TV service.

TODO
TiVo, WebTV, Openwave, 2Wire, Slim Devices, OpenTV, and Danger

External Links
