
Datasets

The Gephi sample datasets below are available in various formats (GEXF, GDF, GML, NET, GraphML, DL, DOT). Feel free to add new datasets, but be sure to cite the original authors.

Supported graph formats are described here.

Gephi can open zipped files directly.
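Outside Gephi, a downloaded archive can be inspected with a few lines of Python before import. The following is a minimal sketch using only the standard library. The extension list is an assumption derived from the formats named above, and `list_graph_files` and `read_graph_text` are hypothetical helper names, not part of Gephi.

```python
import zipfile

# Assumed mapping: file extensions for the formats named on this page.
GRAPH_EXTENSIONS = (".gexf", ".gdf", ".gml", ".net", ".graphml", ".dl", ".dot")


def list_graph_files(archive):
    """Return the archive members whose extension looks like a graph format.

    `archive` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        return [name for name in zf.namelist()
                if name.lower().endswith(GRAPH_EXTENSIONS)]


def read_graph_text(archive, member, encoding="utf-8"):
    """Return the decoded text of one archive member (UTF-8 is assumed)."""
    with zipfile.ZipFile(archive) as zf:
        return zf.read(member).decode(encoding)
```

Calling `list_graph_files` on a downloaded archive shows which member to pass to `read_graph_text` or to another tool. Matching on the extension alone is a heuristic, so a file with an unusual name would be missed.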

Web and Internet

  • GEXF file. EuroSiS web mapping study: Mapping interactions between Science in Society actors on the Web of 12 European countries. Original report and data can be found here.
  • GML file. Internet: a symmetrized snapshot of the structure of the Internet at the level of autonomous systems, reconstructed from BGP tables posted by the University of Oregon Route Views Project. This snapshot was created by Mark Newman on July 22, 2006 and was not previously published.

Social networks

  • GML file. Les Miserables: coappearance weighted network of characters in the novel Les Miserables. D. E. Knuth, The Stanford GraphBase: A Platform for Combinatorial Computing, Addison-Wesley, Reading, MA (1993).
  • GEXF file. Hypertext 2009 dynamic contact network: contact network during the Hypertext 2009 conference. Source: Sociopatterns.org.
  • GEXF file. CLASS OF 1880/81: friendship network of a German boys' school class from 1880/1881. It is based on what is probably the first social network dataset ever gathered through primary data collection, assembled by the primary school teacher Johannes Delitsch. The data was reanalyzed and compiled for the article: Heidler, R., Gamper, M., Herz, A., Eßer, F. (2014): Relationship patterns in the 19th century: The friendship network in a German boys' school class from 1880 to 1881 revisited. Social Networks 13: 1-13.
  • GML file. Zachary's karate club: social network of friendships between 34 members of a karate club at a US university in the 1970s. W. W. Zachary, An information flow model for conflict and fission in small groups, Journal of Anthropological Research 33, 452-473 (1977).
  • GML file. Coauthorships in network science: coauthorship network of scientists working on network theory and experiment, as compiled by M. Newman in May 2006. A figure depicting the largest component of this network can be found here. M. E. J. Newman, Phys. Rev. E 74, 036104 (2006).
  • GEXF file. CPAN authors: CPAN Explorer is a visualization project aiming at analyzing the relationships between the developers and the packages of the Perl language, known as the CPAN community. This snapshot was created by Linkfluence in July 2009. This file contains the network of developers, linked when they use the same Perl module. Original data can be found here.
  • GEXF file. CPAN distributions: CPAN Explorer is a visualization project aiming at analyzing the relationships between the developers and the packages of the Perl language, known as the CPAN community. This snapshot was created by Linkfluence in July 2009. This file contains the network of Perl module dependencies. Original data can be found here.
  • NET file. Jazz musicians network: list of edges of the network of jazz musicians. P. Gleiser and L. Danon, Adv. Complex Syst. 6, 565 (2003).
  • DL file. Online social network: 1899 nodes. Opsahl, T., Panzarasa, P. (2009). Clustering in weighted networks. Social Networks 31 (2), 155-163.
  • GEPHI file. The Marvel Social Network: network of superheroes, constructed by Cesc Rosselló, Ricardo Alberich, and Joe Miro from the University of the Balearic Islands. Collected by Infochimps and transformed and enhanced by Kai Chang.
  • GDF file. Comic and Hero Network: same data as the Marvel Social Network, but including the comics the heroes appear in.
  • GEXF file. Contact networks in a primary school, SocioPatterns team, 2011.
  • GEXF file. Historical Social Network of Chinese Buddhism 漢傳佛教歷史社會網絡 17,000+ persons, 25,000+ connections.
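
Several entries above state a network size (34 members, 1899 nodes, 17,000+ persons). For the GEXF files, which are XML, a quick sanity check after download can be sketched with the standard library. `count_gexf` is a hypothetical helper name. It counts every `node` and `edge` element regardless of namespace, so hierarchical files with nested nodes would count the nested ones too.

```python
import xml.etree.ElementTree as ET


def count_gexf(source):
    """Count <node> and <edge> elements in a GEXF document.

    `source` may be a file path or a binary file-like object.
    The XML namespace is stripped so any GEXF version is accepted.
    """
    nodes = edges = 0
    for _, elem in ET.iterparse(source, events=("end",)):
        tag = elem.tag.rsplit("}", 1)[-1]  # drop the "{namespace}" prefix
        if tag == "node":
            nodes += 1
        elif tag == "edge":
            edges += 1
    return nodes, edges
```

The returned pair is `(node_count, edge_count)`, which can be compared with the figures quoted in the dataset description.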

Biological networks

  • GEXF. Diseasome: A network of disorders and disease genes linked by known disorder–gene associations, indicating the common genetic origin of many diseases. Genes associated with similar disorders show both higher likelihood of physical interactions between their products and higher expression profiling similarity for their transcripts, supporting the existence of distinct disease-specific functional modules. The original dataset can be found here: The Human Disease Network, Goh K-I, Cusick ME, Valle D, Childs B, Vidal M, Barabási A-L (2007), Proc Natl Acad Sci USA 104:8685-8690
  • GEXF. C. elegans neural network: A directed, weighted network representing the neural network of C. elegans. Data compiled by D. Watts and S. Strogatz and made available on the web here. Please cite D. J. Watts and S. H. Strogatz, Nature 393, 440-442 (1998). Original experimental data taken from J. G. White, E. Southgate, J. N. Thompson, and S. Brenner, Phil. Trans. R. Soc. London 314, 1-340 (1986).
  • GEXF. Yeast: Protein-Protein interaction network in yeast. Original data can be found here.

Infrastructure networks

  • GML. Power grid: An undirected, unweighted network representing the topology of the Western States Power Grid of the United States. Data compiled by D. Watts and S. Strogatz and made available on the web here. Please cite D. J. Watts and S. H. Strogatz, Nature 393, 440-442 (1998).
  • GRAPHML. Airlines: unknown source.
  • NET. US Air97: North American Transportation Atlas Data (NORTAD). Original data can be found here.

Other networks

  • GEXF. Java code: Source code structure of a Java program, by S. Heymann & J. Palmier, 2008.
  • GEXF. Dynamic Java code: Dynamic source code structure of a Java program, following the evolution of commits on the SVN, by S. Heymann & J. Bilcke, 2008.
  • GML. Word adjacencies: adjacency network of common adjectives and nouns in the novel David Copperfield by Charles Dickens. Please cite M. E. J. Newman, Phys. Rev. E 74, 036104 (2006).
  • NET. Wordnet English dictionary: unknown source.
  • DOT. Abstract mesh: 331 nodes.

Sources

Some of the above datasets are from:

Other network data repositories