SCOP/CATH Mapping FAQ

What Are SCOP and CATH?

SCOP and CATH are databases that hierarchically classify protein domains according to their structure. In particular, they form superfamilies: groups of structures that are deemed to be homologous. These two databases are at the heart of the methods used by the Genome3D predictive resources.

Why Is It Useful to Map between SCOP and CATH?

A SCOP/CATH mapping is useful for many reasons. One of the most important for Genome3D is that it allows us to identify pairs of highly similar SCOP and CATH superfamilies, and hence to find where predictions based on the two resources' superfamilies can be regarded as similar. This identification is used automatically on the annotations pages to colour equivalent superfamilies identically, which makes the relationships between SCOP-based and CATH-based predictions much clearer.

Another reason to map between SCOP and CATH is to identify and understand their differences. SCOP and CATH take different but complementary approaches to issues such as what constitutes a domain. Highlighting where the resources differ helps us to understand those differences and to explain them to the biological community.

How Has the Mapping Been Conducted?

Mapping between two such substantial resources is rather involved, but an overview can be given here. The mapping has been conducted in two stages:

  • Domain mapping - This stage involves identifying and recording every pair of SCOP/CATH domains that share any residues. For each pair, the analysis stores the number of residues in the SCOP domain, the number in the CATH domain and the number common to both. All the results are stored in a database, even for pairs with very small overlaps, so that such pairs can easily be kept or discarded later.
  • Superfamily/family mapping - This stage involves aggregating the results of the domain mapping across pairs of SCOP/CATH families and superfamilies. There is a rich array of possible relationships between a pair of SCOP/CATH superfamilies, so this stage calculates and stores a substantial number of statistics to characterise each relationship as effectively as possible.
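
The two stages above can be sketched in Python as follows. This is a minimal illustration rather than the Genome3D implementation: the function names, the identifiers in the toy data and the representation of a domain as a set of (structure, chain, residue number) keys are all assumptions made for the example, and only two of the many possible superfamily-level statistics are computed.

```python
from collections import defaultdict
from typing import NamedTuple


class DomainOverlap(NamedTuple):
    scop_domain: str
    cath_domain: str
    scop_size: int  # residues in the SCOP domain
    cath_size: int  # residues in the CATH domain
    shared: int     # residues common to both


def residues(structure, chain, first, last):
    """Hypothetical helper: a contiguous segment as a set of residue keys."""
    return {(structure, chain, n) for n in range(first, last + 1)}


def map_domains(scop_domains, cath_domains):
    """Stage 1: record every SCOP/CATH domain pair sharing at least one residue."""
    overlaps = []
    for scop_id, scop_res in scop_domains.items():
        for cath_id, cath_res in cath_domains.items():
            shared = len(scop_res & cath_res)
            if shared:  # keep even tiny overlaps; filtering can happen later
                overlaps.append(
                    DomainOverlap(scop_id, cath_id, len(scop_res), len(cath_res), shared)
                )
    return overlaps


def aggregate(overlaps, scop_sf_of, cath_sf_of):
    """Stage 2: pool the domain-level results per SCOP/CATH superfamily pair."""
    stats = defaultdict(lambda: {"domain_pairs": 0, "shared_residues": 0})
    for o in overlaps:
        pair = (scop_sf_of[o.scop_domain], cath_sf_of[o.cath_domain])
        stats[pair]["domain_pairs"] += 1
        stats[pair]["shared_residues"] += o.shared
    return dict(stats)


# Toy data (made up for illustration; not real SCOP or CATH identifiers).
scop_domains = {
    "scop_dom_1": residues("toy", "A", 1, 100),
    "scop_dom_2": residues("toy", "A", 101, 150),
}
cath_domains = {
    "cath_dom_1": residues("toy", "A", 1, 90),
    "cath_dom_2": residues("toy", "A", 91, 150),
}
overlaps = map_domains(scop_domains, cath_domains)
stats = aggregate(
    overlaps,
    {"scop_dom_1": "scop_sf_A", "scop_dom_2": "scop_sf_A"},
    {"cath_dom_1": "cath_sf_X", "cath_dom_2": "cath_sf_Y"},
)
```

In the toy data the two resources place a domain boundary at different positions in the same chain, so one SCOP domain overlaps two CATH domains. Keeping the small 10-residue overlap at stage 1 leaves the decision about whether to discard it to later analysis.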

Which Parts of the Mapping Can I Currently Access? Where Can I Find Them?

The current release of Genome3D exposes a key part of the mapping: those pairs of superfamilies that have been identified as consensus pairs. These are used to colour the annotations pages and are also directly provided as tables of Bronze/Silver/Gold Consensus Superfamily Pairs in the SCOP/CATH Mapping area of the website.

What Is a Bronze/Silver/Gold Consensus Superfamily Pair?

The full details of these categories are complicated, but an overview is provided here. A Bronze Standard consensus indicates a pair of SCOP and CATH superfamilies that:

  • are more similar to each other than to any other superfamily.

Such a pair may still involve substantial dissimilarities. A Silver Standard consensus indicates a pair of SCOP and CATH superfamilies that:

  • meet that Bronze Standard criterion,
  • each have at least 80% of their domains mapping to the other, without penalisation for differences in domains not yet classified, and
  • each have domains that map to domains in the other over an average of at least 80% of their residues.

A Gold Standard consensus indicates a pair of SCOP and CATH superfamilies that:

  • meet that Bronze Standard criterion,
  • each have at least 80% of their domains mapping to the other, with penalisation for differences in domains not yet classified, and
  • each have domains that map to domains in the other over a minimum of at least 80% of their residues.
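
The overview above can be expressed as a small decision function. The sketch below is only an illustration of the criteria as summarised here, not the full rules used by Genome3D: the `SideStats` fields, the `consensus_level` function and the assumption that the per-superfamily fractions have already been computed are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

THRESHOLD = 0.80  # "at least 80%" in the criteria above


@dataclass
class SideStats:
    """Pre-computed statistics for one superfamily of a pair (hypothetical fields)."""
    domains_mapped_unpenalised: float  # fraction, ignoring domains not yet classified
    domains_mapped_penalised: float    # fraction, penalising domains not yet classified
    mean_residue_overlap: float        # average fraction of residues mapped
    min_residue_overlap: float         # minimum fraction of residues mapped


def consensus_level(mutual_best: bool, scop: SideStats, cath: SideStats) -> Optional[str]:
    """Return "Gold", "Silver", "Bronze", or None if the pair is not a consensus pair."""
    if not mutual_best:  # the Bronze criterion is required by every level
        return None
    sides = (scop, cath)  # "each" in the criteria: both sides must qualify
    if all(
        s.domains_mapped_penalised >= THRESHOLD and s.min_residue_overlap >= THRESHOLD
        for s in sides
    ):
        return "Gold"
    if all(
        s.domains_mapped_unpenalised >= THRESHOLD and s.mean_residue_overlap >= THRESHOLD
        for s in sides
    ):
        return "Silver"
    return "Bronze"


# Made-up figures for illustration only.
strong = SideStats(0.95, 0.90, 0.97, 0.85)
decent = SideStats(0.90, 0.70, 0.88, 0.60)
weak = SideStats(0.50, 0.40, 0.65, 0.20)
```

With these made-up figures, `consensus_level(True, strong, strong)` gives "Gold", `consensus_level(True, strong, decent)` gives "Silver" because one side misses the Gold thresholds, and `consensus_level(True, strong, weak)` gives "Bronze".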