Aldo-Keto Reductase (AKR) Superfamily Database

This site contains existing and potential protein sequences of the AKR protein superfamily, as well as tools to visualize aligned sequences and their conservation across species. In addition, scientists are encouraged to submit newly identified AKRs.

AKR

AKRs share similar three-dimensional structures involving a parallel β-8/α-8-barrel fold, and they function as enzymes that catalyze the reduced nicotinamide adenine dinucleotide (phosphate) (NAD(P)H)-dependent oxido-reduction of carbonyl groups. Over 190 members have been identified in species ranging from prokaryotes to plants, fungi, and animals. These proteins, which are grouped into 16 families named AKR1-AKR16, have unique structural features that influence their substrates and kinetics.

Contributors

Coding and design by Jaehyun Joo, Blanca Himes, and Trevor Penning. Full code for this Shiny app is available on GitHub.

References

Mindnich RD, Penning TM. Aldo-keto reductase (AKR) superfamily: genomics and annotation. Hum Genomics. 2009 Jul;3(4):362-70. doi: 10.1186/1479-7364-3-4-362. PMID: 19706366; PMCID: PMC3206293.

Penning TM. The aldo-keto reductases (AKRs): Overview. Chem Biol Interact. 2015 Jun 5;234:236-46. doi: 10.1016/j.cbi.2014.09.024. Epub 2014 Oct 7. PMID: 25304492; PMCID: PMC4388799.

Funding

This work was supported by the University of Pennsylvania Center of Excellence in Environmental Toxicology (P30 ES013508).

Nomenclature

The general format for AKR names is as follows: the root symbol 'AKR' for Aldo-Keto Reductase; an Arabic number designating the family; a letter indicating the subfamily when multiple subfamilies exist; and an Arabic numeral representing the unique protein sequence. Under this system, the protein AKR1A1 is the first AKR in family 1, subfamily A, and in this instance corresponds to human aldehyde reductase.
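Because the format is strictly positional, a designation can be split into its parts with a regular expression. The Python sketch below is illustrative only: `parse_akr_name` and `AKRName` are hypothetical names, and it assumes the subfamily letter is always present (as in every example on this page), since without the letter a name such as "AKR11" could not be split unambiguously.

```python
import re
from typing import NamedTuple

# Hypothetical parser for AKR<family><subfamily letter><member>, e.g. "AKR1A1".
# Assumes the subfamily letter is always present.
AKR_NAME = re.compile(r"AKR(?P<family>[1-9]\d*)(?P<subfamily>[A-Z])(?P<member>[1-9]\d*)")


class AKRName(NamedTuple):
    family: int      # Arabic number designating the family
    subfamily: str   # letter designating the subfamily
    member: int      # Arabic numeral for the unique protein sequence


def parse_akr_name(name: str) -> AKRName:
    """Split an AKR designation into family, subfamily and member."""
    match = AKR_NAME.fullmatch(name.strip())
    if match is None:
        raise ValueError(f"not a valid AKR name: {name!r}")
    return AKRName(int(match["family"]), match["subfamily"], int(match["member"]))


print(parse_akr_name("AKR1A1"))  # AKRName(family=1, subfamily='A', member=1)
```

Multi-digit family and member numbers (for example a family-16 name) are handled because both number groups accept more than one digit.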

Definition of Families. Families are delineated at the 40% amino acid identity level: members of an AKR family should have < 40% amino acid identity with members of any other family. At present, the fourteen families defined by our cluster analysis satisfy this criterion.
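The family criterion can be checked mechanically once sequences are aligned. This is a minimal sketch, not the procedure used to build the database: it assumes pre-aligned sequences with '-' for gaps and computes identity over columns where neither sequence has a gap (other denominators are common and give different values). The names `percent_identity` and `family_violations` and the toy sequences are hypothetical.

```python
from itertools import combinations

FAMILY_CUTOFF = 40.0  # % identity; different families must fall below this


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    identical = sum(a == b for a, b in columns)
    return 100.0 * identical / len(columns)


def family_violations(aligned: dict[str, str], family_of: dict[str, int]) -> list[tuple[str, str, float]]:
    """List pairs assigned to different families whose identity is not below the cutoff."""
    violations = []
    for a, b in combinations(sorted(aligned), 2):
        if family_of[a] != family_of[b]:
            pid = percent_identity(aligned[a], aligned[b])
            if pid >= FAMILY_CUTOFF:
                violations.append((a, b, pid))
    return violations


# Toy, made-up aligned fragments (not real AKR sequences).
aligned = {"X1": "MKVL-AG", "X2": "MKVLSAG", "Y1": "MRTWQEG"}
print(family_violations(aligned, {"X1": 1, "X2": 1, "Y1": 2}))  # []
print(family_violations(aligned, {"X1": 1, "X2": 2, "Y1": 2}))  # [('X1', 'X2', 100.0)]
```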

Definition of Subfamilies. Within a given family, subfamilies may be defined by > 60% amino acid sequence identity among subfamily members. By this definition, nine of the fourteen AKR families include multiple subfamilies. For example, family AKR1 includes the following subfamilies: A) mammalian aldehyde reductases; B) mammalian aldose reductases; C) hydroxysteroid dehydrogenases (HSDs); and D) Δ4-3-ketosteroid-5β-reductase. Numbering of the known members of each subfamily was assigned arbitrarily. For example, AKR1A1, AKR1A2, and AKR1A3 are the aldehyde reductases from human, pig, and rat, respectively. Any new additions to a subfamily are numbered chronologically.
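One way to check a proposed subfamily is to require every pair of its members to exceed 60% identity. This sketch assumes that all-pairs reading of "among subfamily members" and a precomputed table of pairwise identities; the function `is_consistent_subfamily`, the proteins P1-P3, and their identity values are hypothetical.

```python
from itertools import combinations

SUBFAMILY_CUTOFF = 60.0  # members of one subfamily share > 60% identity


def is_consistent_subfamily(members: list[str], identity: dict[tuple[str, str], float]) -> bool:
    """True if every pair of members exceeds the subfamily cutoff.

    `identity` maps an alphabetically sorted (name_a, name_b) pair to % identity.
    """
    return all(
        identity[tuple(sorted(pair))] > SUBFAMILY_CUTOFF
        for pair in combinations(members, 2)
    )


# Made-up identities for three hypothetical proteins.
identity = {("P1", "P2"): 72.0, ("P1", "P3"): 65.0, ("P2", "P3"): 58.0}
print(is_consistent_subfamily(["P1", "P2"], identity))        # True
print(is_consistent_subfamily(["P1", "P2", "P3"], identity))  # False: P2-P3 is only 58%
```

Requiring all pairs is the strictest reading; a looser single-linkage rule (each member above 60% with at least one other member) would accept P3 here.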

Allelic Variants versus Isoforms. Allelic variation may occur between superfamily members. We propose that proteins with > 97% amino acid sequence identity are alleles of the same gene unless they have different enzyme activities, they are encoded by different cDNAs (usually evident from a distinct 3'-untranslated region, or UTR), or they are derived from genes of different structure. Although AKR1C1 [human dihydrodiol dehydrogenase 1 (DD1)] and AKR1C2 [human dihydrodiol dehydrogenase 2 (DD2)] are 98% identical in amino acid sequence and have 3'-UTRs that are 97% identical, their substrate specificities and functions are quite different: AKR1C1 is predominantly a 20α-HSD, while AKR1C2 is the major bile acid binding protein in human liver. Based on these functional differences, we have assigned AKR1C1 and AKR1C2 as unique members of the AKR superfamily.
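The allele rule is a threshold plus three exceptions, any one of which is enough to keep two proteins distinct. A minimal sketch, assuming the evidence for each exception has already been judged as true or false; `PairEvidence` and `treat_as_alleles` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PairEvidence:
    """Hypothetical record of what is known about two closely related proteins."""
    percent_identity: float
    different_activity: bool        # e.g. distinct substrate specificity
    different_cdna: bool            # usually evident from a distinct 3'-UTR
    different_gene_structure: bool


def treat_as_alleles(ev: PairEvidence) -> bool:
    """Apply the > 97% identity rule; any one exception keeps the proteins distinct."""
    if ev.percent_identity <= 97.0:
        return False
    return not (ev.different_activity or ev.different_cdna or ev.different_gene_structure)


# AKR1C1 vs AKR1C2: 98% identical but functionally different. Only the activity
# difference comes from the text above; the other two fields are placeholders.
print(treat_as_alleles(PairEvidence(98.0, True, False, False)))   # False
# A hypothetical pair with no known differences.
print(treat_as_alleles(PairEvidence(99.0, False, False, False)))  # True
```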

Multimeric Proteins. Currently, the AKR2, AKR6, and AKR7 families have been shown to form multimers. To extend the nomenclature to multimers, we recommend listing their composition and stoichiometry. For example, AKR7A1:AKR7A4 (1:3) would designate a tetramer of the indicated composition.
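The recommended notation can be generated from a list of subunits. This is a sketch under stated assumptions: `multimer_name` is a hypothetical helper, subunits are listed in order of first appearance (the recommendation does not specify an order), and counts are not reduced, so they sum to the number of subunits.

```python
from collections import Counter


def multimer_name(subunits: list[str]) -> str:
    """Format a multimer as 'A:B (m:n)' from the names of its subunits."""
    if not subunits:
        raise ValueError("a multimer needs at least one subunit")
    counts = Counter(subunits)  # keeps first-appearance order of distinct names
    composition = ":".join(counts)
    stoichiometry = ":".join(str(n) for n in counts.values())
    return f"{composition} ({stoichiometry})"


print(multimer_name(["AKR7A1", "AKR7A4", "AKR7A4", "AKR7A4"]))  # AKR7A1:AKR7A4 (1:3)
```

Keeping raw counts rather than reducing them preserves the oligomer size; a 2:2 tetramer and a 1:1 dimer would otherwise print identically.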

AKR Genes. The designation for an AKR superfamily gene should be written in italics to distinguish the gene from the protein. For example, the gene AKR1A1 encodes the protein AKR1A1.

The above nomenclature system was adopted at the 8th International Workshop on the Enzymology and Molecular Biology of Carbonyl Metabolism. It is similar to that for the cytochrome P450 superfamily, but, unlike that system, amino acid sequences are used for comparisons. For historical reasons, the AKR1A subfamily represents the aldehyde reductases and the AKR1B subfamily represents the aldose reductases. We recommend that authors referring to members of the AKR superfamily give any previous name along with the new designation in parentheses, for example, human aldehyde reductase (AKR1A1).

Protein Structures

AKRs are characterized by an (αβ)8-barrel structure:

The (αβ)8-barrel motif of AKRs


Loop Structure. Using the secondary structure of rat liver 3α-HSD (AKR1C9) as a template, the positions of the β-strands, α-helices, and the three large loops can be assigned.

Loop structures of AKRs

  • B1 (7-9) and B2 (15-17) are β-strands, and H1 (239-248) and H2 (290-298) are α-helices; none of these four elements lies in the core of the (α/β)8-barrel structure.
  • Positions of α-helices are α1 (32-43), α2 (58-70), α3 (95-106), α4 (144-156), α5 (170-177), α6 (200-209), α7 (252-262), and α8 (274-280).
  • Positions of β-strands in the barrel are: β1 (20-22), β2 (48-50), β3 (80-85), β4 (111-116), β5 (160-166), β6 (188-192), β7 (212-216), and β8 (265-269).
  • Loop A spans residues 117-143, loop B spans 217-238, and loop C spans 299-322.
  • Residues involved in cofactor binding are: T (24), D (50), S (166), N (167), Q (190), Y (216), S (221), R (270), S (271), R (276), E (279), and N (280).
  • Residues involved in substrate binding are: T (24), L (54), F (118), F (129), T (226), W (227), N (306), and Y (310).
  • Residues involved in catalysis are: D (50), Y (55), K (84) and H (117).
  • All residues are numbered relative to the 3α-HSD (AKR1C9) structure.
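The ranges and residue lists above can be kept as a small lookup table, for example to annotate a residue of interest in an alignment numbered relative to AKR1C9. A sketch in Python; the function `annotate` and the dictionary layout are hypothetical, and the numbers are copied from the list above.

```python
# Secondary-structure elements and loops (inclusive ranges, AKR1C9 numbering).
ELEMENTS = {
    "B1": (7, 9), "B2": (15, 17), "H1": (239, 248), "H2": (290, 298),
    "α1": (32, 43), "α2": (58, 70), "α3": (95, 106), "α4": (144, 156),
    "α5": (170, 177), "α6": (200, 209), "α7": (252, 262), "α8": (274, 280),
    "β1": (20, 22), "β2": (48, 50), "β3": (80, 85), "β4": (111, 116),
    "β5": (160, 166), "β6": (188, 192), "β7": (212, 216), "β8": (265, 269),
    "Loop A": (117, 143), "Loop B": (217, 238), "Loop C": (299, 322),
}
# Functional residue positions.
ROLES = {
    "cofactor binding": {24, 50, 166, 167, 190, 216, 221, 270, 271, 276, 279, 280},
    "substrate binding": {24, 54, 118, 129, 226, 227, 306, 310},
    "catalysis": {50, 55, 84, 117},
}


def annotate(residue: int) -> dict:
    """Report which elements contain a residue and which roles it plays."""
    elements = [name for name, (start, end) in ELEMENTS.items() if start <= residue <= end]
    roles = [role for role, positions in ROLES.items() if residue in positions]
    return {"residue": residue, "elements": elements, "roles": roles}


print(annotate(117))  # {'residue': 117, 'elements': ['Loop A'], 'roles': ['catalysis']}
```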

Cofactor Binding Site. Cofactor binding site of 3α-HSD (AKR1C9), taken from PDB 1LWI. Distances are in angstroms.



Typical Catalytic Tetrad. The blue sphere indicates the position of a water molecule and the probable position of the substrate carbonyl. Taken from 3α-HSD (AKR1C9); see PDB 1LW1.


This table lists all known members of the AKR superfamily contained in our database.


1 Where no reference is given, please refer to the accession number in the appropriate database.
2 Trichosporonoides megachiliensis, also known as Moniliella megachiliensis.

This table lists potential AKR superfamily members. These are currently excluded from the nomenclature because no functional data exist for the protein, or because the sequence is a partial cDNA or was derived from a genomics project.


This table contains AKR sequences grouped according to Protein Data Bank (PDB) structures. Note that more than one AKR can map to the same PDB entry.


This dendrogram replaces the older version constructed with the GCG program. It was constructed using the multialign program, which enables any user to conduct their own pairwise comparisons. As a result of this change, some families have shifted position; however, the nomenclature of the individual AKR families, subfamilies, and their members is essentially unchanged.
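The clustering method used by the multialign program is not described here. Purely to illustrate how a dendrogram can be derived from pairwise comparisons, the sketch below swaps in UPGMA (average-linkage clustering), a common textbook method, applied to distances defined as 100 minus percent identity. The function `upgma_newick`, the toy proteins P1-P4, and their identities are hypothetical, and branch lengths are omitted.

```python
from itertools import combinations
from typing import Callable


def upgma_newick(names: list[str], distance: Callable[[str, str], float]) -> str:
    """Cluster names by UPGMA (average linkage) and return the topology as Newick."""
    clusters = {name: 1 for name in names}  # cluster label (a Newick string) -> size
    dist = {frozenset(pair): distance(*pair) for pair in combinations(names, 2)}
    while len(clusters) > 1:
        # Merge the closest pair; sorting makes tie-breaking deterministic.
        a, b = min(combinations(sorted(clusters), 2), key=lambda p: dist[frozenset(p)])
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        merged = f"({a},{b})"
        for other in clusters:
            # Size-weighted average of the two old distances (the "average" in UPGMA).
            dist[frozenset((merged, other))] = (
                dist[frozenset((a, other))] * size_a + dist[frozenset((b, other))] * size_b
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters)) + ";"


# Made-up identities; unlisted pairs default to 30%.
identity = {frozenset(("P1", "P2")): 90.0, frozenset(("P3", "P4")): 85.0}


def distance(a: str, b: str) -> float:
    return 100.0 - identity.get(frozenset((a, b)), 30.0)


print(upgma_newick(["P1", "P2", "P3", "P4"], distance))  # ((P1,P2),(P3,P4));
```

Different clustering methods can reorder clades, which is one reason a tree rebuilt with a different program may show families in shifted positions.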

This page enables visualization of aligned protein sequences for various groups of AKRs stored in our database. To use it, select a group of interest from the interactive menu to choose which sequences are displayed. The alignment is displayed with MSAViewer, which provides an interactive JavaScript-based visualization of multiple sequence alignments. Viewer options, including the color scheme, can be set with the blue buttons. Further details can be found in the MSAViewer user manual and in this GitHub repository describing the available color schemes.




  1. Since the proposed nomenclature system is protein-based, a newly identified AKR must have an amino acid sequence obtained either by cDNA cloning or by direct methods. The protein encoded by a cDNA should have been either overexpressed or purified from its natural source. Investigators should provide GenBank, Swiss-Prot, or PIR accession numbers.

  2. Upon submission of a complete protein sequence, it will be matched against the AKRs in the database and placed within the cluster analysis. When submitting sequences, investigators should provide the following information:

    • Trivial name if one has been assigned
    • Species of origin
    • Expression system used
    • Substrate used to assign enzyme activity
    • Accession number
    • Status of publication
    • Citation if exists
    • Complete contact information for the submitter
  3. The location of the sequence within the superfamily cluster analysis will determine its assigned designation. As needed, new families and subfamilies will be added to the existing system.

  4. The sequence, its assigned designation, and its position within the cluster analysis will be returned to the submitter, but the database will not be updated until the submission has been published. We encourage the submitter to use the new assignment in their publication. It is the investigator's responsibility to notify the website that the submitted information has been published and to provide the appropriate citation.
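Before submitting, the information requested in step 2 can be gathered into one record and checked for gaps. This is a hypothetical sketch, not the site's actual form logic: `AKRSubmission` and `submission_problems` are invented names, the trivial name and citation are treated as optional because the list asks for them only if they exist, and the check that the sequence uses only the 20 standard one-letter amino acid codes is an added assumption.

```python
from dataclasses import dataclass, fields
from typing import Optional

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
OPTIONAL_FIELDS = {"trivial_name", "citation"}


@dataclass
class AKRSubmission:
    """Hypothetical record mirroring the information requested in step 2."""
    sequence: str
    species: str
    expression_system: str
    substrate: str              # substrate used to assign enzyme activity
    accession: str              # GenBank, Swiss-Prot or PIR
    publication_status: str
    contact: str
    trivial_name: Optional[str] = None  # only if one has been assigned
    citation: Optional[str] = None      # only if one exists


def submission_problems(sub: AKRSubmission) -> list[str]:
    """Return readable problems; an empty list means the record looks complete."""
    problems = [
        f"missing {f.name}"
        for f in fields(sub)
        if f.name not in OPTIONAL_FIELDS and not getattr(sub, f.name).strip()
    ]
    residues = set("".join(sub.sequence.split()).upper())  # ignore whitespace
    if residues - AMINO_ACIDS:
        problems.append("sequence contains characters other than the 20 standard amino acids")
    return problems


# Placeholder values only; "MKVLAG" is not a real AKR sequence.
draft = AKRSubmission("MKVLAG", "Homo sapiens", "", "example substrate", "", "unpublished", "name@example.org")
print(submission_problems(draft))  # ['missing expression_system', 'missing accession']
```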

