Quick Start Guide
Please see the introductory User Guide which presents a "hands on" introduction to bio3d and is intended to familiarize the user with a few essentials important for getting off to a quick start.
Basics
Bio3d is an R package containing utilities to process, organize and explore structure and sequence data. Features include the ability to read and write structure, sequence and dynamic trajectory data, perform atom selection, re-orientation, superposition, rigid core identification, clustering, distance matrix analysis, conservation analysis and principal component analysis. Bio3d takes advantage of the extensive graphical and statistical capabilities of the R environment (Ihaka and Gentleman, 1996) and thus represents a useful framework for exploratory analysis of structural data.
In the examples below we assume that you have successfully installed R and the Bio3D package, by following the installation instructions on the Download page.
Loading bio3d
Start R (type R at the comand prompt or, on Windows, double click on the R icon) and load the bio3d package by typing library(bio3d) at the R console prompt
Use the command lbio3d() to list the functions within the package
Note. in order to be executed, a function always needs to be written with parentheses, even if there is nothing within them (e.g., lbio3d()). If one just types the name of a function without parentheses, R will display the contents of the function (i.e. the code of the function).
Finding Help
To get help on a particular function try ?function or help(function). For example
To search the help system for documentation matching a particular word or topic use the comand help.search("topic")
Typing help.start() will start a local HTML interface. After initiating 'help.start()' in a session the '?function' commands will open as HTML pages
To execute examples for a particular function use the comand example(function). For run examples for the read dcd function try:
Bailing out
R is generally very tolerent, and can be interrupted by Ctrl-C (i.e. hold down the key marked Control and hit the C key). This will interrupt the current operation and return to the R prompt.
To quit R, type q()
Function Overview
The bio3d package consists of input/output functions, conversion and manipulation functions, analysis functions, and graphics functions which are summarized below with links to further documentation.
IO
Read and Write Common Data Types
read.fasta Read aligned or un-aligned sequences from a file in FASTA format
read.pdb Read structure data from Protein Data Bank (PDB) files
read.fasta.pdb Read aligned PDB structures
read.dcd Read coordinate data from CHARMM/X-PLOR/NAMD binary DCD files
write.fasta Write aligned or un-aligned sequences to a FASTA format file
write.pdb Write a PDB format coordinate file
mktrj.pca Make a trjactory of atomic displacments along a given principal component
Analysis
Do Interesting Things
pca.xyz Performs principal components analysis (PCA) on a set of conformers
dm Construct a distance matrix for a given protein structure
rmsd Calculate the root mean square deviation (RMSD) between coordinate sets
core.find Identify the most invarient region in an aligned set of protein structures
entropy Calculate the sequence entropy score per position in an alignment
consensus Determines the consensus sequence for a given alignment at a given identity cutoff value
Graphics
Plotting and Graphic Display
plot.dmat Plot a distance or difference distance matrix as returned from the 'dm' function
plot.pca Produces a z-score plot (conformer plot) and an eigen spectrum plot (scree plot)
plot.pca.loadings
plot.pca.score
plot.pca.scree
Convert
Convert and Manipulate Data
aa321 Convert between three-letter PDB style aminoacid codes and one-letter IUPAC aminoacid codes
aa123 Convert between one-letter IUPAC aminoacid codes and three-letter PDB style aminoacid codes
orient.pdb
fit.xyz Coordinate superposition of multiple conformers
rot.lsq Coordinate superposition with the Kabsch algorithm
Utilities
Useful Odds and Ends
pdb.summary some text
atom.select some text
gap.inspect some text
pairwise some text
bwr.colors some text
mono.colors some text
bounds some text
Data Input
Reading Sequence data
Use the read.fasta() function
aln <- read.fasta("eg.fa")
The 'aln' object is a list with $id and $ali components try the cmd 'attributes(aln)'
Reading trajectory data
Start R and load the bio3d package as described above.
Then use the read.dcd() function to read your trajectory
Note. You can use VMD to write a dcd format trajectory from multi-model PDBs and other common trajectory formats.
| trj<-read.dcd("prot.dcd") |
This will create a trajectory matrix called 'trj' from the file 'prot.dcd' in the current directory.
To find the number of structures in your 'trj' data structure try:
and to find the number of atoms use
Note that each row corresponds to a frame from your trajectory and each coloum coresponds to a an atom coordinate. DCD files contain no atom type information but this can be convinently obtained from a coresponding PDB file.
Reading PDB data
Start R and load the bio3d package as described above.
Then use the read.pdb() function
| pdb.1<-read.pdb("complex_cgbd_3000-5000.pdb") |
| pdb.2<-read.pdb("1HPV.pdb") |
To see whats in 'pdb.1' type
To see ATOM records
To see coords as a numeric vector
How do I tidy up my PDB file
Read your PDB
| pdb.1<-read.pdb("complex_cgbd_3000-5000.pdb") |
Change residue names to upper case
| pdb.1$atom[,"resid"]<-toupper(pdb.1$atom[,"resid"]) |
Write out a new PDB
| write.pdb(pdb.1, file="complex_cgbd_3000-5000_tidy.pdb") |
Or write with no chain id
| write.pdb(pdb.1, file="complex_cgbd_3000-5000_tidy.pdb",chain=rep(" ",nrow(pdb.1$atom)) |
How to extract the sequence from my PDB
to get sequence from 'pdb.1' (rember to change case to Upper)
| seq.1 <- aa321(toupper(pdb$atom[pdb$calpha,"resid"])) |
| seq.2 <- aa321(pdb.2$atom[pdb.2$calpha,"resid"]) |
Or take the sequence from the SEQRES records
| seq.3 <- aa321(pdb$seqres) |
To write a sequence file you need an 'laignment' style object which is nothing more than a list object with an "id" and "ali" components
| write.fasta(list( id=c("complex_cgbd_3000-5000_tidy"),ali=seq.1 ), file="eg.fa") |
| write.fasta(list( id=c("1HPV"),ali=seq.2 ), file="eg.fa", append=TRUE) |
Reading multiple PDBs
If they are related, and you want to keep track for their correspondances, use the function read.fasta.pdb() and input a sequence alignment of the PDBs
| aln <- read.fasta("eg.fa") |
| pdbs<-read.fasta.pdb(aln) |
Reading data from other sources
scan(), readLines(), read.fwf(), read.table()
Import of specific file lines with grep regular expressions
The following example demonstrates the retrieval of specific lines from an external file with a regular expression. First, an external file is created with the 'cat' function, all lines of this file are imported into a vector with 'readLines', the specific elements (lines) are then retieved with the 'grep' function, and the resulting lines are split into vector fields with 'strsplit'.
cat(month.name, file="zzz.txt", sep="n")
x <- readLines("zzz.txt")
x <- x[c(grep("^J", as.character(x), perl = TRUE))]
t(as.data.frame(strsplit(x,"u")))
Batch import and export of many files
In the following example the *.txt file names in the current directory are first assigned to a list (the '$' sign is used to anchor the match to the end of a string). Second, the files are imported one-by-one using a for loop where the original names are assigned to the generated data frames with the 'assign' function. Read ?read.table to understand arguments 'row.names=1' and 'comment.char = "A"'. Third, the data frames are exported using their names for file naming and appending '*.out'.
files <- list.files(pattern=".txt$")
for(i in files) {
x <- read.table(i, header=TRUE, comment.char = "A", sep="t")
assign(i, x)
print(i)
write.table(x, paste(i, c(".out"), sep=""), quote=FALSE, sep="t", col.names = NA)
}
Data output
writing PDBs
writing sequences
writing trajectorys
writiing matrics and tables
saving your work
Worked examples
Sequence conservation
Tidying PDBs
Fitting
Rigid core identification
RMSD
PCA
Distance Matrices
Writing your own functions
One of the main attractions of using the R environment is the ease with which users can write their own programs and functions. The R programming syntax is extremely easy to learn, even for users with no previous programming experience. Once the basic R programming control structures are understood, users can use the language as a powerful calculation machine to perform complex analyses of almost any type of quantitative data.
simple example function
Select some residue etc.
inds<-atom.select(pdb.1,string='////XK2///')
pdb.1$atominds$atom,
pdb.1$xyzinds$xyz
xyz<-fit.xyz(fixed=pdb.1$xyz-(inds$xyz), mobile=trj)
Fit on some residues
##- fit on selected positions 23 to 31 and 84 to 87 (see core.find)
inds.core <- atom.select(pdb.1,"///23:31,84:87///CA/")
xyz.c<-fit.xyz(fixed=pdb.1$xyz-(inds$xyz), mobile=trj, fixed.inds=inds.core$xyz, mobile.inds=inds.core$xyz)
##- Write out fitted trj
write.pdb(xyz=xyz.c, resid=pdb.1$atom[-inds$atom,"resid"], resno=pdb.1$atom[-inds$atom,"resno"], file="fit_core.pdb", verbose=TRUE)
#write.pdb(xyz=xyz.c, file="fit_core.pdb", verbose=TRUE)
r.trj<-rmsd(pdb.1$xyz-gaps,xyz.c)
How I do PCA on my trajectory
Read your trajectory and make sure all frames are fitted optimally
Then use the function pca.xyz() on the supperposed coordinates
How do I do PCA on X-ray structures
Read your aligned PDBs as described above
Fit
Exclude gap positions
Do PCA
Comments (0)
You don't have permission to comment on this page.