SPIKE users manual
Appendix A - File Formats
Note: SPIKE's expression file format is very strict - files that do not follow it exactly may be
rejected or presented incorrectly.
If the experiment was done using a model animal, you should first replace the gene IDs with
the Entrez Gene IDs of their human homologues.
All input files are tab-delimited text files. Each line should contain exactly the same number
of fields, separated by tabs. The files are therefore tables, and we can speak of columns as
well as rows.
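Because every line must have the same number of tab-separated fields, it can help to check a file before loading it. The following is a minimal sketch of such a check, not part of SPIKE itself; the function name check_field_counts and the sample rows are illustrative assumptions.

```python
def check_field_counts(lines):
    """Return (line_number, field_count) pairs for every line whose number of
    tab-separated fields differs from that of the first line."""
    expected = None
    mismatches = []
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\r\n").split("\t")
        if expected is None:
            expected = len(fields)  # the first line sets the expected width
        elif len(fields) != expected:
            mismatches.append((number, len(fields)))
    return mismatches


# Hypothetical sample lines, for illustration only.
sample = [
    "AffyId\tEntrez-GeneId\twt0\twt30\n",
    "100005_at\t9618\t228.80\t236.05\n",
    "100009_r_at\t\t150.06\t83.25\n",  # an empty Gene ID still counts as a field
    "100011_at\t51274\t80.50\n",       # one field short
]
print(check_field_counts(sample))  # [(4, 3)]
```

An empty field between two consecutive tabs is still counted, so rows without an Entrez Gene ID pass this check.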
Expression File Format
SPIKE currently supports two kinds of experimental gene expression data - "absolute" values
(as generated by Affymetrix chips) and "relative" values (as generated by cDNA microarrays).
The first column should contain the ID of each probe (as it is identified on the chip).
The second column should contain the Human Entrez Gene ID of the gene that the probe targets.
If no Entrez Gene ID exists, leave the field empty (thereby leaving two consecutive tabs), and
SPIKE will ignore this row. There may of course be more than one line with the same Entrez
Gene ID (for different probes of the same gene). SPIKE can handle this situation (see above).
The next one or more columns should include the measurements in the various biological
conditions that were profiled in the experiment. In the "absolute" format, these are the
(normalized) intensity values; in the "relative" format, these values should be log (base 2) of
the ratio between the expression levels in the test and reference conditions.
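For the "relative" format, each value is the base-2 logarithm of the test-to-reference ratio. A minimal sketch of that calculation follows; the function name log2_ratio and the input numbers are illustrative assumptions, and the sketch assumes both expression levels are positive.

```python
import math


def log2_ratio(test, reference):
    """Return log (base 2) of test / reference.

    Both expression levels must be positive, since the logarithm is
    undefined otherwise.
    """
    if test <= 0 or reference <= 0:
        raise ValueError("expression levels must be positive")
    return math.log2(test / reference)


# Hypothetical expression levels, for illustration only.
print(log2_ratio(200.0, 100.0))  # 1.0  (two-fold increase)
print(log2_ratio(50.0, 100.0))   # -1.0 (two-fold decrease)
print(log2_ratio(100.0, 100.0))  # 0.0  (no change)
```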
The first row should contain a header for each column ("probeID", "GeneID", and then labels
for the various conditions; SPIKE will name the conditions according to these labels). The
other rows should contain the data for each probe.
Example (absolute format):
AffyId         Entrez-GeneId    wt0       wt30      wt120
100005_at      9618             228.80    236.05    236.50
100009_r_at    6657             150.06    83.25     108.35
100011_at      51274            80.50     81.60     71.15
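The layout described in this section can be read with a few lines of code. The sketch below only illustrates the format and is not SPIKE's own parser; the function name read_expression_table is an assumption, and the last sample row is a made-up probe added to show the empty Gene ID case.

```python
import csv
import io


def read_expression_table(handle):
    """Parse a tab-delimited expression table.

    Returns (conditions, rows), where each row is
    (probe_id, gene_id, [values]). Rows with an empty Gene ID are skipped.
    """
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    conditions = header[2:]  # labels after the probe ID and Gene ID columns
    rows = []
    for line_number, fields in enumerate(reader, start=2):
        if len(fields) != len(header):
            raise ValueError(
                f"line {line_number}: expected {len(header)} fields, "
                f"found {len(fields)}"
            )
        probe_id, gene_id = fields[0], fields[1].strip()
        if not gene_id:
            continue  # no Entrez Gene ID: skip the row
        rows.append((probe_id, gene_id, [float(v) for v in fields[2:]]))
    return conditions, rows


sample = (
    "AffyId\tEntrez-GeneId\twt0\twt30\twt120\n"
    "100005_at\t9618\t228.80\t236.05\t236.50\n"
    "100009_r_at\t6657\t150.06\t83.25\t108.35\n"
    "100011_at\t51274\t80.50\t81.60\t71.15\n"
    "hypothetical_probe_at\t\t10.00\t11.00\t12.00\n"  # made-up row, no Gene ID
)
conditions, rows = read_expression_table(io.StringIO(sample))
print(conditions)  # ['wt0', 'wt30', 'wt120']
print(len(rows))   # 3
```

Gene IDs are kept as strings here so that they are compared exactly as written in the file.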
Clustering File Format
Grouping of genes into clusters can be achieved by clustering algorithms or by any given
annotation (such as the GO annotation, or grouping according to cellular location) or any
partition relevant for the researcher. After assigning genes to groups, the input to SPIKE
should be of the following format.
The first column should contain a probe ID (e.g., the Affymetrix ID for results generated by an
algorithm which clusters genes according to their Affymetrix expression profile).
The second column should contain the Human Entrez Gene ID of the gene. If no Entrez Gene
ID exists, leave the field empty (thereby leaving two consecutive tabs), and SPIKE will ignore
this row.
The third column should include the name of the cluster that the gene belongs to (can be a
string or a number).
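The three columns described above can be produced from a list of cluster assignments. The sketch below shows one way to do so under stated assumptions: the function name format_cluster_rows and the cluster names are hypothetical, and since this section does not say whether a header row is expected, the sketch writes data rows only.

```python
def format_cluster_rows(assignments):
    """Format (probe_id, gene_id, cluster) triples as tab-delimited lines.

    A gene_id of None is written as an empty field, which leaves two
    consecutive tabs in the output line.
    """
    lines = []
    for probe_id, gene_id, cluster in assignments:
        gene_field = "" if gene_id is None else str(gene_id)
        lines.append("\t".join([probe_id, gene_field, str(cluster)]) + "\n")
    return "".join(lines)


# Hypothetical cluster assignments, for illustration only.
assignments = [
    ("100005_at", 9618, "cluster1"),
    ("100009_r_at", 6657, 2),          # cluster names may be numbers
    ("100011_at", None, "cluster1"),   # no Entrez Gene ID available
]
text = format_cluster_rows(assignments)
print(text, end="")
```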