
SPIKE user’s manual

Appendix A - File Formats

Note: SPIKE's expression file format is very strict. Files that do not follow it exactly may be rejected or displayed incorrectly.

SPIKE identifies genes by their human Entrez Gene ID. If your microarray experiment was done on a model organism, first replace the gene IDs with the Entrez Gene IDs of their human homologues.
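As a minimal sketch of this replacement step, the function below maps model-organism gene IDs to human Entrez Gene IDs through a lookup table. The function name `to_human_ids`, the `homologue_map` dictionary, and all IDs shown are hypothetical placeholders; SPIKE does not supply such a table, and a real one would have to come from a homology resource of your choice. Genes without a known homologue get an empty ID here, which is an assumption of this sketch.

```python
def to_human_ids(model_ids, homologue_map):
    """Replace each model-organism gene ID with its human Entrez Gene ID.

    Genes missing from homologue_map get an empty string, so that the
    Gene ID field can be left empty in the input file.
    """
    return [homologue_map.get(gene_id, "") for gene_id in model_ids]


# Made-up placeholder IDs, for illustration only.
example_map = {"model_gene_1": "1001", "model_gene_2": "1002"}
human_ids = to_human_ids(["model_gene_1", "model_gene_3"], example_map)
```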

All files are tab-delimited text files. Every line must contain exactly the same number of fields, separated by tabs, so each file is effectively a table and we can speak of columns as well as rows.
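Because every line must have the same number of fields, it can help to check a file before loading it. The following is a rough sketch of such a check, not part of SPIKE itself; the function name `check_field_counts` is an assumption made for illustration. It treats the first line as the reference and reports every line whose field count differs.

```python
def check_field_counts(lines):
    """Return (line_number, field_count) for each line whose number of
    tab-separated fields differs from that of the first line."""
    # Strip only the line terminator, so empty trailing fields still count.
    rows = [line.rstrip("\r\n").split("\t") for line in lines]
    if not rows:
        return []
    expected = len(rows[0])
    return [
        (number, len(fields))
        for number, fields in enumerate(rows, start=1)
        if len(fields) != expected
    ]
```

An empty field between two consecutive tabs still counts as a field, so a row with a missing Gene ID passes this check.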

Expression File Format

SPIKE currently supports two kinds of experimental gene expression data - "absolute" values

(as generated by Affymetrix chips) and "relative" values (as generated by cDNA microarrays).

The first column should contain the ID of each probe (as it is identified on the chip). The second column should contain the human Entrez Gene ID of the gene that the probe targets. If no Entrez Gene ID exists, leave the field empty (so that two consecutive tabs appear), and SPIKE will ignore the row. More than one line may carry the same Entrez Gene ID (for different probes of the same gene); SPIKE can handle this situation (see above).

The remaining columns (one or more) should contain the measurements for the biological conditions that were profiled in the experiment. In the "absolute" format, these are the (normalized) intensity values; in the "relative" format, they should be the base-2 logarithm of the ratio between the expression levels in the test and reference conditions.
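To make the "relative" value concrete, here is a small sketch of the computation, assuming you start from positive expression levels for the test and reference conditions. The function name `log2_ratio` is hypothetical and not something SPIKE provides.

```python
import math


def log2_ratio(test, reference):
    """Base-2 logarithm of the test/reference expression ratio."""
    if test <= 0 or reference <= 0:
        raise ValueError("expression levels must be positive")
    return math.log2(test / reference)
```

With this definition, a gene expressed twice as strongly in the test condition gets a value of 1, an unchanged gene gets 0, and a gene at half the reference level gets -1.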

The first row should contain a header for each column ("probeID", "GeneID", and then a label for each condition; SPIKE names the conditions according to these labels). The remaining rows should contain the data for each probe.

Example (absolute format):

AffyId         Entrez-GeneId    wt0       wt30      wt120
100005_at      9618             228.80    236.05    236.50
100009_r_at    6657             150.06    83.25     108.35
100011_at      51274            80.50     81.60     71.15
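The layout described above can be read with a few lines of code. The sketch below is not SPIKE's own parser: the function name `read_expression` is an assumption, and the second data row in the sample uses a made-up probe with an empty Gene ID to show that such rows are skipped, mirroring the behavior described for SPIKE.

```python
import csv
import io


def read_expression(handle):
    """Parse an absolute-format expression table.

    Returns (condition_labels, rows), where each row is
    (probe_id, gene_id, [values]). Rows with an empty Gene ID are skipped.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    conditions = header[2:]  # labels after the probe and gene ID columns
    rows = []
    for fields in reader:
        if not fields:  # tolerate a blank line at the end of the file
            continue
        probe_id, gene_id = fields[0], fields[1].strip()
        if not gene_id:
            continue
        rows.append((probe_id, gene_id, [float(v) for v in fields[2:]]))
    return conditions, rows


sample = (
    "AffyId\tEntrez-GeneId\twt0\twt30\twt120\n"
    "100005_at\t9618\t228.80\t236.05\t236.50\n"
    "made_up_probe_at\t\t150.06\t83.25\t108.35\n"  # hypothetical row, no Gene ID
)
conditions, rows = read_expression(io.StringIO(sample))
```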

Clustering File Format

Genes can be grouped into clusters by a clustering algorithm, by an existing annotation (such as the GO annotation, or grouping according to cellular location), or by any other partition relevant to the researcher. After assigning genes to groups, provide them to SPIKE in the following format.

The first column should contain a probe ID (e.g., the Affymetrix ID, for results generated by an algorithm that clusters genes according to their Affymetrix expression profiles).

The second column should contain the human Entrez Gene ID of the gene. If no Entrez Gene ID exists, leave the field empty (so that two consecutive tabs appear), and SPIKE will ignore the row.

The third column should contain the name of the cluster that the gene belongs to (either a string or a number).
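As an illustration of the three columns just described, the sketch below formats a single clustering row. The helper name `cluster_row` and the cluster labels are hypothetical. It deliberately produces data rows only, since whether a header row is expected is not covered by the three column descriptions above.

```python
def cluster_row(probe_id, gene_id, cluster):
    """Format one tab-delimited row: probe ID, Entrez Gene ID, cluster name.

    Pass gene_id=None when no Entrez Gene ID exists; the field is then left
    empty, which yields two consecutive tabs.
    """
    gene_field = "" if gene_id is None else str(gene_id)
    return "\t".join([str(probe_id), gene_field, str(cluster)])


# Cluster labels below are made up for illustration.
with_id = cluster_row("100005_at", 9618, 3)
without_id = cluster_row("made_up_probe_at", None, "cluster_A")
```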