EXPosition

Modeling CRISPR Effects on Gene Expression

How To Cite

Cohen & Bergman et al. A tool for CRISPR Cas-9 gRNA evaluation based on computational models of gene expression. pre-print, 2024.

Download and Installation

The EXPosition binaries and source code are provided freely for non-commercial use. Extract the contents of the .zip file and run Exposition_GUI.exe. If you're unfamiliar with how to extract .zip files, please see 7.1 Windows - How to extract .zip files. If you're having trouble running the Linux version, please see 7.2 Linux - xcb not found. The source code is available on GitHub (https://github.com/shaicoh3n/EXPosition) and the data for the source code (5.38 GB) is available to download here: https://zenodo.org/records/14228618. Standalone executable versions (8 GB) of the tool are available for Windows and Linux on the Zenodo page.

Release Notes

Version 1.0

First full distribution (binaries, docs, website).

Quick User Guide - Single Case

EXPosition is a tool designed to evaluate the effect of sgRNAs on the target gene's expression (Cohen & Bergman et al., 2024). It can predict the mutations that follow CRISPR use (or accept manually entered mutations, regardless of their source) and estimate their effect on the transcription, splicing and translation initiation of the target gene. It can also predict whether an sgRNA will silence a target gene. A thorough explanation follows this section, but here is a quick rundown of how to use the tool:

Fill in the 20-nt site sequence.
Fill in the genomic position of the 1st nt of the site from its 5' end (this depends on the strand of the site).
Fill in the strand of the site.
Fill in the chromosome of the site.
Click the 'Analyze' button and wait for the results!

Quick User Guide - Multiple Cases

If you want to analyze multiple gRNAs or mutations, you can follow these quick instructions (although a thorough explanation is available further down):

Depending on whether you want to analyze gRNAs or specific mutations, open a copy of "pred_muts_example.csv" or "man_muts_example.csv" respectively.
Write over the lines in the file with your data (and add new lines if you need to). The shared columns for both files are:
- 'serial' - A number from 0 up to the number of gRNAs\mutations you're interested in analyzing
- 'gene_symbol' - The symbol of the gene the gRNA is targeting or the mutation is affecting (for example - "A1BG"). The gene symbols were acquired through NCBI annotations (you can see the list of genes in the file "Data/all_genome_genes.csv").
- 'ensembl_id' - The Ensembl ID of a specific transcript you're interested in analyzing. The Ensembl IDs were acquired through https://www.ensembl.org/. You can see the list of IDs in the file "Data/all_genome_transcripts.csv".
- 'use_mut_predictor' - A parameter set to 1 if the data consists of gRNAs or 0 if the data is of mutations.
- 'site_chr' - The chromosome the gRNA\mutation is on. The format is 'Chr' followed by 1-22 or X or Y.
- 'do_transcription'/'do_splicing'/'do_initiation' - A parameter set to 1 if you're interested in predicting the effect of the gRNA/mutation on the transcription/splicing/translation initiation of the affected gene, respectively.
The meanings of the other columns, and how to fill them in, can be understood from the examples in the files and from the more thorough explanation in 2. Predicting vs Manually Inserted Mutations.
Run the tool and, once it has finished starting up and its window appears, click the "File" tab, then "Load Configuration", and select the file you edited (see the image at the end of 5. Batch Processing).
Wait for the analysis to conclude, after which the results file will be saved in the Results folder.
In that file, each row (i.e. gRNA\mutation) has cells in columns titled "significant_X", where X is one of the sub-models. The value in each of those columns is either significant or not significant, telling you whether the gRNA\mutation affected the gene in question for each analyzed aspect of expression.
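
As a rough illustration of reading such a results file, the sketch below collects, for each row, the sub-models marked as significant. The file excerpt is invented, and the sub-model names after "significant_" are assumptions made for this example; check the header of your own results file for the actual column names.

```python
import csv
import io

# Hypothetical excerpt of a results file. The sub-model names after
# "significant_" are assumptions made for illustration only.
RESULTS_CSV = """serial,gene_symbol,significant_transcription,significant_splicing
0,A1BG,significant,not significant
1,A1BG,not significant,not significant
"""

def significant_aspects(row):
    """Return the sub-model names whose 'significant_X' cell reads 'significant'."""
    prefix = "significant_"
    return [
        name[len(prefix):]
        for name, value in row.items()
        if name.startswith(prefix) and value.strip().lower() == "significant"
    ]

def summarize(csv_text):
    """Map each row's 'serial' value to the list of significantly affected aspects."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["serial"]: significant_aspects(row) for row in reader}

summary = summarize(RESULTS_CSV)
print(summary)
```

To use this on a real file, replace the in-memory string with the contents of the file saved in the Results folder.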

User Guide

The following tutorial will guide you step-by-step through the analysis and design pipelines. For a thorough description of the methods and results which validate the tool, please see the original paper.

1. Models and Target Gene\Transcript Selection

2. Predicting vs Manually Inserted Mutations

2.1 Site Selection (Predicting Mutations)
2.2 Manually Inserted Mutations

3. Analyzing Sites

4. Output Files

5. Batch Processing

6. Additional Parameters and Features

7. Troubleshooting

1. Models and Target Gene\Transcript Selection

EXPosition allows the user to input the target gene or transcript to be analyzed. The target gene's name can be written in the 'Gene Symbol' textbox. Once the 'Gene Symbol' textbox is partially filled, a list consisting of all genes that begin with the characters already written will be available to choose from. If the option 'No Interest Gene' is chosen (the default option) then EXPosition will analyze all genes affected by the predicted (or manually inserted) mutations. In this case, the final result for each model will be the maximum score from all the analyzed genes.

Similarly, it's possible to write the Ensembl ID of a transcript of interest in the 'Ensembl ID' textbox. Once the textbox is partially filled, a list of all the Ensembl IDs that begin with the characters already written will be available to choose from. If a transcript of interest is provided, the tool will analyze only that transcript and disregard the others. However, if the 'No Interest Transcript' option is chosen (the default option), then all transcripts belonging to the affected gene(s) will be analyzed, i.e. those of the gene of interest or of each affected gene (depending on the input in the 'Gene Symbol' textbox).
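
The prefix-based suggestion list described for both textboxes can be pictured with the minimal sketch below. It is not the tool's own code: the function name is hypothetical, the gene list is a toy stand-in for "Data/all_genome_genes.csv", and case-sensitive matching is an assumption.

```python
def suggest(prefix, known_names):
    """Return the known names that begin with the characters typed so far."""
    # Assumption: matching is case-sensitive; the tool may behave differently.
    return sorted(name for name in known_names if name.startswith(prefix))

# Toy stand-in for the gene list shipped with the tool.
genes = ["A1BG", "A1CF", "A2M", "BRCA1"]

print(suggest("A1", genes))
print(suggest("ZZ", genes))  # nothing matches, so the textbox would be flagged
```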

The gene symbols were acquired through NCBI annotations (you can see the list of genes in the file "Data/all_genome_genes.csv"). The Ensembl IDs were acquired through https://www.ensembl.org/ (You can see the list of IDs in the file "Data/all_genome_transcripts.csv").

Both textboxes will be highlighted in red if the input does not exist within our database. In addition, if a gene of interest and\or a transcript of interest are chosen, then the 'Site Chr.' textbox will be automatically filled in accordingly and locked for editing until the default option is chosen again.

All gene expression models - Transcription, Splicing and (Translation) Initiation - are chosen by default, but one can untick the checkbox next to a model to have the tool skip its analysis. Note that one cannot analyze the effects of mutations on translation initiation if the splicing checkbox is not checked.

[Image: Models and Target Gene\Transcript Selection]

2. Predicting vs Manually Inserted Mutations

The user can either predict mutations based on a target site or input the mutations manually. Below we describe each option separately. Note that if you have many mutations you wish to analyze, you should create a '.csv' file with all those mutations and analyze that file (see 5. Batch Processing).

2.1 Site Selection (Predicting Mutations)

EXPosition accepts a genomic site and analyzes the outcomes of targeting it with CRISPR. If the 'Predict Muts' option is selected, the user can fill in the following textboxes to specify the desired site:

'Site Start Position' - The genomic position (1-based) of the 1st nucleotide of the site, relative to the strand the site is located on. Note that the site must be next to a PAM motif ('NGG'); otherwise the tool will report an error.
'Site Sequence' - A 20-nucleotide sequence starting at the 5’ end of the site, relative to the strand of the site.
'Site Strand' - The strand (“+” or “-”) on which the site is located, i.e. the strand where the PAM sequence is.
'Site Chr.' - The chromosome where the site is located (1-22, X or Y).

Note that the 'Site Chr.' textbox might already be filled in and\or locked if a gene or transcript of interest was chosen (see previous section).

The number of predicted mutations (4 by default) can be altered in the 'N Muts.' textbox. Since Lindel (Chen et al. 2019) predicts probabilities for hundreds of possible mutations, our tool analyzes the top 'N' chosen by the user.
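
Keeping only the top 'N' predictions can be sketched as follows. The mutation names and probabilities are invented, the function name is hypothetical, and whether EXPosition renormalizes the retained probabilities is not stated here, so this sketch leaves them untouched.

```python
def top_n_mutations(predicted, n=4):
    """Keep the n most probable entries of a {mutation: probability} mapping."""
    ranked = sorted(predicted.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]

# Invented predictions standing in for a predictor's output.
predicted = {
    "DEL_1bp": 0.30,
    "INS_A": 0.25,
    "DEL_3bp": 0.10,
    "INS_T": 0.08,
    "DEL_7bp": 0.02,
}

top_default = top_n_mutations(predicted)   # 4 mutations, as in the default
top_two = top_n_mutations(predicted, n=2)
print(top_two)
```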

The file "pred_muts_example.csv" in the folder contains example sites that show appropriate inputs.

Note that if any of these textboxes are filled incorrectly they will be colored in red.
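
To make the relationship between start position, strand and PAM concrete, here is a minimal sketch that checks a site against a toy '+'-strand reference string. It is not the tool's validation code. It assumes that, for a '-' strand site, the start position is the site's highest '+'-strand coordinate (its 5' end on the '-' strand); the function names and the reference sequence are invented for illustration.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def check_site(reference, start, strand, site_seq):
    """Check a 20-nt site and its NGG PAM against a '+'-strand reference.

    `start` is the 1-based position of the site's 5' nucleotide.
    Assumption: on '-', that is the site's highest '+'-strand coordinate.
    """
    if not re.fullmatch(r"[ACGT]{20}", site_seq):
        return "site sequence must be 20 nt of A/C/G/T"
    if strand == "+":
        lo, hi = start - 1, start + 22   # 0-based slice: site, then PAM
    elif strand == "-":
        lo, hi = start - 23, start       # PAM lies at lower '+' coordinates
    else:
        return "strand must be '+' or '-'"
    if lo < 0 or hi > len(reference):
        return "site and PAM fall outside the reference"
    window = reference[lo:hi]
    if strand == "-":
        window = reverse_complement(window)
    if window[:20] != site_seq:
        return "site sequence does not match the reference"
    if window[21:] != "GG":
        return "no NGG PAM next to the site"
    return "ok"

site = "GATTACAGATTACAGATTAC"
plus_ref = "TT" + site + "AGG" + "CC"       # site on the '+' strand
minus_ref = reverse_complement(plus_ref)    # same site, now on the '-' strand

plus_result = check_site(plus_ref, 3, "+", site)
minus_result = check_site(minus_ref, 25, "-", site)
print(plus_result, minus_result)
```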

[Image: Site Selection (Predicting Mutations)]

2.2 Manually Inserted Mutations

Alternatively, EXPosition can also analyze manually inserted mutations, regardless of whether they're the results of using CRISPR or not. To do this, the 'Manual Muts' option should be selected; this will allow the user to fill in the mutations in the textboxes inside the 'Mutation Parameters ('+' strand)' box.

Each row is for a different type of mutation:

'INS' - Insertion
- 'Prob.' - Probability of the insertion.
- 'Inserted NTs' - The nucleotides inserted.
- 'Ins. Pos' - The positions (1-based on '+' strand) to insert the nucleotides.
'DEL' - Deletion
- 'Prob.' - Probability of the deletion.
- 'Del Start' - The position (1-based on '+' strand) of the start of the deleted sequence.
- 'Del NTs' - The nucleotides deleted.
'SUB' - Substitution
- 'Prob.' - Probability of the substitution.
- 'Del Start' - The position (1-based on '+' strand) of the start of the deleted sequence.
- 'Del NTs' - The nucleotides deleted. The logic is the same as for the 'Del NTs' of deletions.
- 'Inserted NTs' - The nucleotides inserted.

To analyze a single mutation, enter '1' in the probability textbox of the desired mutation type (INS\DEL\SUB), then fill in the rest of the textboxes in that row.

[Image: Manually Inserted Mutations]

If the user wishes to analyze a case where multiple mutations occur, each with its own probability, they can be entered as a slash-separated list. For example, suppose the user wishes to input two insertions: an insertion of “ACG” at position 1000 with a probability of 0.8, and an insertion of “G” at position 1005 with a probability of 0.2. The probabilities should be written in the textbox as “0.8/0.2”, the inserted nucleotides as “ACG/G” and the positions as “1000/1005”. The same applies to deletions and substitutions. Note that the mutation properties (i.e. probability, sequence and position) must be written in the same mutation order in all fields.

The probabilities over all types of mutations must add up to 1, otherwise the tool will show an error upon analysis.
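
The slash-separated format and the sum-to-1 rule can be sketched as below, using the two insertions from the example above. The function names are hypothetical and the floating-point tolerance is an assumption; the tool's own parsing and error messages may differ.

```python
import math

def parse_field(text, cast=str):
    """Split a slash-separated textbox value; an empty textbox gives an empty list."""
    return [cast(part) for part in text.split("/")] if text.strip() else []

def parse_insertions(probs, inserted_nts, positions):
    """Combine the three INS textboxes into (probability, nucleotides, position) tuples."""
    p = parse_field(probs, float)
    nts = parse_field(inserted_nts)
    pos = parse_field(positions, int)
    if not (len(p) == len(nts) == len(pos)):
        raise ValueError("all fields must list the same number of mutations")
    return list(zip(p, nts, pos))

def probabilities_sum_to_one(*prob_lists):
    """Check that the probabilities over all mutation types add up to 1."""
    total = sum(sum(probs) for probs in prob_lists)
    return math.isclose(total, 1.0, abs_tol=1e-9)  # tolerance is an assumption

insertions = parse_insertions("0.8/0.2", "ACG/G", "1000/1005")
print(insertions)
print(probabilities_sum_to_one([p for p, _, _ in insertions]))
```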

If the deleted sequence given for a deletion or substitution does not match the genomic sequence at that position, the tool will report the error and show the genomic sequence found at the positions given.
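
A check of this kind might look like the following sketch, which compares 'Del NTs' with a toy '+'-strand reference at 'Del Start' (1-based). The function name and the reference string are invented; the real tool reads the human genome rather than a short string.

```python
def check_deletion(reference, del_start, del_nts):
    """Compare 'Del NTs' with the reference at 'Del Start' (1-based, '+' strand).

    Returns (matches, found), where `found` is the reference sequence
    actually present at the given positions.
    """
    begin = del_start - 1                      # convert to a 0-based index
    found = reference[begin:begin + len(del_nts)]
    return found == del_nts.upper(), found

reference = "ACGTTGCA"                          # toy '+' strand sequence

good = check_deletion(reference, 3, "GTT")
bad = check_deletion(reference, 3, "GAA")
print(good, bad)
```

In the mismatching case, the second element of the returned pair plays the role of the genomic sequence the tool displays.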

Please note that the inputs related to predicting mutations are irrelevant here and will be locked for editing unless the 'Predict Muts' option is selected.

The file "man_muts_example.csv" in the folder contains example mutations that show appropriate inputs.

If any textbox is filled incorrectly, it will be marked in red.

[Image: Manually Inserted Mutations]

3. Analyzing Sites

To analyze a site, press the 'Analyze' button. The results of the analysis will be shown in the 'Results' read-only textbox.

If predicting the mutations, they will be shown along with their probabilities. Then, a table with the affected genes, and the mutations affecting them, will appear. The mutations are named in the following format:

[Aspect]_[Type]_[n]

“Aspect” is either “XP”, if the mutation is found to possibly affect the gene’s transcription; or “ST”, if the mutation is found to possibly affect the gene’s splicing and/or translation initiation. “Type” is the mutation type: insertion (“INS”), deletion (“DEL”) or substitution (“SUB”). “n” is the mutation number for its corresponding mutation type.

For example, 'XP_DEL_2' next to a certain gene means that the 3rd deletion (as the example implies, numbering starts at 0) had to be checked for its effects on the transcription of that gene. If a mutation doesn't appear in the table, then it does not affect the gene according to our models.
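
A small parser for these mutation names, written as a sketch: the underscore-separated layout is taken from the 'XP_DEL_2' example, and the function name and descriptions are chosen for illustration.

```python
import re

LABEL = re.compile(r"^(XP|ST)_(INS|DEL|SUB)_(\d+)$")

ASPECTS = {
    "XP": "transcription",
    "ST": "splicing and/or translation initiation",
}

def parse_label(label):
    """Split a name such as 'XP_DEL_2' into (aspect, type, n)."""
    match = LABEL.match(label)
    if match is None:
        raise ValueError(f"unrecognized mutation name: {label!r}")
    aspect, mut_type, n = match.groups()
    return aspect, mut_type, int(n)

aspect, mut_type, n = parse_label("XP_DEL_2")
# n counts from 0 in the example, so n + 1 is the ordinal ("3rd deletion").
print(f"{mut_type} number {n + 1}, checked for {ASPECTS[aspect]}")
```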

Afterwards, the results for each analyzed gene are shown – either all genes affected by the mutations, or the gene/transcript of interest selected by the user. Finally, the results for the target site are shown. The results for each aspect of a gene's expression are all between 0 and 1, where 0 means the gene is predicted to be unaffected by the mutations and 1 means the gene is predicted to be very highly affected (see the original paper for additional details).

The results are saved in a file in the Results folder if the 'Save Results' checkbox is checked (it is by default). The name of the file can be written in the 'Results Name:' textbox. The file is saved as a '.csv' file by default; if the name in the 'Results Name:' textbox ends with '.pkl', it is saved as a '.pkl' file instead.

[Image: Manually Inserted Mutations]

4. Output Files

The EXPosition output files contain all the inputs to the tool, along with the outputs for each analyzed gene. In the following column names, [model] is either transcription, splicing or trans_init:

- “all_int_genes_[model]_score” - The results of all affected genes according to [model].
- “[model]_score” - The final model score.
- “clinvar_[model]_max”, “clinvar_[model]_mean” - The maximum and average ClinVar rankings of the final model score, respectively.

The mutations and their probabilities are also available in the file.

5. Batch Processing

EXPosition supports batch processing of multiple sites. This is done by preparing a '.csv' file with the sites and relevant input where each row corresponds to a different site. In order to load a '.csv' file with sites, select the 'Load Configuration' option that's available upon clicking on the 'File' tab.

The columns in the '.csv' file for analysis are:

gene_symbol
ensembl_id
target_seq
site_start_pos
site_strand
site_chr
N_muts
do_transcription
do_splicing
do_initiation
INS_prob
INS_ins_mut
INS_ins_pos
DEL_prob
DEL_del_start
DEL_del_nts
SUB_prob
SUB_ins_mut
SUB_del_start
SUB_del_nts
use_mut_predictor
increased_sensitivity
init_search_dist
final_search_dist
titer_threshold
adjacent_srch_dist
do_adj_srch

Each column corresponds to one of the inputs in the tool. If choosing to use manually inserted mutations, the inputs to each of the relevant mutation columns should be in the same format as when entering them in the tool (see 2.2 Manually Inserted Mutations). Two example batch input files called 'man_muts_example.csv' and 'pred_muts_example.csv' are available in the EXPosition folder, with their results files in the Results folder.

The value in 'use_mut_predictor' in each row should be set to 1 if the tool should predict the mutations, or to 0 if the mutations are to be entered manually.

'do_transcription', 'do_splicing' and 'do_initiation' should be set to 0 or 1 for each row (i.e. site), depending on which models should be run. 'do_adj_srch' can be set to 0 or 1 depending on whether this feature is requested for each row (see the next section for details about this feature). 'increased_sensitivity' should be set to 0 or 1 depending on whether the user wishes for increased sensitivity when classifying mutations\gRNAs as significantly affecting some aspect of expression (see the next section, 6. Additional Parameters and Features).
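
As an illustration of the file layout, the sketch below builds a one-row batch file for a predicted-mutations site using the column names listed above. Every value in the row is a placeholder invented for this example (including the sequence and position), the helper names are hypothetical, and leaving unused columns empty is an assumption. The shipped 'pred_muts_example.csv' remains the authoritative template, including any columns (such as 'serial') not shown here.

```python
import csv
import io

COLUMNS = [
    "gene_symbol", "ensembl_id", "target_seq", "site_start_pos", "site_strand",
    "site_chr", "N_muts", "do_transcription", "do_splicing", "do_initiation",
    "INS_prob", "INS_ins_mut", "INS_ins_pos", "DEL_prob", "DEL_del_start",
    "DEL_del_nts", "SUB_prob", "SUB_ins_mut", "SUB_del_start", "SUB_del_nts",
    "use_mut_predictor", "increased_sensitivity", "init_search_dist",
    "final_search_dist", "titer_threshold", "adjacent_srch_dist", "do_adj_srch",
]

def predicted_site_row(gene, seq, start, strand, chrom, n_muts=4):
    """Build one row that asks the tool to predict mutations for a site."""
    row = dict.fromkeys(COLUMNS, "")   # assumption: unused columns stay empty
    row.update({
        "gene_symbol": gene, "target_seq": seq, "site_start_pos": start,
        "site_strand": strand, "site_chr": chrom, "N_muts": n_muts,
        "do_transcription": 1, "do_splicing": 1, "do_initiation": 1,
        "use_mut_predictor": 1, "increased_sensitivity": 0, "do_adj_srch": 0,
    })
    return row

def to_csv(rows):
    """Serialize rows to CSV text with the columns in the documented order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

# Placeholder site: the sequence and position are NOT a real A1BG target.
text = to_csv([predicted_site_row("A1BG", "GATTACAGATTACAGATTAC", 1000, "+", "Chr19")])
print(text)
```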

[Image: Batch Processing]

6. Additional Parameters and Features

In order to save a configuration set in the tool into a '.csv' file, select the 'Save Configuration' option in the 'File' tab.

It's possible to increase the sensitivity of the classification of the effect of the mutations by checking the "Increase Sensitivity" checkbox. If checked, the highest score among the mutations is compared against the threshold (as opposed to the average over all mutations, weighted by their probabilities).
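
The difference between the two modes can be seen in the minimal sketch below. The scores, probabilities and threshold are invented, the function name is hypothetical, and whether the comparison is strict or inclusive is an assumption.

```python
def is_significant(scores, probs, threshold, increased_sensitivity=False):
    """Classify a set of mutations against a threshold.

    Default: probability-weighted average of the per-mutation scores.
    Increased sensitivity: the single highest per-mutation score.
    """
    if increased_sensitivity:
        value = max(scores)
    else:
        value = sum(s * p for s, p in zip(scores, probs))
    return value >= threshold   # assumption: inclusive comparison

scores = [0.9, 0.1]   # invented per-mutation scores
probs = [0.2, 0.8]    # invented mutation probabilities
threshold = 0.5       # hypothetical threshold

default_call = is_significant(scores, probs, threshold)
sensitive_call = is_significant(scores, probs, threshold, increased_sensitivity=True)
print(default_call, sensitive_call)
```

Here a rare but high-scoring mutation is missed by the weighted average (0.26) yet caught in the increased-sensitivity mode (0.9).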

In the algorithm to find alternative start codons, the parameters that control how far away from the mutation to look for a suitable start codon are available in the 'Initial Search Distance' and 'Final Search Distance' textboxes. The 'Initial Search Distance' textbox dictates the initial distance to search for a start codon around the mutation (on each side), while the 'Final Search Distance' textbox dictates the final distance to search. A start codon is considered suitable if its rank amongst all TITER scores for the annotated TISs in the human transcriptome is within p% (5% by default) of the rank of the original start codon. This parameter can be changed in the 'TITER Threshold' textbox. Note that the expected input format is '0.XX'. For more details regarding these parameters, please see the original paper.
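
One possible reading of the rank criterion is sketched below: a candidate start codon is accepted if its percentile rank among background scores differs from the original start codon's percentile rank by at most p. The two-sided interpretation, the function names and the toy integer scores are all assumptions; the original paper defines the actual procedure.

```python
from bisect import bisect_right

def percentile_rank(score, background_sorted):
    """Fraction of background scores that are less than or equal to `score`."""
    return bisect_right(background_sorted, score) / len(background_sorted)

def is_suitable(candidate, original, background, p=0.05):
    """Accept a candidate whose rank is within p of the original's rank.

    Assumption: the distance is two-sided; a one-sided rule is also plausible.
    """
    ranked = sorted(background)
    gap = abs(percentile_rank(candidate, ranked) - percentile_rank(original, ranked))
    return gap <= p

# Toy integer scores standing in for TITER scores of annotated TISs.
background = list(range(1, 101))

close_enough = is_suitable(candidate=78, original=80, background=background)
too_far = is_suitable(candidate=50, original=80, background=background)
print(close_enough, too_far)
```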

It's also possible to get information about what exists around the predicted\manually inserted mutations by checking the 'Adjacent Genes Info' checkbox (by default it is off). EXPosition will show all genomic objects found in the annotations of the human genome that either:

Have a mutation within them
Have the mutation adjacent to them, up to a certain distance. This distance can be modified by the user in the 'Adjacent Genes Info' textbox inside the 'General Parameters' box.
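
The two conditions above amount to an interval test, sketched here for a single mutation position and a single annotated object. The coordinates are invented, the function name is hypothetical, and treating the distance limit as inclusive is an assumption.

```python
def relation(feature_start, feature_end, mut_pos, max_dist):
    """Classify a mutation relative to an annotated object on the same chromosome.

    Returns "within", "adjacent" (no further than max_dist away) or None.
    Coordinates are 1-based and inclusive.
    """
    if feature_start <= mut_pos <= feature_end:
        return "within"
    if mut_pos < feature_start:
        gap = feature_start - mut_pos
    else:
        gap = mut_pos - feature_end
    return "adjacent" if gap <= max_dist else None   # assumption: inclusive limit

inside = relation(100, 200, 150, max_dist=50)
nearby = relation(100, 200, 60, max_dist=50)
far = relation(100, 200, 260, max_dist=50)
print(inside, nearby, far)
```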

[Image: Additional Parameters and Features]

7. Troubleshooting

7.1 Windows - How to extract .zip files

Download and install WinRAR or WinZip. Once the installation has finished, right-click the file you downloaded and select "Extract files...". Then choose where to save EXPosition.

[Image: Extract]

7.2 Linux - xcb not found

I'm running EXPosition on Linux and receiving the following error: qt.qpa.plugin: Could not load the Qt platform plugin "xcb" in "" even though it was found.

The xcb plugin is needed; you can install it for example by running: sudo apt-get install libxcb-xinerama0