Step 1 - RDBMS install |
- To use mBioSQL you need an installed RDBMS (Relational Database Management System),
such as PostgreSQL or MySQL;
- You need RDBMS administrator privileges for the next step;
- See docs#tuning for RDBMS and kernel tuning for increased performance.
|
 |
Step 2 - mBioSQL install |
- Creation of databases (no data are loaded at this stage);
- Optional creation of bioroot and/or biouser database users;
- Scripts are created and copied to $MBIOSQL/bin and $MBIOSQL/lib;
- A configuration file containing options for the scripts is created ($MBIOSQL/etc/modbiosql.conf);
- See docs#intall for details.
|
 |
Step 3 - Populating the database |
- To populate your databases, run mbs_init.py with the appropriate options
(this creates the necessary tables);
- To run mbs_init.py for the UniProt database you first have to obtain the
*.dat, reldate.txt, keywlist.txt and dbxref.txt files from EBI;
- Use bl_load.py
to load data into BioLocal;
- Loading of data (results) into BioRes is discussed below (Step 5).
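
Before running mbs_init.py for UniProt, it can help to confirm that the files listed above are actually present. The following is a minimal sketch, not part of mBioSQL: the function name missing_uniprot_inputs and the idea of a single download directory are assumptions made for illustration; only the file names come from the list above.

```python
from pathlib import Path

# File names taken from the step above; "*.dat" is handled as a glob pattern.
REQUIRED_FILES = ("reldate.txt", "keywlist.txt", "dbxref.txt")


def missing_uniprot_inputs(download_dir):
    """Return the required inputs that are absent from download_dir.

    Hypothetical helper: mBioSQL itself may locate these files differently.
    """
    directory = Path(download_dir)
    missing = [name for name in REQUIRED_FILES
               if not (directory / name).is_file()]
    # At least one flat file with the .dat extension is expected.
    if not any(directory.glob("*.dat")):
        missing.append("*.dat")
    return missing
```

An empty list means all four kinds of input were found; otherwise the list names what still has to be downloaded from EBI.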
|
 |
Step 4 - Analysis I |
- Sequences for analysis can be retrieved from the RDBMS itself (arrows b) or
from flat files (arrows c);
- The connectorom script is used as a bridge between EMBOSS and the database system;
see the EMBOSS configuration in this section;
- To query the RDBMS by ID or SQL, use mbs_query.py;
it returns FASTA-formatted sequence(s).
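
Since mbs_query.py returns FASTA-formatted sequences, its output can be consumed by any FASTA reader. The sketch below is a generic parser written for illustration; parse_fasta is a hypothetical helper, not an mBioSQL function, and it assumes the FASTA text has already been captured as a string (how mbs_query.py is invoked is not shown here).

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines are concatenated until the next header.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

Each record keeps the full header line (without the leading ">") so that IDs and descriptions can be split later as needed.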
|
 |
Step 5 - Loading the results back into the RDBMS |
- To demonstrate the power of SQL for the analysis of results, loading was implemented for two types of analyses:
(1) pattern search by fuzzpro (EMBOSS); (2) restriction mapping analysis by remap & restrict (EMBOSS);
- Use br_load.py for the processes marked as arrows d and e;
- See examples directory for examples.
|
 |
Step 6 - Analysis II from BioRes |
- Use br_anal2.py or your own script for analysis;
- If you do not want to merge your results with sequence data, store them in BioRes (arrow e) and perform the Analysis II step from there (arrow g);
- Call mbs_info.py -i -T=res symbolic_db to get a list of result tables in your database;
- The br_anal2.py script determines the type of analysis from the result table name (using the metadata stored in the db_info table under that table name), but you have to provide the appropriate options for the given analysis.
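
The lookup described in the last item can be pictured as a simple metadata query. The following is only a sketch under stated assumptions: the columns table_name and analysis_type, the example table names, and both function names are invented for illustration (the real db_info layout is not given here), and an in-memory SQLite database stands in for PostgreSQL or MySQL.

```python
import sqlite3


def demo_connection():
    """Build an in-memory stand-in for db_info (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE db_info (table_name TEXT PRIMARY KEY, analysis_type TEXT)"
    )
    # Made-up result tables, one per analysis type mentioned in Step 5.
    conn.executemany(
        "INSERT INTO db_info VALUES (?, ?)",
        [("fuzzpro_run1", "fuzzpro"), ("remap_run1", "remap")],
    )
    return conn


def analysis_type_for(conn, table_name):
    """Return the analysis type recorded for table_name, or None if unknown."""
    row = conn.execute(
        "SELECT analysis_type FROM db_info WHERE table_name = ?",
        (table_name,),
    ).fetchone()
    return row[0] if row else None
```

A script built this way would dispatch on the returned type and still take the analysis-specific options from the command line, as noted above.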
|
 |
Step 7 - Analysis II from a sequence database |
- See Step 6;
- Storing result sets in the sequence database itself allows
easy merging of sequence data and results by joins;
- Currently, this type of analysis is implemented for fuzzpro results
(run br_anal2.py on a fuzzpro result table loaded into UniProt).
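
To illustrate why keeping results next to the sequences makes merging easy, here is a minimal join sketch. Everything in it is an assumption for demonstration: the tables entry and fuzzpro_hits, their columns, and the accession values are made up and do not reflect the actual mBioSQL or UniProt schema, and in-memory SQLite is used in place of PostgreSQL or MySQL.

```python
import sqlite3


def build_demo_db():
    """Create hypothetical sequence and result tables in one database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE entry (accession TEXT PRIMARY KEY, description TEXT);
        CREATE TABLE fuzzpro_hits (accession TEXT, start_pos INTEGER, end_pos INTEGER);
        INSERT INTO entry VALUES
            ('ACC001', 'made-up protein A'),
            ('ACC002', 'made-up protein B'),
            ('ACC003', 'made-up protein C');
        INSERT INTO fuzzpro_hits VALUES
            ('ACC001', 10, 15),
            ('ACC001', 40, 45),
            ('ACC002', 7, 12);
    """)
    return conn


def hits_per_entry(conn):
    """Join sequence annotations with pattern hits and count hits per entry."""
    return conn.execute("""
        SELECT e.accession, e.description, COUNT(h.start_pos) AS n_hits
        FROM entry AS e
        JOIN fuzzpro_hits AS h ON h.accession = e.accession
        GROUP BY e.accession, e.description
        ORDER BY e.accession
    """).fetchall()
```

Because both tables live in the same database, a single SQL statement combines annotation and result data; entries without hits (ACC003 in this made-up data) drop out of the inner join.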
|
 |