ModBioSQL SemiBenchmarks

Important notes:

My system:



Parsing; UniProt 2.0

This measures only parsing the flat-file and generating the INSERT queries, with some consistency checking. No data is loaded into a database at this stage.

                                    SwissProt   TrEMBL
Size of the flat-file:              575M        2,379M
Parsing and preparing for inserts:  10 min      52 min
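
As a rough illustration of what "parsing and preparing for inserts" can involve, here is a minimal sketch, not the actual ModBioSQL parser. It reads UniProt-style flat-file entries, checks the declared sequence length against the parsed sequence, and pairs each entry with a parameterized INSERT. The table name `sequence`, its columns, and the sample record are made up for this example.

```python
import io

# Made-up record in UniProt flat-file layout, for illustration only.
SAMPLE = """\
ID   DEMO_TEST               Reviewed;          12 AA.
AC   A0TEST; B0TEST;
SQ   SEQUENCE   12 AA;  1234 MW;  0000000000000000 CRC64;
     MKTAYIAKQR QI
//
"""

def iter_entries(handle):
    """Yield (accession, sequence) pairs from a UniProt-style flat file."""
    accession, declared_len, parts, in_seq = None, None, [], False
    for line in handle:
        code = line[:2]
        if code == "AC" and accession is None:
            # The first accession on the first AC line is the primary one.
            accession = line[5:].split(";")[0].strip()
        elif code == "SQ":
            # "SQ   SEQUENCE   12 AA; ..." -> the third token is the length.
            declared_len = int(line.split()[2])
            in_seq = True
        elif code == "//":
            seq = "".join(parts)
            # Consistency checks before the entry is handed on.
            if accession is None:
                raise ValueError("entry without an AC line")
            if declared_len != len(seq):
                raise ValueError(f"{accession}: length mismatch")
            yield accession, seq
            accession, declared_len, parts, in_seq = None, None, [], False
        elif in_seq and line.startswith("  "):
            # Sequence lines are indented and split into blocks of ten.
            parts.append(line.replace(" ", "").strip())

def prepare_inserts(handle, table="sequence"):
    """Return (sql, params) pairs; `table` is a hypothetical name."""
    # %s is the placeholder style accepted by MySQLdb and psycopg.
    sql = f"INSERT INTO {table} (accession, seq) VALUES (%s, %s)"
    return [(sql, params) for params in iter_entries(handle)]

rows = prepare_inserts(io.StringIO(SAMPLE))
```

Nothing here touches a database, so timing `prepare_inserts` alone would correspond to the figures in this section rather than to the loading times.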
 

Parsing and loading; UniProt 2.0

                      SwissProt      TrEMBL
flat-file:            N/A (0 min)    N/A (0 min)
ModBioSQL; MySQLdb:   22 min         99 min
ModBioSQL; PyPgSQL:   22 min         92 min
 

Creating indexes; UniProt 2.0

flat-file: 27 min
ModBioSQL; MySQLdb: 46 min
ModBioSQL; PyPgSQL: 31 min
 

Creating constraints; UniProt 2.0

ModBioSQL; MySQLdb: 100 min
ModBioSQL; PyPgSQL: 68 min
 

Total time

BioSQL; SwissProt; MySQLdb: 7h 40 min
BioSQL; SwissProt; psycopg: 7h 23 min
ModBioSQL; SwissProt; MySQLdb: 48 min
ModBioSQL; SwissProt; psycopg: 41 min
ModBioSQL; UniProt; MySQLdb: 4h 24 min
ModBioSQL; UniProt; PyPgSQL: 3h 33 min
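
The ModBioSQL UniProt totals can be cross-checked against the per-stage UniProt 2.0 figures above. The sketch below assumes the total is simply loading (SwissProt plus TrEMBL) plus index creation plus constraint creation; that breakdown is an assumption, not something the page states. The dictionary keys are names chosen for this example.

```python
# Stage timings in minutes, copied from the UniProt 2.0 sections above.
STAGES = {
    "MySQLdb": {"load_swissprot": 22, "load_trembl": 99,
                "indexes": 46, "constraints": 100},
    "PyPgSQL": {"load_swissprot": 22, "load_trembl": 92,
                "indexes": 31, "constraints": 68},
}

def total_minutes(driver):
    """Sum all stages for one driver (assumes the stages do not overlap)."""
    return sum(STAGES[driver].values())

def as_hours_minutes(minutes):
    """Format a minute count the way the totals are written on this page."""
    hours, rest = divmod(minutes, 60)
    return f"{hours}h {rest:02d} min"

for driver in STAGES:
    print(driver, as_hours_minutes(total_minutes(driver)))
```

Under that assumption the PyPgSQL stages add up to 213 minutes, matching the reported 3h 33 min. The MySQLdb stages add up to 267 minutes, three minutes more than the reported 4h 24 min.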
 

Size on disk (approx.)

flat-file; UniProt:
  dat files: 3,225M
  indexes:   766M
  total:     3,991M
BioSQL; SwissProt; +indexes; PgSQL: 1.12G
ModBioSQL; SwissProt; +indexes; PgSQL: 706M
ModBioSQL; UniProt; PgSQL:
  data:    3,041M
  indexes: 780M
  total:   3,812M
 


Reading (100,000) random sequences

flat-file: 1,111,1111 seqs/s
BioSQL; SwissProt; psycopg: 7,142 seqs/s
ModBioSQL; SwissProt; psycopg: 14,556 seqs/s
ModBioSQL; SwissProt; PyPgSQL: 3,600 seqs/s
ModBioSQL; UniProt; MySQLdb: 5,321 seqs/s
ModBioSQL; UniProt; PyPgSQL: 2,832 seqs/s
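
A read-throughput figure in seqs/s could be measured along the following lines. This is a sketch under stated assumptions, not the script used for the numbers above: it uses the standard-library `sqlite3` module as a stand-in DB-API driver so that it runs anywhere, and the table `sequence` with columns `id` and `seq` is hypothetical rather than the ModBioSQL schema.

```python
import random
import sqlite3
import time

def build_demo_db(n_rows):
    """Create an in-memory stand-in database with n_rows dummy sequences."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sequence (id INTEGER PRIMARY KEY, seq TEXT)")
    conn.executemany(
        "INSERT INTO sequence (id, seq) VALUES (?, ?)",
        ((i, "ACDEFGHIKLMNPQRSTVWY" * 5) for i in range(1, n_rows + 1)),
    )
    conn.commit()
    return conn

def read_random(conn, n_rows, n_reads, seed=0):
    """Fetch n_reads randomly chosen sequences; return (fetched, seqs_per_s)."""
    rng = random.Random(seed)
    # Pick the ids up front so random number generation is not timed.
    ids = [rng.randrange(1, n_rows + 1) for _ in range(n_reads)]
    cur = conn.cursor()
    start = time.perf_counter()
    fetched = 0
    for seq_id in ids:
        cur.execute("SELECT seq FROM sequence WHERE id = ?", (seq_id,))
        if cur.fetchone() is not None:
            fetched += 1
    elapsed = time.perf_counter() - start
    return fetched, fetched / elapsed

conn = build_demo_db(1000)
fetched, rate = read_random(conn, n_rows=1000, n_reads=500)
print(f"{fetched} sequences at {rate:,.0f} seqs/s")
```

With MySQLdb, psycopg or PyPgSQL the loop would look the same apart from the connection call and the `%s` placeholder style; an in-memory SQLite rate says nothing about the figures reported above.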