Updated for basepair version: 1.3.59


Python bindings for Basepair's API. An outline of the contents on this page:


  • 1. Setup
    • 1.1 Installation
    • 1.2 Setup your configuration
  • 2. Examples
    • 2.1 Listing available data
    • 2.2 Creating or deleting a sample
    • 2.3 Create an analysis
    • 2.4 Download results




1. Setup


1.0 Requirements

Python 3 is required. Python 2 is no longer actively supported.


1.1 Installation:

First, install the basepair Python package using pip. If you don't have pip, you can get it here.


pip install basepair


1.2 Setup your configuration:


Step 1

You need to obtain your configuration file to connect to Basepair's API. To obtain the file:

  1. Go to your Basepair dashboard: https://app.basepairtech.com
  2. Click on your profile name in the top right corner.
  3. Click on "Profile".
  4. Click on "Download API config file" in the lower left corner.


The downloaded configuration file should look like the following:


{
  "api": {
    "host": "https://app.basepairtech.com/",
    "prefix": "api/v1/",
    "username": "user@basepairtech.com",
    "api_key": ""
  },
  "aws": {
    "s3": {
      "bucket": "basepair"
    },
    "aws_id": "",
    "aws_secret": ""
  }
}


The "api_key", "aws_id", and "aws_secret" values are credentials unique to your account that allow you to access Basepair and your data.
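Before connecting, it can help to confirm that the downloaded file actually has its credentials filled in. The following is a minimal sketch using only the standard library; the helper name "missing_fields" and the list of fields treated as required are assumptions made for illustration, not part of the basepair package.

```python
import json

# Fields this sketch treats as required (an assumption for illustration;
# the basepair package may check a different set).
REQUIRED_FIELDS = [
    ("api", "host"),
    ("api", "username"),
    ("api", "api_key"),
    ("aws", "aws_id"),
    ("aws", "aws_secret"),
]


def missing_fields(config):
    """Return dotted names of required fields that are absent or empty."""
    missing = []
    for path in REQUIRED_FIELDS:
        value = config
        for key in path:
            # Walk down one level; treat anything unexpected as empty.
            value = value.get(key, "") if isinstance(value, dict) else ""
        if not value:
            missing.append(".".join(path))
    return missing


# The template from this page, with the secrets left blank.
template = json.loads("""
{
  "api": {"host": "https://app.basepairtech.com/", "prefix": "api/v1/",
          "username": "user@basepairtech.com", "api_key": ""},
  "aws": {"s3": {"bucket": "basepair"}, "aws_id": "", "aws_secret": ""}
}
""")
print(missing_fields(template))
```

Run against the blank template, this reports the three credential fields; a file downloaded from your profile should report none.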


Step 2

To make this file available to your API calls in Python, do one of the following:


Option 1: Define an environment variable


export BP_CONFIG_FILE=/path/to/basepair.config.json


Option 2: Pass the loaded configuration as a parameter when you create the connection


bp = basepair.connect(json.load(open('/path/to/basepair.config.json')))
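The two options can be combined in a small loader that prefers an explicit path and falls back to the environment variable. This is a sketch only: "load_bp_config" is a hypothetical helper, not a basepair function, and the final call to "basepair.connect" is shown as a comment because it needs the package and a real configuration file.

```python
import json
import os


def load_bp_config(path=None):
    """Load the config from an explicit path, else from BP_CONFIG_FILE."""
    path = path or os.environ.get("BP_CONFIG_FILE")
    if not path:
        raise ValueError("pass a path or set BP_CONFIG_FILE")
    # A context manager closes the file promptly, unlike a bare open().
    with open(path) as handle:
        return json.load(handle)


# Intended use (requires the basepair package and a real config file):
#   import basepair
#   bp = basepair.connect(load_bp_config())
```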



Examples



Here are some examples to get you up and running. For all the examples shown below, it is assumed you have run the following first:


import basepair
import json

bp = basepair.connect(json.load(open('/path/to/basepair.config.json')))



1. Listing available data.


You can list various types of data available to you via the "bp.print_data()" command. By default it prints data in a human-readable table format, which does not include all data associated with each item, only the critical data. If you want to see all data associated with each item then set the json parameter to True.


1.1 List genomes

The following will print the genomes available as a table:


bp.print_data('genomes')

# prints a table like this:

  id  name                     date_created
----  -----------------------  --------------------------
   1  hg19                     2018-04-18T14:58:15.865993
   2  mm10                     2018-04-18T15:05:39.770488
   3  mm9                      2018-04-18T15:10:33.438603
   4  danRer10                 2018-04-18T15:15:11.281866
   5  tair10                   2018-04-18T15:17:19.582104
   6  dm6                      2018-04-18T15:23:27.298556
   7  danRer7                  2018-04-18T15:28:12.977544
   8  sacCer3                  2018-04-18T15:30:50.993198
   9  SL2.40                   2018-04-18T15:31:50.603006
  10  Xla.v91                  2018-04-18T15:36:55.491452
  11  xenTro7                  2018-04-18T15:43:42.386466
  12  taeGut2                  2018-04-18T15:44:56.550366
  13  Streptococcus_sanguinis  2018-04-18T15:45:06.695995
  14  CENPK113-7D-RP           2018-04-18T15:45:26.305311
  15  rn6                      2018-04-18T15:46:13.155202
  16  susScr11                 2018-04-18T15:48:53.443899
  17  ce11                     2018-04-18T15:49:32.644737
  18  Manis_javanica           2018-04-18T16:09:05.805615
  19  Ictalurus_punctatus      2018-04-18T16:16:12.306990
  20  mouse_GRCm38_M18         2018-10-04T15:40:16.284047
  21  Citrus_clementina.v1     2018-10-04T15:46:23.320161
  22  Nematostella_vectensis   2018-10-26T18:06:36.340901
  23  Gallus_gallus            2018-10-30T14:02:58.811138


To see the raw JSON data:

bp.genomes

# which will output something like this:

[{'date_created': '2018-04-18T14:58:15.865993',
  'id': 1,
  'name': 'hg19',
  'resource_uri': '/api/v1/genomes/1'},
 {'date_created': '2018-04-18T15:05:39.770488',
  'id': 2,
  'name': 'mm10',
  'resource_uri': '/api/v1/genomes/2'},
 {'date_created': '2018-04-18T15:10:33.438603',
  'id': 3,
  'name': 'mm9',
  'resource_uri': '/api/v1/genomes/3'},
...
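Because the raw output is a plain list of dictionaries, ordinary Python is enough to search it. The sketch below copies the first three records shown above into a local list so that it runs without a connection; "genome_by_name" is a hypothetical helper, not part of the basepair package.

```python
# The first three records from the raw JSON output shown above.
genomes = [
    {'date_created': '2018-04-18T14:58:15.865993', 'id': 1,
     'name': 'hg19', 'resource_uri': '/api/v1/genomes/1'},
    {'date_created': '2018-04-18T15:05:39.770488', 'id': 2,
     'name': 'mm10', 'resource_uri': '/api/v1/genomes/2'},
    {'date_created': '2018-04-18T15:10:33.438603', 'id': 3,
     'name': 'mm9', 'resource_uri': '/api/v1/genomes/3'},
]


def genome_by_name(records, name):
    """Return the first record whose 'name' matches, or raise KeyError."""
    for record in records:
        if record['name'] == name:
            return record
    raise KeyError(name)


print(genome_by_name(genomes, 'mm10')['id'])
```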



1.2 List workflows

When constructing your analyses, you will also want to know which workflows are available. The following commands list them.


To see the workflows as a pretty table:

bp.print_data('workflows')


And to see the raw JSON:

bp.get_workflows()


1.3 List samples

You can list the samples you already have on your Basepair account.


To see the samples in a pretty table:

bp.print_data('samples')


And to see the raw JSON:

bp.get_samples()


1.4 Get a sample

You can also retrieve the JSON-formatted information associated with a sample. The example below gets the sample with id 10000:


sample = bp.get_sample(10000)


1.5 List analyses

You can get a list of all your analyses.


To see the analyses in a pretty table:

bp.print_data('analyses')


And to see the raw JSON:

bp.get_analyses()



1.6 List analysis detail

You may want to see the data (such as the files) associated with a particular analysis. The analysis id can be found in the URL of the analysis page, e.g. https://app.basepairtech.com/#/analyses/10000


To see the analysis detail in a pretty table:

bp.print_data('analysis', uid=[10000])


And to see the raw JSON:

bp.get_analysis(10000)




2. Creating or deleting a sample.


A Basepair sample is handled through the "BpSample" class. You can do two main things with it: create a new sample or get information associated with an existing sample.


2.1 Create a new sample


In the example below, we create a new sample named "Sample1", which uses the "hg19" genome, and is paired-end data with two files per read: the R1 files are listed in "filepaths1" and the matching R2 files in "filepaths2". After creating the sample we upload it to Basepair.


data = {
    'name': 'Sample1',
    'genome': 'hg19',
    # Read 1 files (note the comma between list items)
    'filepaths1': [
        'Sample1.lane1.R1.fastq.gz',
        'Sample1.lane2.R1.fastq.gz'],
    # Read 2 files, in the same order as the read 1 files
    'filepaths2': [
        'Sample1.lane1.R2.fastq.gz',
        'Sample1.lane2.R2.fastq.gz'],
}

sample = bp.create_sample(data=data)


The above example provides only the bare minimum information. Run "?BpSample" in an IPython console to find out more.
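A missing comma between two adjacent string literals silently joins them into one file name, so it can be worth assembling the dictionary through a small checker. The sketch below is an illustration under stated assumptions: "build_sample_data" is a hypothetical helper, it handles paired-end data only, and it builds just the four keys used in the example above.

```python
def build_sample_data(name, genome, reads1, reads2):
    """Assemble the paired-end dictionary layout used in the example above."""
    for label, value in (('reads1', reads1), ('reads2', reads2)):
        # A bare string here usually means a list was intended.
        if isinstance(value, str):
            raise TypeError(label + ' must be a list of paths, not a string')
    reads1, reads2 = list(reads1), list(reads2)
    if len(reads1) != len(reads2):
        raise ValueError('paired-end data needs one R2 file per R1 file')
    return {
        'name': name,
        'genome': genome,
        'filepaths1': reads1,
        'filepaths2': reads2,
    }


data = build_sample_data(
    'Sample1',
    'hg19',
    ['Sample1.lane1.R1.fastq.gz', 'Sample1.lane2.R1.fastq.gz'],
    ['Sample1.lane1.R2.fastq.gz', 'Sample1.lane2.R2.fastq.gz'],
)
print(data['filepaths1'])
```

The resulting dictionary can then be passed as the "data" argument in the same way as the hand-written one.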


2.2 Delete a sample


You can also delete your sample, which will return "<Response [204]>" if successful:

bp.delete_sample(10000)
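In a script you may prefer to check the result rather than read it. The sketch below assumes the returned object exposes a "status_code" attribute, as the "<Response [204]>" representation suggests; that is an assumption, and stand-in objects are used here so the code runs without a connection. "was_deleted" is a hypothetical helper.

```python
from types import SimpleNamespace


def was_deleted(response):
    """True when the response carries HTTP status 204 (No Content)."""
    return getattr(response, 'status_code', None) == 204


# Stand-ins for real responses so the sketch runs without network access.
print(was_deleted(SimpleNamespace(status_code=204)))
print(was_deleted(SimpleNamespace(status_code=404)))

# Intended use (assumes the attribute exists on the real return value):
#   if not was_deleted(bp.delete_sample(10000)):
#       print('sample 10000 was not deleted')
```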



3. Creating or deleting an analysis

**This section is still under construction**


To create an analysis you need the workflow "id" (which you can get by running "bp.get_workflows()"). In the example below, we create an analysis to map and quantify the reads with STAR:


bp.create_analysis(workflow_id='4',sample_id='3206')


You can run an analysis with custom parameters like this:

bp.create_analysis(
    workflow_id = '5',
    sample_id = '1',
    params={
        'node':{
            'annotate':{
                'upstream':'5000',
                'downstream':'5000',
            }
        }
    }
)

The params argument is used to override default parameters in specific analysis modules. The format of the params input is {'node': {'module_name': {'module_parameter': 'parameter_custom_value'}}}.
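The nesting is easy to get wrong by hand, so the layout can be produced by a small wrapper. This is a sketch only; "module_params" is a hypothetical helper, not part of the basepair package, and it simply reproduces the dictionary shape described above.

```python
def module_params(**modules):
    """Wrap per-module overrides in the {'node': {...}} layout."""
    return {
        'node': {name: dict(overrides) for name, overrides in modules.items()}
    }


# Same overrides as the custom-parameter example above.
params = module_params(annotate={'upstream': '5000', 'downstream': '5000'})
print(params)
```

The result can be passed as the "params" argument of "bp.create_analysis".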


For pipelines that need more than two groups, you have to override the inputs of the specific module that uses them. For example, for deseq and cuffdiff pipelines with three or more groups:

bp.create_analysis(workflow_id='42', sample_ids=[5014, 5016, 5017, 5018], params={'node':{'deseq':{'group_ids':'5017,5018:5016:5014', 'group_names': 'group 1 name:group 2 name:group 3 name'}}})

bp.create_analysis(workflow_id='29', sample_ids=[5014, 5016, 5017, 5018], params={'node':{'cuffdiff':{'group_ids':'5017,5018:5016:5014', 'group_names': 'group 1 name:group 2 name:group 3 name'}}})
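In the two calls above, "group_ids" appears to separate sample ids within a group by commas and groups by colons, with "group_names" listing names in the same order. Assuming that reading is right, the strings can be generated from a mapping instead of typed by hand. "group_overrides" and "all_sample_ids" are hypothetical helpers for illustration.

```python
def group_overrides(groups):
    """Build 'group_ids' and 'group_names' from {group name: [sample ids]}.

    Relies on dictionaries preserving insertion order (Python 3.7+).
    """
    for name in groups:
        if ':' in name:
            raise ValueError('group names must not contain a colon')
    group_ids = ':'.join(
        ','.join(str(sample_id) for sample_id in ids)
        for ids in groups.values()
    )
    return {'group_ids': group_ids, 'group_names': ':'.join(groups)}


def all_sample_ids(groups):
    """Flatten the mapping into one sorted list of sample ids."""
    return sorted(sample_id for ids in groups.values() for sample_id in ids)


groups = {
    'group 1 name': [5017, 5018],
    'group 2 name': [5016],
    'group 3 name': [5014],
}
print(group_overrides(groups))
print(all_sample_ids(groups))
```

The first result would go under the module name inside "params" (for example {'node': {'deseq': ...}}) and the second would be passed as "sample_ids".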



4. Download results


You can download the files from one or more analyses. Here are several examples:


4.1 Example 1

What this example does:

  1. Downloads files from analysis with id 10000
  2. Only gets files that do not have the tag 'bam'
  3. Downloads all files to the "test" directory.


bp.download_analysis(
    10000,
    tags=[['bam']],
    tagkind='diff',
    outdir='./test/')



4.2 Example 2

What this example does:

  1. Downloads files from analysis with id 10000
  2. Only gets files that contain the tag 'fastqc'
  3. Downloads all files to the "test" directory.


bp.download_analysis(
    10000,
    tags=[['fastqc']],
    tagkind='subset',
    outdir='./test/')



4.3 Example 3

What this example does:

  1. Downloads files from analysis with id 10000
  2. Only gets files that are tagged with both 'rnaseq_metrics' and 'json', or with both 'fastqc' and 'zip'.
  3. Downloads all files to the "test" directory.


bp.download_analysis(
    10000,
    tags=[['rnaseq_metrics','json'],['fastqc','zip']],
    tagkind='exact',
    outdir='./test/')
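To make the three "tagkind" values easier to compare, the sketch below models them on a few made-up files. It is one plausible reading of the descriptions in Examples 1 to 3, not the library's implementation: in particular, treating 'exact' as an exact match of the full tag set and 'diff' as the complement of 'subset' are assumptions, and the file names and tags are invented.

```python
def keep_file(file_tags, tags, tagkind):
    """Decide whether a file passes the filter under this sketch's reading."""
    file_tags = set(file_tags)
    wanted = [set(group) for group in tags]
    if tagkind == 'exact':
        # The file's tags equal one of the requested tag lists.
        return any(file_tags == group for group in wanted)
    if tagkind == 'subset':
        # The file carries every tag of at least one requested list.
        return any(group <= file_tags for group in wanted)
    if tagkind == 'diff':
        # The file matches none of the requested lists.
        return not any(group <= file_tags for group in wanted)
    raise ValueError('unknown tagkind: ' + repr(tagkind))


def select(files, tags, tagkind):
    """Return the sorted names of files that pass the filter."""
    return sorted(n for n, t in files.items() if keep_file(t, tags, tagkind))


# Made-up files and tags, for illustration only.
files = {
    'a.bam': ['bam'],
    'qc.zip': ['fastqc', 'zip'],
    'qc.html': ['fastqc', 'html'],
    'm.json': ['rnaseq_metrics', 'json'],
}
print(select(files, [['bam']], 'diff'))
print(select(files, [['fastqc']], 'subset'))
print(select(files, [['rnaseq_metrics', 'json'], ['fastqc', 'zip']], 'exact'))
```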