Python API

Modified on Tue, 2 Jul, 2024 at 3:51 AM

Updated for basepair version: 2.0.0

Python bindings for Basepair's API. An outline of the contents on this page:

Listing available data
Creating or deleting a sample
Creating or deleting an analysis
Download results

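After starting a Python 3 interactive session, run the following:

import basepair
import json

# Load the configuration file and close it again before connecting
with open('/path/to/basepair.config.json') as config_file:
    conf = json.load(config_file)

bp = basepair.connect(conf)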

1. Listing available data.

You can list various types of data available to you via the "bp.print_data()" command. By default it prints data in a human-readable table format, which does not include all data associated with each item, only the critical data. If you want to see all data associated with each item then set the json parameter to True.

1.1 List genomes

The following will print the genomes available as a table:

bp.print_data('genomes')

# prints a table like this:

  id  name                     date_created
----  -----------------------  --------------------------
   1  hg19                     2018-04-18T14:58:15.865993
   2  mm10                     2018-04-18T15:05:39.770488
   3  mm9                      2018-04-18T15:10:33.438603
   4  danRer10                 2018-04-18T15:15:11.281866
   5  tair10                   2018-04-18T15:17:19.582104
   6  dm6                      2018-04-18T15:23:27.298556
   7  danRer7                  2018-04-18T15:28:12.977544
   8  sacCer3                  2018-04-18T15:30:50.993198
   9  SL2.40                   2018-04-18T15:31:50.603006
  10  Xla.v91                  2018-04-18T15:36:55.491452
  11  xenTro7                  2018-04-18T15:43:42.386466
  12  taeGut2                  2018-04-18T15:44:56.550366
  13  Streptococcus_sanguinis  2018-04-18T15:45:06.695995
  14  CENPK113-7D-RP           2018-04-18T15:45:26.305311
  15  rn6                      2018-04-18T15:46:13.155202
  16  susScr11                 2018-04-18T15:48:53.443899
  17  ce11                     2018-04-18T15:49:32.644737
  18  Manis_javanica           2018-04-18T16:09:05.805615
  19  Ictalurus_punctatus      2018-04-18T16:16:12.306990
  20  mouse_GRCm38_M18         2018-10-04T15:40:16.284047
  21  Citrus_clementina.v1     2018-10-04T15:46:23.320161
  22  Nematostella_vectensis   2018-10-26T18:06:36.340901
  23  Gallus_gallus            2018-10-30T14:02:58.811138

To see the raw JSON data:

bp.genomes

# which will output something like this:

[{'date_created': '2018-04-18T14:58:15.865993',
  'id': 1,
  'name': 'hg19',
  'resource_uri': '/api/v1/genomes/1'},
 {'date_created': '2018-04-18T15:05:39.770488',
  'id': 2,
  'name': 'mm10',
  'resource_uri': '/api/v1/genomes/2'},
 {'date_created': '2018-04-18T15:10:33.438603',
  'id': 3,
  'name': 'mm9',
  'resource_uri': '/api/v1/genomes/3'},
...
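
Because the raw data is a list of dictionaries, it can be searched with ordinary Python. The sketch below looks up a genome id by name. Note that "find_genome_id" is a hypothetical helper written for this page, not part of the basepair package, and the sample list only copies the first entries of the output shown above.

```python
# Hypothetical helper (not part of the basepair package).
def find_genome_id(genomes, name):
    """Return the id of the first genome whose 'name' matches, else None."""
    for genome in genomes:
        if genome.get('name') == name:
            return genome.get('id')
    return None


# Sample data copied from the output above; in a live session you would
# pass the list returned by the API instead.
genomes = [
    {'date_created': '2018-04-18T14:58:15.865993', 'id': 1,
     'name': 'hg19', 'resource_uri': '/api/v1/genomes/1'},
    {'date_created': '2018-04-18T15:05:39.770488', 'id': 2,
     'name': 'mm10', 'resource_uri': '/api/v1/genomes/2'},
    {'date_created': '2018-04-18T15:10:33.438603', 'id': 3,
     'name': 'mm9', 'resource_uri': '/api/v1/genomes/3'},
]

print(find_genome_id(genomes, 'mm10'))
```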

1.2 List workflows

When constructing your analyses, you will also want to know which workflows are available.

To see the workflows as a pretty table:

bp.print_data('workflows')

And to see the raw JSON:

bp.get_workflows()

1.3 List samples

You can list the samples you already have on your Basepair account.

To see the samples in a pretty table:

bp.print_data('samples')

And to see the raw JSON:

bp.get_samples()

1.4 Get a sample

You can also retrieve the JSON-formatted information associated with a single sample. The example below gets the sample with id 10000:

sample = bp.get_sample(10000)

1.5 List analyses

You can list all of your analyses.

To see the analyses in a pretty table:

bp.print_data('analyses')

And to see the raw JSON:

bp.get_analyses()

1.6 List analysis detail

You may want to see the data (such as the files) associated with a particular analysis. The analysis id can be found in the analysis URL, e.g. https://app.basepairtech.com/#/analyses/10000

To see the analysis detail in a pretty table:

bp.print_data('analysis', uid=[10000])

And to see the raw JSON:

bp.get_analysis(10000)
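
If you copy analysis links from the browser, the id can be pulled out of the URL programmatically. The following is a minimal sketch that assumes the id is always the last segment of the URL fragment, as in the example URL above; "analysis_id_from_url" is a hypothetical helper, not a function of the basepair package.

```python
from urllib.parse import urlparse


# Hypothetical helper (not part of the basepair package).
def analysis_id_from_url(url):
    """Extract the trailing numeric id from an analysis URL.

    Assumes the id is the last '/'-separated segment of the URL
    fragment (the part after '#').
    """
    fragment = urlparse(url).fragment  # e.g. '/analyses/10000'
    last_segment = fragment.rstrip('/').split('/')[-1]
    if not last_segment.isdigit():
        raise ValueError('no numeric analysis id found in: ' + url)
    return int(last_segment)


analysis_id = analysis_id_from_url(
    'https://app.basepairtech.com/#/analyses/10000')
print(analysis_id)
```

The resulting integer can then be passed to the calls shown above.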

2. Creating or deleting a sample.

A Basepair sample is handled through the "BpSample" class. You can do two main things with it: create a new sample or get information associated with an existing sample.

2.1 Create a new sample

In the example below, we create a new sample named "Sample1". This sample contains 'RNA-seq' data, sequenced on the 'Illumina' platform, and uses the "hg19" genome as a reference during the pipeline run. It is paired-end data sequenced on two lanes, so 'filepaths1' lists the two R1 files and 'filepaths2' lists the two R2 files. Optionally, you can also provide a project ID as 'projects': '123' to specify which project you want these samples to be in.

Note: The data type (such as 'RNA-seq') and either 'filepaths1' (for single-end data) or both 'filepaths1' and 'filepaths2' (for paired-end data) are mandatory fields. The rest are optional, so you can skip them if not needed.

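data = {
    'name': 'Sample1',
    'genome': 'hg19',
    'datatype': 'rna-seq',
    'platform': 'illumina',
    # Note the comma between file names: without it, Python silently
    # joins adjacent string literals into a single path.
    'filepaths1': [
        'Sample1.lane1.R1.fastq.gz',
        'Sample1.lane2.R1.fastq.gz'],
    'filepaths2': [
        'Sample1.lane1.R2.fastq.gz',
        'Sample1.lane2.R2.fastq.gz'],
}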

sample = bp.create_sample(data=data)

The example above provides only the bare minimum of information. Run "?BpSample" in an IPython console to find out more.
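
The mandatory-field rule from the note above can be checked locally before anything is sent to the server. This is only a sketch: "missing_sample_fields" is a hypothetical helper, not part of the basepair package, and the file names are placeholders.

```python
# Hypothetical pre-flight check (not part of the basepair package).
def missing_sample_fields(data, paired_end=False):
    """Return the names of mandatory fields that are absent or empty."""
    required = ['datatype', 'filepaths1']
    if paired_end:
        required.append('filepaths2')
    return [field for field in required if not data.get(field)]


# Placeholder sample descriptions used only for illustration.
single = {'datatype': 'rna-seq', 'filepaths1': ['reads.fastq.gz']}
paired = {'datatype': 'rna-seq', 'filepaths1': ['reads_R1.fastq.gz']}

print(missing_sample_fields(single))                   # []
print(missing_sample_fields(paired, paired_end=True))  # ['filepaths2']
```

An empty list means the dictionary carries every mandatory field and can be handed to "bp.create_sample(data=data)".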

2.2 Delete a sample

You can also delete your sample, which will return "<Response [204]>" if successful:

bp.delete_sample(10000)

3. Creating or deleting an analysis

**This section is still under construction**

To create an analysis you need the workflow "id" (which you can get by running "bp.get_workflows()"). In the example below, we create an analysis to map and quantify the reads with STAR:

bp.create_analysis(workflow_id='4',sample_id='3206')

You can run an analysis with custom parameters like this:

bp.create_analysis(
    workflow_id = '5',
    sample_id = '1',
    params={
        'node':{
            'annotate':{
                'upstream':'5000',
                'downstream':'5000',
            }
        }
    }
)

The params argument is used to override default parameters in specific analysis modules. The format for the params input is {'node': {'module_name': {'module_parameter': 'parameter_custom_value'}}}.
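
The nested format described above is easy to mistype by hand, so it can help to build it with a small function. A minimal sketch, assuming a single module is overridden per call; "node_params" is a hypothetical helper, not part of the basepair package.

```python
# Hypothetical helper (not part of the basepair package).
def node_params(module_name, **overrides):
    """Build {'node': {module_name: {parameter: value, ...}}}."""
    return {'node': {module_name: dict(overrides)}}


params = node_params('annotate', upstream='5000', downstream='5000')
print(params)
```

The resulting dictionary has the same shape as the one written out in the custom-parameters example and can be passed as "params=params".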

For pipelines that need more than two groups, you have to override the inputs of the specific module that uses them. For example, for deseq and cuffdiff in pipelines with three or more groups:

bp.create_analysis(
    workflow_id='42',
    sample_ids=[5014, 5016, 5017, 5018],
    params={'node': {'deseq': {
        'group_ids': '5017,5018:5016:5014',
        'group_names': 'group 1 name:group 2 name:group 3 name'}}})

bp.create_analysis(
    workflow_id='29',
    sample_ids=[5014, 5016, 5017, 5018],
    params={'node': {'cuffdiff': {
        'group_ids': '5017,5018:5016:5014',
        'group_names': 'group 1 name:group 2 name:group 3 name'}}})
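
In the examples above, 'group_ids' separates sample ids within a group by commas and separates groups by colons, and 'group_names' is colon-separated in the same order. The sketch below builds both strings from Python lists. "format_groups" is a hypothetical helper, not part of the basepair package, and the format is inferred only from the two calls shown above.

```python
# Hypothetical helper (not part of the basepair package).
def format_groups(groups):
    """Join ids with ',' inside a group and ':' between groups."""
    return ':'.join(
        ','.join(str(sample_id) for sample_id in group)
        for group in groups)


groups = [[5017, 5018], [5016], [5014]]
names = ['group 1 name', 'group 2 name', 'group 3 name']

group_ids = format_groups(groups)
group_names = ':'.join(names)
# Flat, sorted list of every sample id used in any group
sample_ids = sorted(sample_id for group in groups for sample_id in group)

print(group_ids)    # 5017,5018:5016:5014
print(group_names)  # group 1 name:group 2 name:group 3 name
print(sample_ids)   # [5014, 5016, 5017, 5018]
```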

4. Download results

You can download the files from one or more analyses. Here are several examples:

4.1 Example 1

What this example does:

Downloads files from analysis with id 10000
Only gets files that do not have the tag 'bam'
Downloads all files to the "test" directory.

bp.download_analysis(
    10000,
    tags=[['bam']],
    tagkind='diff',
    outdir='./test/')

4.2 Example 2

What this example does:

Downloads files from analysis with id 10000
Only gets files that contain the tag 'fastqc'
Downloads all files to the "test" directory.

bp.download_analysis(
    10000,
    tags=[['fastqc']],
    tagkind='subset',
    outdir='./test/')

4.3 Example 3

What this example does:

Downloads files from analysis with id 10000
Only includes files that are tagged with both 'rnaseq_metrics' and 'json', or with both 'fastqc' and 'zip'.
Downloads all files to the "test" directory.

bp.download_analysis(
    10000,
    tags=[['rnaseq_metrics','json'],['fastqc','zip']],
    tagkind='exact',
    outdir='./test/')
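
To make the three "tagkind" values easier to compare, here is one possible reading of the examples above expressed as plain Python. This is an illustrative model only, not the library's implementation. It assumes that 'exact' means a file's tags equal one of the given tag lists, 'subset' means a file carries every tag of at least one list, and 'diff' means a file matches none of the lists. The file names and tags are made up.

```python
# Illustrative model only; the basepair package may behave differently.
def select_files(files, tags, tagkind):
    """Return the names of files selected under the assumed semantics."""
    tag_sets = [set(group) for group in tags]
    selected = []
    for entry in files:
        file_tags = set(entry['tags'])
        if tagkind == 'exact':
            keep = any(file_tags == group for group in tag_sets)
        elif tagkind == 'subset':
            keep = any(group <= file_tags for group in tag_sets)
        elif tagkind == 'diff':
            keep = not any(group <= file_tags for group in tag_sets)
        else:
            raise ValueError('unknown tagkind: ' + repr(tagkind))
        if keep:
            selected.append(entry['name'])
    return selected


# Made-up files for illustration.
files = [
    {'name': 'aligned.bam', 'tags': ['bam']},
    {'name': 'fastqc.zip', 'tags': ['fastqc', 'zip']},
    {'name': 'fastqc.html', 'tags': ['fastqc', 'html']},
    {'name': 'metrics.json', 'tags': ['rnaseq_metrics', 'json']},
]

print(select_files(files, [['bam']], 'diff'))
print(select_files(files, [['fastqc']], 'subset'))
print(select_files(
    files, [['rnaseq_metrics', 'json'], ['fastqc', 'zip']], 'exact'))
```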