Updated for basepair version: 2.0.0
Python bindings for Basepair's API. An outline of the contents on this page:
- Listing available data
- Creating or deleting a sample
- Creating or deleting an analysis
- Download results
After starting an interactive Python 3 session, run the following:
import basepair
import json
bp = basepair.connect(json.load(open('/path/to/basepair.config.json')))
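The one-liner above never closes the configuration file explicitly. As a minimal sketch, the file can instead be read with a context manager. Note that `load_config` is a hypothetical helper name and is not part of the basepair package; the contents of the configuration file are whatever your Basepair account setup provides.

```python
import json


def load_config(path):
    """Read a JSON configuration file and return its contents.

    Hypothetical helper, not part of the basepair package.
    """
    # The context manager closes the file even if JSON parsing fails.
    with open(path) as handle:
        return json.load(handle)


# Usage, assuming basepair is installed and the path points at your config:
#   import basepair
#   bp = basepair.connect(load_config('/path/to/basepair.config.json'))
```

The resulting "bp" object is used exactly as in the rest of this page.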
1. Listing available data.
You can list various types of data available to you via the "bp.print_data()" command. By default it prints data in a human-readable table format, which does not include all data associated with each item, only the critical data. If you want to see all data associated with each item then set the json parameter to True.
1.1 List genomes
The following will print the genomes available as a table:
bp.print_data('genomes')
# will return a list like so:
  id  name                     date_created
----  -----------------------  --------------------------
   1  hg19                     2018-04-18T14:58:15.865993
   2  mm10                     2018-04-18T15:05:39.770488
   3  mm9                      2018-04-18T15:10:33.438603
   4  danRer10                 2018-04-18T15:15:11.281866
   5  tair10                   2018-04-18T15:17:19.582104
   6  dm6                      2018-04-18T15:23:27.298556
   7  danRer7                  2018-04-18T15:28:12.977544
   8  sacCer3                  2018-04-18T15:30:50.993198
   9  SL2.40                   2018-04-18T15:31:50.603006
  10  Xla.v91                  2018-04-18T15:36:55.491452
  11  xenTro7                  2018-04-18T15:43:42.386466
  12  taeGut2                  2018-04-18T15:44:56.550366
  13  Streptococcus_sanguinis  2018-04-18T15:45:06.695995
  14  CENPK113-7D-RP           2018-04-18T15:45:26.305311
  15  rn6                      2018-04-18T15:46:13.155202
  16  susScr11                 2018-04-18T15:48:53.443899
  17  ce11                     2018-04-18T15:49:32.644737
  18  Manis_javanica           2018-04-18T16:09:05.805615
  19  Ictalurus_punctatus      2018-04-18T16:16:12.306990
  20  mouse_GRCm38_M18         2018-10-04T15:40:16.284047
  21  Citrus_clementina.v1     2018-10-04T15:46:23.320161
  22  Nematostella_vectensis   2018-10-26T18:06:36.340901
  23  Gallus_gallus            2018-10-30T14:02:58.811138
To see the raw JSON data:
bp.genomes
# which will output something like this:
[{'date_created': '2018-04-18T14:58:15.865993',
  'id': 1,
  'name': 'hg19',
  'resource_uri': '/api/v1/genomes/1'},
 {'date_created': '2018-04-18T15:05:39.770488',
  'id': 2,
  'name': 'mm10',
  'resource_uri': '/api/v1/genomes/2'},
 {'date_created': '2018-04-18T15:10:33.438603',
  'id': 3,
  'name': 'mm9',
  'resource_uri': '/api/v1/genomes/3'},
 ...
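Because the raw JSON is a plain list of dictionaries, it can be searched with ordinary Python. The sketch below looks up a genome id by name; `genome_id_by_name` is a hypothetical helper, and the records are copied from the sample output above so that the snippet runs without a connection.

```python
# First three records, copied from the sample output above.
genomes = [
    {'date_created': '2018-04-18T14:58:15.865993', 'id': 1,
     'name': 'hg19', 'resource_uri': '/api/v1/genomes/1'},
    {'date_created': '2018-04-18T15:05:39.770488', 'id': 2,
     'name': 'mm10', 'resource_uri': '/api/v1/genomes/2'},
    {'date_created': '2018-04-18T15:10:33.438603', 'id': 3,
     'name': 'mm9', 'resource_uri': '/api/v1/genomes/3'},
]


def genome_id_by_name(records, name):
    """Return the id of the genome called `name`, or None if it is absent.

    Hypothetical helper, not part of the basepair package.
    """
    for record in records:
        if record['name'] == name:
            return record['id']
    return None


print(genome_id_by_name(genomes, 'mm10'))  # prints 2
```

With a live connection, the list returned by the API would presumably be passed in place of the hard-coded "genomes" list.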
1.2 List workflows
When constructing your analyses, you will also want to know which workflows are available.
To see the workflows as a pretty table:
bp.print_data('workflows')
And to see the raw JSON:
bp.get_workflows()
1.3 List samples
You can list the samples you already have on your Basepair account.
To see the samples in a pretty table:
bp.print_data('samples')
And to see the raw JSON:
bp.get_samples()
1.4 Get a sample
You can also retrieve the JSON-formatted information associated with a single sample. The example below gets the sample with id 10000:
sample = bp.get_sample(10000)
1.5 List analyses
You can list all of your analyses.
To see the analyses in a pretty table:
bp.print_data('analyses')
And to see the raw JSON:
bp.get_analyses()
1.6 List analysis detail
To see the data (such as the files) associated with a particular analysis, you need its analysis id. The id appears at the end of the analysis URL, e.g. https://app.basepairtech.com/#/analyses/10000
To see the analysis detail in a pretty table:
bp.print_data('analysis', uid=[10000])
And to see the raw JSON:
bp.get_analysis(10000)
2. Creating or deleting a sample.
A Basepair sample is handled through the "BpSample" class. You can do two main things with it: create a new sample or get information associated with an existing sample.
2.1 Create a new sample
In the example below, we create a new sample named "Sample1". The sample contains 'RNA-seq' data sequenced on the 'Illumina' platform, and the "hg19" genome is used as the reference during the pipeline run. It is paired-end data sequenced on two lanes, so 'filepaths1' and 'filepaths2' each list two files (for example "Sample1.lane1.R1.fastq.gz" and "Sample1.lane1.R2.fastq.gz"). Optionally, you can also provide a project ID as 'projects': '123' to specify which project the sample should belong to.
Note: The data type (such as 'RNA-seq') and either 'filepaths1' (for single-end data) or both 'filepaths1' and 'filepaths2' (for paired-end data) are mandatory fields. The rest are optional, so you can skip them if not needed.
data = {
    'name': 'Sample1',
    'genome': 'hg19',
    'datatype': 'rna-seq',
    'platform': 'illumina',
    # one entry per lane; the commas keep the paths as separate list items
    'filepaths1': [
        'Sample1.lane1.R1.fastq.gz',
        'Sample1.lane2.R1.fastq.gz',
    ],
    'filepaths2': [
        'Sample1.lane1.R2.fastq.gz',
        'Sample1.lane2.R2.fastq.gz',
    ],
}
sample = bp.create_sample(data=data)
The example above supplies only the bare minimum of information. Run "?BpSample" in an IPython console to find out more.
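Since the data type and 'filepaths1' are mandatory, it can help to check the dictionary locally before calling "bp.create_sample". The following is a minimal sketch of such a check; `validate_sample_data` is a hypothetical helper and not part of the basepair package, and the server may enforce further rules that are not modelled here.

```python
def validate_sample_data(data):
    """Raise ValueError if a mandatory sample field is missing or malformed.

    Hypothetical pre-flight check, not part of the basepair package.
    """
    # 'datatype' and 'filepaths1' are the mandatory fields named above.
    for field in ('datatype', 'filepaths1'):
        if not data.get(field):
            raise ValueError(f"missing mandatory field: {field!r}")
    # Each file-path field, when present, must be a list of strings.
    for field in ('filepaths1', 'filepaths2'):
        paths = data.get(field, [])
        if not isinstance(paths, list) or not all(isinstance(p, str) for p in paths):
            raise ValueError(f"{field!r} must be a list of path strings")
    return data


data = validate_sample_data({
    'name': 'Sample1',
    'datatype': 'rna-seq',
    'filepaths1': ['Sample1.lane1.R1.fastq.gz', 'Sample1.lane2.R1.fastq.gz'],
})
# With a live connection: sample = bp.create_sample(data=data)
```

The helper returns the dictionary unchanged, so it can wrap the argument of an existing call.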
2.2 Delete a sample
You can also delete your sample, which will return "<Response [204]>" if successful:
bp.delete_sample(10000)
3. Creating or deleting an analysis
**This section is still under construction**
To create an analysis you need the workflow "id" (which you can get by running "bp.get_workflows()"). In the example below, we create an analysis to map and quantify the reads with STAR:
bp.create_analysis(workflow_id='4',sample_id='3206')
You can run an analysis with custom parameters like this:
bp.create_analysis(
    workflow_id='5',
    sample_id='1',
    params={
        'node': {
            'annotate': {
                'upstream': '5000',
                'downstream': '5000',
            }
        }
    }
)
The params variable is used to override default parameters in the specific analysis modules. The format for the params input is {'node': {'module_name': {'module_parameter': 'parameter_custom_value'}}}
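The nested format described above can be built with a small helper instead of being typed by hand. This is only a sketch: `make_params` is a hypothetical name, and converting values to strings is an assumption based on the examples on this page, which pass '5000' rather than 5000.

```python
def make_params(module_name, **overrides):
    """Build {'node': {module_name: {parameter: value, ...}}}.

    Hypothetical helper, not part of the basepair package.
    """
    # Assumption: values are sent as strings, as in the examples on this page.
    module_params = {key: str(value) for key, value in overrides.items()}
    return {'node': {module_name: module_params}}


params = make_params('annotate', upstream=5000, downstream=5000)
print(params)
# prints {'node': {'annotate': {'upstream': '5000', 'downstream': '5000'}}}
```

The result can be passed as the "params" argument of "bp.create_analysis".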
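For pipelines that compare more than two groups, you have to override the inputs of the module that consumes the groups. In 'group_ids', the sample ids within a group are separated by commas and the groups are separated by colons; 'group_names' lists the group names in the same order, also separated by colons. For example, for deseq and cuffdiff pipelines with three groups: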
bp.create_analysis(
    workflow_id='42',
    sample_ids=[5014, 5016, 5017, 5018],
    params={'node': {'deseq': {
        'group_ids': '5017,5018:5016:5014',
        'group_names': 'group 1 name:group 2 name:group 3 name'}}})
bp.create_analysis(
    workflow_id='29',
    sample_ids=[5014, 5016, 5017, 5018],
    params={'node': {'cuffdiff': {
        'group_ids': '5017,5018:5016:5014',
        'group_names': 'group 1 name:group 2 name:group 3 name'}}})
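The 'group_ids' and 'group_names' strings in the calls above can be generated from ordinary Python lists, which avoids typing the separators by hand. A minimal sketch follows; `format_groups` and `all_sample_ids` are hypothetical helper names, not part of the basepair package.

```python
def format_groups(groups):
    """Join sample ids with ',' inside a group and ':' between groups."""
    return ':'.join(
        ','.join(str(sample_id) for sample_id in group) for group in groups
    )


def all_sample_ids(groups):
    """Return every sample id that appears in any group, sorted."""
    return sorted(sample_id for group in groups for sample_id in group)


groups = [[5017, 5018], [5016], [5014]]
names = ['group 1 name', 'group 2 name', 'group 3 name']

params = {'node': {'deseq': {
    'group_ids': format_groups(groups),
    'group_names': ':'.join(names),
}}}

print(params['node']['deseq']['group_ids'])  # prints 5017,5018:5016:5014
print(all_sample_ids(groups))                # prints [5014, 5016, 5017, 5018]
```

Deriving the sample id list from the same groups keeps it in step with 'group_ids'.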
4. Download results
You can download the files from one or more analyses. Here are several examples:
4.1 Example 1
What this example does:
- Downloads files from analysis with id 10000
- Only gets files that do not have the tag 'bam'
- Downloads all files to the "test" directory.
bp.download_analysis(10000, tags=[['bam']], tagkind='diff', outdir='./test/')
4.2 Example 2
What this example does:
- Downloads files from analysis with id 10000
- Only gets files that contain the tag 'fastqc'
- Downloads all files to the "test" directory.
bp.download_analysis(10000, tags=[['fastqc']], tagkind='subset', outdir='./test/')
4.3 Example 3
What this example does:
- Downloads files from analysis with id 10000
- Only gets files that are tagged with both 'rnaseq_metrics' and 'json', or with both 'fastqc' and 'zip'.
- Downloads all files to the "test" directory.
bp.download_analysis(
    10000,
    tags=[['rnaseq_metrics', 'json'], ['fastqc', 'zip']],
    tagkind='exact',
    outdir='./test/')
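To make the three "tagkind" values easier to compare, here is a small local model of the behaviour described in the examples above. It is an interpretation of those descriptions, not the library's code: `select_files`, the file names, and their tags are all hypothetical, and the real filtering inside basepair may differ in edge cases.

```python
def select_files(files, tags, tagkind):
    """Filter a {filename: set_of_tags} mapping as the examples describe.

    Local illustration only; not part of the basepair package.
    """
    if tagkind not in ('diff', 'subset', 'exact'):
        raise ValueError(f"unknown tagkind: {tagkind!r}")
    wanted = [set(group) for group in tags]
    selected = []
    for name, file_tags in files.items():
        # True if the file carries every tag of at least one group.
        contains = any(group <= file_tags for group in wanted)
        # True if the file's tags are exactly one of the groups.
        equals = any(group == file_tags for group in wanted)
        keep = {'subset': contains, 'exact': equals, 'diff': not contains}[tagkind]
        if keep:
            selected.append(name)
    return selected


# Hypothetical files and tags, for illustration only.
files = {
    'reads.bam': {'bam'},
    'qc.zip': {'fastqc', 'zip'},
    'qc.html': {'fastqc', 'html'},
    'metrics.json': {'rnaseq_metrics', 'json'},
}

print(select_files(files, [['bam']], 'diff'))
# prints ['qc.zip', 'qc.html', 'metrics.json']
print(select_files(files, [['fastqc']], 'subset'))
# prints ['qc.zip', 'qc.html']
print(select_files(files, [['rnaseq_metrics', 'json'], ['fastqc', 'zip']], 'exact'))
# prints ['qc.zip', 'metrics.json']
```

Under this reading, 'diff' is the complement of 'subset', and 'exact' keeps only files whose tags match one of the tag lists completely.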