dgenies.bin package
Submodules
dgenies.bin.clean_jobs module
- dgenies.bin.clean_jobs.parse_data_folders(app_data, gallery_jobs, now, max_age, fake=False)
Parse the data folder and remove jobs that are too old
- Parameters
app_data (str) – folder where jobs are stored
gallery_jobs (list) – ids of the jobs that are in the gallery
now (float) – current timestamp
max_age (dict) – maximum age: all files and folders older than this are removed. Defined for each category (uploads, data, error, …)
fake (bool) – if True, only print the files to delete, without deleting them
- Returns
- dgenies.bin.clean_jobs.parse_database(app_data, max_age, fake=False)
Parse the database and remove jobs that are too old (from the database and from disk)
- Parameters
app_data (str) – folder where jobs are stored
max_age (dict) – maximum age: all files and folders older than this are removed. Defined for each category (uploads, data, error, …)
fake (bool) – if True, only print the files to delete, without deleting them
- Returns
ids of the jobs that are in the gallery (these are never removed, regardless of their age)
- Return type
list
- dgenies.bin.clean_jobs.parse_upload_folders(upload_folder, now, max_age, fake=False)
Parse the upload folders and remove files and folders that are too old
- Parameters
upload_folder (str) – upload folder path
now (float) – current timestamp
max_age (dict) – maximum age: all files and folders older than this are removed. Defined for each category (uploads, data, error, …)
fake (bool) – if True, only print the files to delete, without deleting them
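The dry-run behaviour controlled by fake can be illustrated with a small standalone sketch. This is not the dgenies implementation: the helper name clean_old_entries is hypothetical, a single threshold in seconds stands in for the per-category max_age dict, and age is assumed to be measured from the last modification time.

```python
import os
import shutil


def clean_old_entries(folder, now, max_age_seconds, fake=False):
    """Remove entries of `folder` older than `max_age_seconds`.

    Hypothetical helper, not part of dgenies. Returns the list of
    paths that were (or, with fake=True, would be) removed.
    """
    expired = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        # Assumption: age is measured from the last modification time.
        if now - os.path.getmtime(path) <= max_age_seconds:
            continue
        expired.append(path)
        if fake:
            # Dry run: report the path but leave it on disk.
            print("Would delete:", path)
        elif os.path.isdir(path):
            shutil.rmtree(path)
        else:
            os.remove(path)
    return expired
```

Running it once with fake=True before a real run gives a list of what would be deleted without touching the disk.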
dgenies.bin.filter_contigs module
- class dgenies.bin.filter_contigs.Filter(fasta, index_file, type_f, min_filtered=0, split=False, out_fasta=None, replace_fa=False)
Bases: object
Filter a fasta file: remove contigs that are too small
- Parameters
fasta (str) – fasta file path
index_file (str) – index file path
type_f (str) – type of sample (query or target)
min_filtered (int) – minimum number of large contigs to allow filtering
split (bool) – True if contigs are split
out_fasta (str) – output fasta file path
replace_fa (bool) – if True, replace fasta file
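As an illustration of size-based filtering, here is a minimal sketch working on an in-memory mapping of contig names to lengths. It is only one possible reading of the parameters above: the function name filter_small_contigs and the min_size threshold are hypothetical (the page does not say how Filter decides that a contig is too small), and min_filtered is assumed to mean that filtering is skipped when fewer large contigs than that would remain.

```python
def filter_small_contigs(contigs, min_size, min_filtered=0):
    """Keep only contigs whose length is at least `min_size`.

    Hypothetical helper, not part of dgenies.
    contigs: dict mapping contig name -> length.
    If fewer than `min_filtered` large contigs would remain, the
    input is returned unfiltered (assumed meaning of min_filtered).
    """
    large = {name: length for name, length in contigs.items()
             if length >= min_size}
    if len(large) < min_filtered:
        # Too few large contigs: do not filter at all.
        return dict(contigs)
    return large


if __name__ == "__main__":
    sample = {"ctg1": 1000, "ctg2": 10, "ctg3": 500}
    print(filter_small_contigs(sample, min_size=100))
```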
dgenies.bin.index module
- class dgenies.bin.index.Index
Bases: object
Manage Fasta Index
- static load(index_file, merge_splits=False)
Load index
- Parameters
index_file – index file path
merge_splits (bool) – if True, merge split contigs together
- Returns
[0] sample name
[1] contigs order
[2] contigs size
[3] reversed status for each contig
[4] absolute start position for each contig
[5] total length of the sample
- Return type
(str, list, dict, dict, dict, int)
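To make the relationship between elements [1], [2], [4] and [5] of the returned tuple concrete, the sketch below derives absolute start positions and the total length from a contig order and a size mapping. It does not read an index file and is not the dgenies code: the function name absolute_starts is hypothetical, and it assumes 0-based positions with contigs laid end to end in the given order.

```python
def absolute_starts(contigs_order, contigs_size):
    """Compute the absolute start of each contig and the total length.

    Hypothetical helper, not part of dgenies.
    contigs_order: list of contig names, in display order.
    contigs_size: dict mapping contig name -> length.
    Returns (dict name -> absolute start, total length).
    """
    starts = {}
    position = 0
    for name in contigs_order:
        # Each contig starts where the previous one ended (assumption).
        starts[name] = position
        position += contigs_size[name]
    return starts, position


if __name__ == "__main__":
    order = ["chr1", "chr2", "chr3"]
    sizes = {"chr1": 100, "chr2": 50, "chr3": 25}
    print(absolute_starts(order, sizes))
```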
- dgenies.bin.index.index_file(fasta_path, fasta_name, out, write_fa=None)
Index fasta file
- Parameters
fasta_path (str) – fasta file path
fasta_name (str) – sample name
out (str) – output index file
write_fa (str) – path of the new fasta file to write, or None to skip writing a new fasta file
- Returns
[0] True if success, else False
[1] Number of contigs
[2] Error message
- Return type
(bool, int, str)
dgenies.bin.local_scheduler module
- dgenies.bin.local_scheduler.get_prep_scheduled_jobs()
Get list of jobs ready to be prepared (all data is downloaded and parsed)
- Returns
list of jobs
- Return type
list
- dgenies.bin.local_scheduler.get_preparing_jobs_cluster_nb()
Get number of jobs in preparation step (for cluster runs)
- Returns
number of jobs
- Return type
int
- dgenies.bin.local_scheduler.get_preparing_jobs_nb()
Get number of jobs in preparation step (for local runs)
- Returns
number of jobs
- Return type
int
- dgenies.bin.local_scheduler.get_scheduled_cluster_jobs()
Get list of jobs ready to be started (for cluster runs)
- Returns
list of jobs
- Return type
list
- dgenies.bin.local_scheduler.get_scheduled_local_jobs()
Get list of jobs ready to be started (for local runs)
- Returns
list of jobs
- Return type
list
- dgenies.bin.local_scheduler.move_job_to_cluster(id_job)
Switch a local job so that it runs on the cluster
- Parameters
id_job – id of the job to move to the cluster
- Returns
- dgenies.bin.local_scheduler.parse_args()
Parse command line arguments and define DEBUG and LOG_FILE constants
- dgenies.bin.local_scheduler.parse_started_jobs()
Parse all started jobs: check that everything is OK and change job statuses if needed. Look for dead jobs
- Returns
(list of id of jobs started locally, list of id of jobs started on cluster)
- Return type
(list, list)
- dgenies.bin.local_scheduler.parse_uploads_asks()
Parse upload requests: allow new uploads when others end, remove expired sessions, …
dgenies.bin.merge_splitted_chrms module
- class dgenies.bin.merge_splitted_chrms.Merger(paf_in, paf_out, query_in, query_out, debug=False)
Bases: object
Merge split contigs together in a PAF file
- Parameters
paf_in (str) – input PAF file path
paf_out (str) – output PAF file path
query_in (str) – input query index file path
query_out (str) – output query index file path
debug (bool) – True to enable debug mode
- load_query_index(index)
Load query index
- Parameters
index (str) – index file path
- Returns
[0] contigs length
[1] split contigs length
[2] sample name
- Return type
(dict, dict, str)
dgenies.bin.sort_paf module
dgenies.bin.split_fa module
- class dgenies.bin.split_fa.Splitter(input_f, name_f, output_f, size_c=10000000, query_index='query_split.idx', debug=False)
Bases: object
Split large contigs into smaller ones
- Parameters
input_f (str) – input fasta file path
name_f (str) – sample name
output_f (str) – output fasta file path
size_c (int) – size of split contigs
query_index (str) – index file path for query
debug (bool) – True to enable debug mode
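The effect of size_c can be pictured with a minimal standalone sketch that cuts one sequence into fixed-size pieces. It is not the Splitter implementation: the function name split_sequence and the "_part" naming of the pieces are hypothetical, and the sketch assumes that contigs no longer than size_c are left untouched.

```python
def split_sequence(name, sequence, size_c):
    """Split `sequence` into consecutive chunks of at most `size_c` bases.

    Hypothetical helper, not part of dgenies.
    Returns a list of (chunk name, chunk sequence) tuples.
    """
    if size_c <= 0:
        raise ValueError("size_c must be a positive integer")
    if len(sequence) <= size_c:
        # Small enough already: keep the contig as it is (assumption).
        return [(name, sequence)]
    return [
        # Hypothetical naming scheme for the pieces: <name>_part<N>.
        (f"{name}_part{i + 1}", sequence[start:start + size_c])
        for i, start in enumerate(range(0, len(sequence), size_c))
    ]


if __name__ == "__main__":
    for chunk_name, chunk in split_sequence("chr1", "ACGTACGTAC", 4):
        print(chunk_name, chunk)
```

The last piece is simply shorter than size_c when the contig length is not a multiple of it.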