dgenies.bin package
Submodules
dgenies.bin.clean_jobs module
dgenies.bin.clean_jobs.parse_data_folders(app_data, gallery_jobs, now, max_age, fake=False)
Parse the data folder and remove jobs that are too old.
Parameters:
- app_data – folder where jobs are stored
- gallery_jobs (list) – ids of the jobs that are in the gallery
- now (float) – current timestamp
- max_age (dict) – remove all files and folders older than this age. Define it for each category (uploads, data, error, …)
- fake (bool) – if True, only print the files to delete, without deleting them
dgenies.bin.clean_jobs.parse_database(app_data, max_age, fake=False)
Parse the database and remove jobs that are too old (from both the database and the disk).
Parameters:
- app_data (str) – folder where jobs are stored
- max_age (dict) – remove all files and folders older than this age. Define it for each category (uploads, data, error, …)
- fake (bool) – if True, only print the files to delete, without deleting them
Returns: ids of the jobs that are in the gallery (these are not removed, whatever their age)
Return type: list
dgenies.bin.clean_jobs.parse_upload_folders(upload_folder, now, max_age, fake=False)
Parse the upload folders and remove files and folders that are too old.
Parameters:
- upload_folder (str) – upload folder path
- now (float) – current timestamp
- max_age (dict) – remove all files and folders older than this age. Define it for each category (uploads, data, error, …)
- fake (bool) – if True, only print the files to delete, without deleting them
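The three functions above share the same pattern: compare the age of each entry against a limit, and either delete it or, in `fake` mode, only report it. The sketch below illustrates that pattern only; it is not the dgenies implementation. The helper `remove_old_entries`, the use of modification time, and the choice of seconds as the unit are all assumptions made for the example. With a `max_age` dict, one such limit would be looked up per category.

```python
import os
import shutil
import tempfile
import time


def remove_old_entries(folder, now, max_age_seconds, fake=False):
    """Remove entries of `folder` last modified more than `max_age_seconds` ago.

    Hypothetical helper, not part of dgenies. Returns the sorted names of the
    entries that were removed (or that would be removed when `fake` is True).
    """
    removed = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        age = now - os.path.getmtime(path)
        if age <= max_age_seconds:
            continue  # recent enough: keep it
        removed.append(name)
        if fake:
            # Dry run: report only, leave the entry on disk.
            print("Would delete:", path)
        elif os.path.isdir(path):
            shutil.rmtree(path)
        else:
            os.remove(path)
    return removed


def demo():
    """Create one old and one recent file, then do a dry run and a real cleanup."""
    with tempfile.TemporaryDirectory() as folder:
        old_file = os.path.join(folder, "old_job")
        new_file = os.path.join(folder, "new_job")
        for path in (old_file, new_file):
            with open(path, "w") as handle:
                handle.write("data")
        now = time.time()
        ten_days = 10 * 86400
        # Pretend the first file was last modified ten days ago.
        os.utime(old_file, (now - ten_days, now - ten_days))
        limit = 7 * 86400  # assumed unit: seconds
        dry_run = remove_old_entries(folder, now, limit, fake=True)
        after_dry_run = sorted(os.listdir(folder))
        real_run = remove_old_entries(folder, now, limit)
        after_real_run = sorted(os.listdir(folder))
        return dry_run, after_dry_run, real_run, after_real_run
```

Running the dry run first and the real cleanup second makes it easy to check that `fake=True` selects the same entries without touching the disk.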
dgenies.bin.filter_contigs module
class dgenies.bin.filter_contigs.Filter(fasta, index_file, type_f, min_filtered=0, split=False, out_fasta=None, replace_fa=False)
Bases: object
Filter a fasta file: remove contigs that are too small.
Parameters:
- fasta (str) – fasta file path
- index_file (str) – index file path
- type_f (str) – type of sample (query or target)
- min_filtered (int) – minimum number of large contigs to allow filtering
- split (bool) – whether contigs are split
- out_fasta (str) – output fasta file path
- replace_fa (bool) – if True, replace the fasta file
_check_filter()
Load the index of the fasta file and determine which contigs must be removed. They are removed only from the index.
Returns: list of contigs which must be removed
Return type: list
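This page does not state how "too small" is decided. The sketch below therefore assumes a hypothetical fixed length threshold, `min_length`, and reads `min_filtered` as a guard: filtering only happens when at least that many large contigs exist. The function `contigs_to_remove` and its arguments are illustrative and not part of dgenies.

```python
def contigs_to_remove(contigs_size, min_length, min_filtered=0):
    """Return the names of contigs shorter than `min_length`.

    Hypothetical helper. `contigs_size` maps contig name to length.
    If fewer than `min_filtered` contigs reach `min_length`, nothing is
    filtered and an empty list is returned (assumed reading of the
    `min_filtered` parameter).
    """
    large = [name for name, size in contigs_size.items() if size >= min_length]
    small = [name for name, size in contigs_size.items() if size < min_length]
    if len(large) < min_filtered:
        return []
    return small
```

For instance, with two long chromosomes and two short scaffolds, a threshold of 100 selects the two scaffolds, while `min_filtered=3` disables filtering because only two contigs are large.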
dgenies.bin.index module
class dgenies.bin.index.Index
Bases: object
Manage Fasta Index
static load(index_file, merge_splits=False)
Load index
Parameters:
- index_file – index file path
- merge_splits (bool) – if True, merge split contigs together
Returns:
- [0] sample name
- [1] contigs order
- [2] contigs size
- [3] reversed status for each contig
- [4] absolute start position for each contig
- [5] total length of the sample
Return type: (str, list, dict, dict, dict, int)
-
static
dgenies.bin.index.index_file(fasta_path, fasta_name, out, write_fa=None)
Index a fasta file.
Parameters:
- fasta_path (str) – fasta file path
- fasta_name (str) – sample name
- out (str) – output index file
- write_fa (str) – path of the new fasta file to write, or None to skip writing a new fasta file
Returns:
- [0] True if success, else False
- [1] Number of contigs
- [2] Error message
Return type: (bool, int, str)
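To illustrate the `(bool, int, str)` return convention, here is a minimal fasta indexer. It is a sketch under stated assumptions, not the dgenies code: the function name `build_index` is hypothetical, and the output layout (sample name on the first line, then one tab-separated `name` and `length` pair per contig) is assumed for the example.

```python
import os
import tempfile


def build_index(fasta_path, sample_name, out_path):
    """Hypothetical indexer returning (success, number of contigs, error message)."""
    lengths = []  # [contig name, length] pairs, in file order
    try:
        with open(fasta_path) as fasta:
            for line in fasta:
                line = line.strip()
                if not line:
                    continue
                if line.startswith(">"):
                    header = line[1:].split()
                    if not header:
                        return False, 0, "empty contig name"
                    # Keep only the first word of the header as contig name.
                    lengths.append([header[0], 0])
                elif not lengths:
                    return False, 0, "sequence found before the first header"
                else:
                    lengths[-1][1] += len(line)
    except OSError as error:
        return False, 0, str(error)
    if not lengths:
        return False, 0, "no contig found"
    # Assumed layout: sample name, then "name<TAB>length" lines.
    with open(out_path, "w") as index:
        index.write(sample_name + "\n")
        for name, length in lengths:
            index.write(f"{name}\t{length}\n")
    return True, len(lengths), ""


def demo():
    """Index a small valid fasta and an invalid one in a temporary folder."""
    with tempfile.TemporaryDirectory() as folder:
        good = os.path.join(folder, "good.fa")
        bad = os.path.join(folder, "bad.fa")
        out = os.path.join(folder, "good.idx")
        with open(good, "w") as handle:
            handle.write(">chr1 first contig\nACGT\nAC\n>chr2\nGG\n")
        with open(bad, "w") as handle:
            handle.write("ACGT\n")
        good_result = build_index(good, "sample", out)
        with open(out) as handle:
            index_content = handle.read()
        bad_result = build_index(bad, "sample", os.path.join(folder, "bad.idx"))
        return good_result, index_content, bad_result
```

Returning an error message instead of raising lets a caller, such as a job runner, record the failure reason without wrapping every call in a try block.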
dgenies.bin.local_scheduler module
dgenies.bin.local_scheduler._printer(*messages)
Print messages to stdout or to a file (according to the LOG_FILE global constant).
Parameters: messages – messages to print
dgenies.bin.local_scheduler.get_prep_scheduled_jobs()
Get list of jobs ready to be prepared (all data is downloaded and parsed)
Returns: list of jobs
Return type: list
dgenies.bin.local_scheduler.get_preparing_jobs_cluster_nb()
Get number of jobs in preparation step (for cluster runs)
Returns: number of jobs
Return type: int
dgenies.bin.local_scheduler.get_preparing_jobs_nb()
Get number of jobs in preparation step (for local runs)
Returns: number of jobs
Return type: int
dgenies.bin.local_scheduler.get_scheduled_cluster_jobs()
Get list of jobs ready to be started (for cluster runs)
Returns: list of jobs
Return type: list
dgenies.bin.local_scheduler.get_scheduled_local_jobs()
Get list of jobs ready to be started (for local runs)
Returns: list of jobs
Return type: list
dgenies.bin.local_scheduler.move_job_to_cluster(id_job)
Change a local job so that it runs on the cluster.
Parameters: id_job – id of the job to move to the cluster
dgenies.bin.local_scheduler.parse_args()
Parse command line arguments and define the DEBUG and LOG_FILE constants.
dgenies.bin.local_scheduler.parse_started_jobs()
Parse all started jobs: check that everything is OK and change job statuses if needed. Look for dead jobs.
Returns: (list of ids of jobs started locally, list of ids of jobs started on the cluster)
Return type: (list, list)
dgenies.bin.local_scheduler.parse_uploads_asks()
Parse upload requests: allow new uploads when others have finished, remove expired sessions, …
dgenies.bin.merge_splitted_chrms module
class dgenies.bin.merge_splitted_chrms.Merger(paf_in, paf_out, query_in, query_out, debug=False)
Bases: object
Merge split contigs together in a PAF file.
Parameters:
- paf_in (str) – input PAF file path
- paf_out (str) – output PAF file path
- query_in (str) – input query index file path
- query_out (str) – output query index file path
- debug (bool) – True to enable debug mode
static _get_sorted_splits(contigs_split, all_contigs)
For each split contig, save how many bases must be added to each line of the corresponding split contig in the PAF file. Also save the final merged contig size in the all_contigs dict.
Parameters:
- contigs_split (dict) – split contigs
- all_contigs (dict) – all and final contigs
Returns: all contigs and new split contigs with the start of each split contig set
Return type: (dict, dict)
_printer(message)
Print debug messages if debug mode is enabled.
Parameters: message (str) – message to print
load_query_index(index)
Load query index
Parameters: index (str) – index file path
Returns:
- [0] contigs length
- [1] split contigs length
- [2] sample name
Return type: (dict, dict, str)
static merge_paf(paf_in, paf_out, contigs, contigs_split)
Perform the PAF merge.
Parameters:
- paf_in (str) – path of input PAF with split contigs
- paf_out (str) – path of output PAF where split contigs are now merged together
- contigs (dict) – contigs size
- contigs_split (dict) – split contigs size
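The merge described above comes down to two steps: compute, for each split part, how many bases precede it in the original contig, then shift the query coordinates of each PAF line by that amount. The sketch below shows this idea on in-memory data. The helper names, the `(name, size)` list layout, and the `_partN` naming of split contigs are assumptions for the example and may differ from what dgenies uses. The column positions follow the standard PAF layout (query name, query length, query start, query end in columns 1 to 4).

```python
def split_offsets(parts):
    """Compute the offset of each split part inside the merged contig.

    Hypothetical helper. `parts` is an ordered list of (split name, size)
    tuples for one original contig. Returns ({split name: offset}, merged size).
    """
    offsets = {}
    total = 0
    for name, size in parts:
        offsets[name] = total
        total += size
    return offsets, total


def merge_paf_line(line, merged_name, offset, merged_size):
    """Rewrite the query columns of one PAF line for the merged contig."""
    cols = line.rstrip("\n").split("\t")
    cols[0] = merged_name                  # query name
    cols[1] = str(merged_size)             # query length
    cols[2] = str(int(cols[2]) + offset)   # query start
    cols[3] = str(int(cols[3]) + offset)   # query end
    return "\t".join(cols)


def demo():
    """Merge one alignment that was found on the second part of chr1."""
    parts = [("chr1_part1", 1000), ("chr1_part2", 500)]
    offsets, merged_size = split_offsets(parts)
    line = "chr1_part2\t500\t10\t110\t+\ttarget1\t2000\t300\t400\t95\t100\t60\n"
    merged = merge_paf_line(line, "chr1", offsets["chr1_part2"], merged_size)
    return offsets, merged_size, merged
```

Only the query columns change here; the target columns and alignment statistics are copied through unchanged.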
dgenies.bin.sort_paf module
class dgenies.bin.sort_paf.Sorter(input_f, output_f)
Bases: object
Sort a PAF file by match size.
Parameters:
- input_f (str) – input PAF file path
- output_f (str) – output PAF file path
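A sort by match size can be sketched in a few lines. The example assumes that "match size" means the number of residue matches, which is column 10 of the standard PAF format, and that larger matches come first. Both points are assumptions, since this page does not specify them, and `sort_paf_lines` is a hypothetical helper rather than the `Sorter` implementation.

```python
def sort_paf_lines(lines):
    """Return PAF records sorted by decreasing number of residue matches.

    Hypothetical helper. Blank lines are skipped and trailing newlines are
    stripped. Column 10 (index 9) is the residue match count in PAF.
    """
    records = [line.rstrip("\n") for line in lines if line.strip()]
    return sorted(records, key=lambda record: int(record.split("\t")[9]), reverse=True)


def demo():
    """Sort three alignments with 50, 200 and 120 matching residues."""
    lines = [
        "q1\t1000\t0\t60\t+\tt1\t2000\t0\t60\t50\t60\t60\n",
        "q2\t1000\t0\t210\t+\tt1\t2000\t100\t310\t200\t210\t60\n",
        "\n",
        "q3\t1000\t0\t130\t-\tt2\t3000\t500\t630\t120\t130\t60\n",
    ]
    return [record.split("\t")[0] for record in sort_paf_lines(lines)]
```

This version loads every record into memory, which is fine for a sketch but may not suit very large alignment files.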
dgenies.bin.split_fa module
class dgenies.bin.split_fa.Splitter(input_f, name_f, output_f, size_c=10000000, query_index='query_split.idx', debug=False)
Bases: object
Split large contigs into smaller ones.
Parameters:
- input_f (str) – input fasta file path
- name_f (str) – sample name
- output_f (str) – output fasta file path
- size_c (int) – size of split contigs
- query_index (str) – index file path for query
- debug (bool) – True to enable debug mode
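The core of such a splitter is cutting one sequence into consecutive chunks of at most `size_c` bases. The sketch below shows only that step, on a string held in memory. The function `split_sequence` and the `_partN` suffix are hypothetical and chosen for the example; the names that `Splitter` actually writes are not documented on this page.

```python
def split_sequence(name, sequence, size_c):
    """Cut `sequence` into consecutive chunks of at most `size_c` bases.

    Hypothetical helper. Returns a list of (chunk name, chunk sequence)
    tuples. Sequences that already fit in one chunk keep their name.
    """
    if size_c <= 0:
        raise ValueError("size_c must be a positive integer")
    if len(sequence) <= size_c:
        return [(name, sequence)]
    chunks = []
    for number, start in enumerate(range(0, len(sequence), size_c), start=1):
        # Assumed naming scheme: original name plus a part number.
        chunks.append((f"{name}_part{number}", sequence[start:start + size_c]))
    return chunks
```

The last chunk may be shorter than `size_c`, so a companion index recording each chunk's length is needed to merge the parts back later.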