dgenies.lib package¶
Submodules¶
dgenies.lib.crons module¶
-
class
dgenies.lib.crons.
Crons
(base_dir, debug)[source]¶ Bases:
object
Manage crontab jobs (webserver mode)
Parameters: - base_dir (str) – software base directory path
- debug (bool) – True to enable debug mode
-
clear
(kill_scheduler=True)[source]¶ Clear all crons
Parameters: kill_scheduler (bool) – if True, kill local scheduler currently running
dgenies.lib.decorators module¶
dgenies.lib.drmaasession module¶
dgenies.lib.fasta module¶
-
class
dgenies.lib.fasta.
Fasta
(name, path, type_f, example=False)[source]¶ Bases:
object
Defines a fasta file: name of the sample, path to the fasta file, type of file (URL or local file), …
Parameters: - name (str) – sample name
- path (str) – fasta file path
- type_f (str) – type of file (local file or URL)
- example (bool) – is an example job
dgenies.lib.functions module¶
-
class
dgenies.lib.functions.
Functions
[source]¶ Bases:
object
General functions
-
static
_Functions__get_do_sort
(fasta, is_sorted)¶ Check whether query must be sorted (False if already done)
Parameters: - fasta (str) – fasta file
- is_sorted (bool) – True if it’s sorted
Returns: do sort
Return type: bool
-
static
allowed_file
(filename, file_formats=('fasta', ))[source]¶ Check whether a file has a valid format
Parameters: - filename – file path
- file_formats – accepted file formats
Returns: True if valid format, else False
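As a rough illustration of this kind of check, here is a minimal sketch that matches a filename against a table of extensions per format. The function name `allowed_file_sketch` and the extension table are assumptions made for the example; the real mapping lives in the D-Genies source and may differ.

```python
def allowed_file_sketch(filename, file_formats=("fasta",)):
    """Return True if filename ends with an extension accepted for one of the formats."""
    # Hypothetical extension table, for illustration only.
    extensions = {
        "fasta": ("fa", "fasta", "fna", "fa.gz", "fasta.gz", "fna.gz"),
    }
    name = filename.lower()
    for file_format in file_formats:
        for ext in extensions.get(file_format, ()):
            if name.endswith("." + ext):
                return True
    return False
```

With this table, `allowed_file_sketch("genome.fa.gz")` is accepted and `allowed_file_sketch("notes.txt")` is rejected.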
-
static
compress
(filename)[source]¶ Compress a file with gzip
Parameters: filename (str) – file to compress Returns: path of the compressed file Return type: str
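The following sketch shows one way to gzip a file with the standard library and return the new path. The name `compress_sketch`, the `.gz` suffix and the choice to keep the original file are assumptions; the documented method may behave differently on those points.

```python
import gzip
import shutil


def compress_sketch(filename):
    """Write a gzip copy of filename next to it and return the new path."""
    compressed = filename + ".gz"  # assumed naming scheme
    with open(filename, "rb") as src, gzip.open(compressed, "wb") as dst:
        # Stream the content so large fasta files are not loaded in memory.
        shutil.copyfileobj(src, dst)
    return compressed
```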
-
static
compress_and_send_mail
(job_name, fasta_file, index_file, lock_file, mailer)[source]¶ Compress fasta file and then send mail with its link to the client
Parameters: - job_name (str) – job id
- fasta_file (str) – fasta file path
- index_file (str) – index file path
- lock_file (str) – lock file path
- mailer (Mailer) – mailer object (to send mail)
-
config
= <dgenies.config_reader.AppConfigReader object>¶
-
static
get_fasta_file
(res_dir, type_f, is_sorted)[source]¶ Get fasta file path
Parameters: - res_dir (str) – job results directory
- type_f (str) – type of file (query or target)
- is_sorted (bool) – is fasta sorted
Returns: fasta file path
Return type: str
-
static
get_gallery_items
()[source]¶ Get list of items from the gallery
Returns: list of gallery items. Each item is a dict with 7 keys: - name : name of the job
- id_job : id of the job
- picture : illustrating picture filename (located in gallery folder of the data folder)
- query : query species name
- target : target species name
- mem_peak : max memory used for the run (human readable)
- time_elapsed : time elapsed for the run (human readable)
Return type: list of dict
-
static
get_list_all_jobs
(mode='webserver')[source]¶ Get list of all jobs
Parameters: mode (str) – webserver or standalone Returns: list of all jobs in standalone mode. Empty list in webserver mode Return type: list
-
static
get_mail_for_job
(id_job)[source]¶ Retrieve associated mail for a job
Parameters: id_job (int) – job id Returns: associated mail address Return type: str
-
static
get_readable_size
(size, nb_after_coma=1)[source]¶ Get human readable size from a given size in bytes
Parameters: - size (int) – size in bytes
- nb_after_coma (int) – number of digits after the decimal point
Returns: size, human readable
Return type: str
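A minimal sketch of such a conversion follows. The function name, the 1024 divisor and the unit labels are assumptions for the example; only the parameter names come from the signature above.

```python
def readable_size_sketch(size, nb_after_coma=1):
    """Format a size in bytes with a unit suffix."""
    units = ["B", "KB", "MB", "GB", "TB"]  # assumed labels
    value = float(size)
    unit_index = 0
    # Divide by 1024 until the value fits the current unit.
    while value >= 1024 and unit_index < len(units) - 1:
        value /= 1024
        unit_index += 1
    return f"{value:.{nb_after_coma}f} {units[unit_index]}"
```

For example, `readable_size_sketch(1536)` gives `"1.5 KB"`.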
-
static
get_readable_time
(seconds)[source]¶ Get human readable time
Parameters: seconds (int) – time in seconds Returns: time, human readable Return type: str
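One possible formatting is sketched below. The output layout (`1h 2m 5s`) and the function name are assumptions; the documentation does not state which layout the real method uses.

```python
def readable_time_sketch(seconds):
    """Format a duration in seconds as hours, minutes and seconds."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    parts = []
    if hours:
        parts.append(f"{hours}h")
    if minutes:
        parts.append(f"{minutes}m")
    # Always show seconds when nothing else was emitted (e.g. for 0).
    if secs or not parts:
        parts.append(f"{secs}s")
    return " ".join(parts)
```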
-
static
get_valid_uploaded_filename
(filename, folder)[source]¶ Check whether uploaded file already exists. If yes, rename it
Parameters: - filename (str) – uploaded file
- folder (str) – folder in which to save the file
Returns: unique filename
Return type: str
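The renaming step can be sketched as follows, assuming a numeric suffix is appended before the extension until the name is free. The suffix scheme and the function name are illustrative assumptions.

```python
import os


def unique_filename_sketch(filename, folder):
    """Return filename, or a suffixed variant that does not exist yet in folder."""
    base, ext = os.path.splitext(filename)
    candidate = filename
    counter = 1
    while os.path.exists(os.path.join(folder, candidate)):
        # Assumed scheme: name_1.ext, name_2.ext, ...
        candidate = f"{base}_{counter}{ext}"
        counter += 1
    return candidate
```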
-
static
is_in_gallery
(id_job, mode='webserver')[source]¶ Check whether a job is in the gallery
Parameters: - id_job (str) – job id
- mode (str) – webserver or standalone
Returns: True if job is in the gallery, else False
Return type: bool
-
static
query_fasta_file_exists
(res_dir)[source]¶ Check if a fasta file exists
Parameters: res_dir (str) – job result directory Returns: True if file exists and is a regular file, else False Return type: bool
-
static
random_string
(s_len)[source]¶ Generate a random string
Parameters: s_len (int) – length of the string to generate Returns: the random string Return type: str
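A minimal sketch of random string generation, assuming an alphabet of ASCII letters and digits (the alphabet used by the real method is not documented here):

```python
import random
import string


def random_string_sketch(s_len):
    """Return a random string of s_len characters."""
    alphabet = string.ascii_letters + string.digits  # assumed alphabet
    return "".join(random.choice(alphabet) for _ in range(s_len))
```

If the string is used as a secret rather than as an identifier, `secrets.choice` from the standard library would be a safer drop-in for `random.choice`.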
-
static
read_index
(index_file)[source]¶ Load index of query or target
Parameters: index_file (str) – index file path Returns: - [0] index (size of each chromosome) {dict}
- [1] sample name {str}
Return type: (dict, str)
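To illustrate the returned pair, here is a sketch that assumes the index file holds the sample name on its first line, followed by one tab-separated `name<TAB>length` line per chromosome. That layout and the function name are assumptions for the example, not a description of the actual file format.

```python
def read_index_sketch(index_file):
    """Return ({chromosome: size}, sample_name) from an index file."""
    index = {}
    with open(index_file) as handle:
        # Assumed layout: first line is the sample name.
        sample_name = handle.readline().rstrip("\n")
        for line in handle:
            if not line.strip():
                continue
            name, length = line.rstrip("\n").split("\t")[:2]
            index[name] = int(length)
    return index, sample_name
```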
-
static
send_fasta_ready
(mailer, job_name, sample_name, compressed=False, path='fasta-query', status='success', ext='fasta')[source]¶ Send link to fasta file when treatment ended
Parameters: - mailer (Mailer) – mailer object
- job_name (str) – job id
- sample_name (str) – sample name
- compressed (bool) – is a compressed fasta file
- path (str) – fasta path
- status (str) – treatment status
- ext (str) – file extension
-
static
sort_fasta
(job_name, fasta_file, index_file, lock_file, compress=False, mailer=None, mode='webserver')[source]¶ Sort fasta file according to the sorted index file
Parameters: - job_name (str) – job id
- fasta_file (str) – fasta file path
- index_file (str) – index file path
- lock_file (str) – lock file path
- compress (bool) – compress result fasta file
- mailer (Mailer) – mailer object (to send mail)
- mode (str) – webserver or standalone
dgenies.lib.job_manager module¶
-
class
dgenies.lib.job_manager.
JobManager
(id_job, email=None, query: dgenies.lib.fasta.Fasta = None, target: dgenies.lib.fasta.Fasta = None, mailer=None, tool='minimap2', align: dgenies.lib.fasta.Fasta = None, backup: dgenies.lib.fasta.Fasta = None)[source]¶ Bases:
object
Jobs management
Parameters: - id_job (str) – job id
- email (str) – email from user
- query (Fasta) – query fasta
- target (Fasta) – target fasta
- mailer (Mailer) – mailer object (to send mail through flask app)
- tool (str) – tool to use for mapping (choice from tools config)
- align (Fasta) – alignment file (PAF, MAF, …) as a fasta object
- backup (Fasta) – backup TAR file
-
_after_start
(success, error_set)[source]¶ Tasks done after input files downloaded, checked and parsed: if success, set job status to “waiting”. Else, set job error and send mail.
Parameters: - success (bool) – job success
- error_set (bool) – error already set (else, set it now)
-
_check_url
(fasta, formats)[source]¶ Check if a URL is valid, and if the file is valid too
Parameters: - fasta (Fasta) – fasta file object
- formats (tuple) – allowed file formats
Returns: True if URL and file are valid, else False
Return type: bool
-
_download_file
(url)[source]¶ Download a file from a URL
Parameters: url (str) – url of the file to download Returns: absolute path of the downloaded file Return type: str
-
_end_of_prepare_dotplot
()[source]¶ Tasks done after preparing dot plot data: parse & sort of alignment file
-
_get_filename_from_url
(url)[source]¶ Retrieve filename from a URL (http or ftp)
Parameters: url (str) – url of the file to download Returns: filename Return type: str
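A sketch of how a filename can be extracted from a URL with the standard library is shown below. The function name is hypothetical, and taking the last path component is an assumption about what "filename" means here.

```python
import os
from urllib.parse import unquote, urlparse


def filename_from_url_sketch(url):
    """Return the last path component of an http or ftp URL."""
    path = urlparse(url).path  # drops scheme, host and query string
    return os.path.basename(unquote(path))
```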
-
_getting_file_from_url
(fasta, type_f)[source]¶ Download file from URL
Parameters: - fasta (Fasta) – Fasta object describing the input file
- type_f (str) – type of the file (query or target)
Returns: - [0] True if no error happened, else False
- [1] If an error happened, True if the error was saved for the job, else False (will be saved later)
- [2] Final path of the downloaded file {str}
- [3] Name of the downloaded file {str}
Return type: tuple
-
_getting_local_file
(fasta, type_f)[source]¶ Copy temp file to its final location
Parameters: - fasta (Fasta) – fasta file Object
- type_f (str) – query or target
Returns: final full path of the file
Return type: str
-
_launch_drmaa
(batch_system_type)[source]¶ Launch the mapping step to a cluster
Parameters: batch_system_type – slurm or sge Returns: True if job succeeded, else False
-
_launch_local
()[source]¶ Launch a job on the current machine
Returns: True if job succeeded, else False Return type: bool
-
_set_analytics_job_status
(status)[source]¶ Change status for a job in analytics database
Parameters: status (str (20)) – new status
-
check_file
(input_type, should_be_local, max_upload_size_readable)[source]¶ Check if file is correct: format, size, valid gzip
Parameters: - input_type – query or target
- should_be_local – True if job should be treated locally
- max_upload_size_readable – max upload size human readable
Returns: (True if correct, True if error set [for fail], True if should be local)
-
check_job_status_sge
()[source]¶ Check status of a SGE job run
Returns: True if the job has successfully ended, else False
-
check_job_status_slurm
()[source]¶ Check status of a SLURM job run
Returns: True if the job has successfully ended, else False
-
check_job_success
()[source]¶ Check if a job succeeded
Returns: status of a job: succeed, no-match or fail Return type: str
-
delete
()[source]¶ Remove a job
Returns: - [0] Success of the deletion
- [1] Error message, if any (else empty string)
Return type: (bool, str)
-
do_align
()[source]¶ Check if we have to make alignment
Returns: True if the job is launched with an alignment file
-
download_files_with_pending
(files_to_download, should_be_local, max_upload_size_readable)[source]¶ Download files from URLs, with pending (according to the max number of concurrent downloads)
Parameters: - files_to_download (list of list) – files to download. For each item of the list, it’s a list with 2 elements: first one is the Fasta object, second one the input type (query or target)
- should_be_local (bool) – True if the job should be run locally (according to input file sizes), else False
- max_upload_size_readable (str) – Human readable max upload size (to show on errors)
-
static
find_error_in_log
(log_file)[source]¶ Find error in log (for cluster run)
Parameters: log_file – log file of the job Returns: error (empty if no error) Return type: str
-
get_file_size
(filepath: str)[source]¶ Get file size
Parameters: filepath (str) – file path Returns: file size (bytes) Return type: int
-
get_mail_content
(status, target_name, query_name=None)[source]¶ Build mail content for status mail
Parameters: - status (str) – job status
- target_name (str) – name of target
- query_name (str) – name of query
Returns: mail content
Return type: str
-
get_mail_content_html
(status, target_name, query_name=None)[source]¶ Build mail content as HTML
Parameters: - status (str) – job status
- target_name (str) – name of target
- query_name (str) – name of query
Returns: mail content (html)
Return type: str
-
get_mail_subject
(status)[source]¶ Build mail subject
Parameters: status (str) – job status Returns: mail subject Return type: str
-
static
get_pending_local_number
()[source]¶ Get number of jobs running or waiting for a run
Returns: number of jobs Return type: int
-
get_query_split
()[source]¶ Get query split fasta file
Returns: split query fasta file Return type: str
-
get_status_standalone
(with_error=False)[source]¶ Get job status in standalone mode
Parameters: with_error – also return the error Returns: status (and error, if with_error=True) Return type: str or tuple (if with_error=True)
-
getting_files
()[source]¶ Get files for the job
Returns: - [0] True if getting files succeeded, else False
- [1] If an error happened, True if the error was already saved for the job, else False (error will be saved later)
- [2] True if no data must be downloaded (will be downloaded with pending if True)
Return type: tuple
-
static
is_gz_file
(filepath)[source]¶ Check if a file is gzipped
Parameters: filepath (str) – file to check Returns: True if gzipped, else False
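A common way to perform this check is to read the two-byte gzip magic number (`1f 8b`) at the start of the file. The sketch below takes that approach; whether the real method does the same is an assumption.

```python
def is_gz_file_sketch(filepath):
    """Return True if the file starts with the gzip magic number."""
    with open(filepath, "rb") as handle:
        return handle.read(2) == b"\x1f\x8b"
```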
-
is_target_filtered
()[source]¶ Check if target has been filtered
Returns: True if filtered, else False
-
launch_to_cluster
(step, batch_system_type, command, args, log_out, log_err)[source]¶ Launch a program to the cluster
Parameters: - step (str) – step (prepare, start)
- batch_system_type (str) – slurm or sge
- command (str) – program to launch (without arguments)
- args (list) – arguments to use for the program
- log_out (str) – log file for stdout
- log_err (str) – log file for stderr
Returns: True if it succeeded, else False
Return type: bool
-
prepare_data_cluster
(batch_system_type)[source]¶ Launch of prepare data on a cluster
Parameters: batch_system_type (str) – slurm or sge Returns: True if it succeeded, else False Return type: bool
-
prepare_data_local
()[source]¶ Prepare data locally. In standalone mode, launch the job afterwards on success.
Returns: True if job succeeded, else False Return type: bool
-
prepare_dotplot_cluster
(batch_system_type)[source]¶ Prepare data if alignment already done: just index the fasta (if index not given), then parse the alignment
Parameters: batch_system_type (str) – type of cluster (slurm or sge)
-
prepare_dotplot_local
()[source]¶ Prepare data if alignment already done: just index the fasta (if index not given), then parse the alignment file and sort it.
-
run_job
(batch_system_type)[source]¶ Run of a job (mapping step)
Parameters: batch_system_type (str) – type of cluster (slurm or sge)
-
run_job_in_thread
(batch_system_type='local')[source]¶ Run a job asynchronously into a new thread
Parameters: batch_system_type (str) – local, slurm or sge
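The general pattern of running a job function in a new thread can be sketched with the standard `threading` module. The helper name `run_in_thread_sketch` is hypothetical, and the real method may configure or track its thread differently.

```python
import threading


def run_in_thread_sketch(target, *args):
    """Start target(*args) in a new thread and return the thread object."""
    thread = threading.Thread(target=target, args=args)
    thread.start()  # returns immediately; the job runs in the background
    return thread
```

The caller gets control back at once; calling `join()` on the returned thread waits for the job to finish when that is needed.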
-
search_error
()[source]¶ Search for an error in the log file (for local runs). If no error found, returns a generic error message
Returns: error message to give to the user Return type: str
-
set_job_status
(status, error='')[source]¶ Change status of a job
Parameters: - status (str) – new job status
- error (str) – error description (if any)
-
set_status_standalone
(status, error='')[source]¶ Change job status in standalone mode
Parameters: - status (str) – new status
- error (str) – error description (if any)
dgenies.lib.latest module¶
dgenies.lib.mailer module¶
dgenies.lib.paf module¶
-
class
dgenies.lib.paf.
Paf
(paf: str, idx_q: str, idx_t: str, auto_parse: bool = True, mailer=None, id_job=None)[source]¶ Bases:
object
Functions applied to PAF files
Parameters: - paf (str) – PAF file path
- idx_q (str) – query index file path
- idx_t (str) – target index file path
- auto_parse (bool) – if True, parse PAF file at initialisation
- mailer (Mailer) – mailer object, to send mails
- id_job (str) – job id
-
_add_percents
(percents, item)[source]¶ Update percents with interval
Parameters: - percents (dict) – initial percents
- item (Interval) – interval from IntervalTree
Returns: new percents
Return type: dict
-
static
_flush_blocks
(index_c, new_index_c, new_index_o, current_block)[source]¶ When parsing the index, build a mix of sequential contigs that are too small (if their number exceeds 5), else just add them to the new index
Parameters: - index_c (dict) – current index contigs def
- new_index_o (list) – new index contigs order
- new_index_c (dict) – new index contigs def
- current_block (list) – contigs in the current analyzed block
Returns: (new index contigs defs, new index contigs order)
Return type: (dict, list)
-
_remove_overlaps
(position_idy, percents)[source]¶ Remove overlaps between matches on the diagonal
Parameters: - position_idy (IntervalTree) – matches intervals with associated identity category
- percents (dict) – Percent of matches for each identity category
Returns: new percents (updated after overlap removing)
Return type: dict
-
_update_query_index
(contigs_reoriented)[source]¶ Write new query index file (including new reoriented contigs info)
Parameters: contigs_reoriented (list) – reoriented contigs list
-
build_list_no_assoc
(to)[source]¶ Build list of queries that match no target, or the opposite
Parameters: to – query or target Returns: content of the file
-
build_query_chr_as_reference
()[source]¶ Assemble query contigs like reference chromosomes
Returns: path of the fasta file
-
build_query_on_target_association_file
()[source]¶ For each query, get the best matching chromosome and save it to a CSV file. Use the order of queries
Returns: content of the file
-
build_summary_stats
(status_file)[source]¶ Get summary of identity
Returns: table with percents by category
-
compute_gravity_contigs
()[source]¶ Compute gravity for each contig on each chromosome (how many big matches they have). Will be used to find which chromosome has the highest value for each contig
Returns: - [0] gravity for each contig and each chromosome:
- {contig1: {chr1: value, chr2: value, …}, contig2: …}
- [1] For each block save lines inside:
- [median_on_query, squared length, median_on_target, x1, x2, y1, y2, length] (x : on target, y: on query)
-
get_d3js_data
()[source]¶ Build data for D3.js client
Returns: data for d3.js: - y_len: length of query (Bp)
- x_len: length of target (Bp)
- min_idy: minimum of identity (float)
- max_idy: maximum of identity (float)
- lines: matches lines, by class of identity (dict)
- y_contigs: query contigs definitions (dict)
- y_order: query contigs order (list)
- x_contigs: target contigs definitions (dict)
- x_order: target contigs order (list)
- name_y: name of the query (str)
- name_x: name of the target (str)
- limit_idy: limit for each class of identities (list)
Return type: dict
-
get_queries_on_target_association
()[source]¶ For each target, get the list of queries associated to it
Returns: list of queries associated to each target Return type: dict
-
get_query_on_target_association
(with_coords=True)[source]¶ For each query, get the best matching chromosome
Returns: query on target association Return type: dict
-
get_summary_stats
()[source]¶ Load summary statistics from file
Returns: summary object or None if summary not already built Return type: dict
-
is_contig_well_oriented
(lines, contig, chrom)[source]¶ Returns True if the contig is well oriented. A well oriented contig must have y increasing as x increases. This is checked only for the biggest matches (small matches are ignored)
Parameters: - lines (list) – lines inside the contig
- contig (str) – query contig name
- chrom (str) – target chromosome name
Returns: True if well oriented, else False
Return type: bool
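The orientation rule can be illustrated with a simplified sketch. It is not the documented algorithm: it assumes each match is a plain `(x1, x2, y1, y2)` tuple, ignores matches shorter than a fraction of the longest one, and takes a length-weighted vote on whether y grows with x. The function name and the `min_fraction` threshold are assumptions.

```python
def well_oriented_sketch(lines, min_fraction=0.1):
    """Return True if the biggest matches mostly have y increasing with x."""
    if not lines:
        return True
    longest = max(abs(x2 - x1) for x1, x2, _, _ in lines)
    score = 0
    for x1, x2, y1, y2 in lines:
        size = abs(x2 - x1)
        if size < min_fraction * longest:
            continue  # small matches are ignored
        same_direction = (x2 - x1) * (y2 - y1) >= 0
        score += size if same_direction else -size
    return score >= 0
```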
-
keyerror_message
(exception, type_f)[source]¶ Build message if contig not found in query or target
Parameters: - exception (KeyError) – exception object
- type_f (str) – type of data (query or target)
Returns: error message
Return type: str
-
limit_idy
= [0.25, 0.5, 0.75]¶
-
max_nb_lines
= 100000¶
-
parse_index
(index_o: list, index_c: dict, full_len: int)[source]¶ Parse index and merge too small contigs together
Parameters: - index_o (list) – index contigs order
- index_c (dict) – index contigs def
- full_len (int) – length of the sequence
Returns: (new contigs def, new contigs order)
Return type: (dict, list)
-
parse_paf
(merge_index=True, noise=True)[source]¶ Parse PAF file
Parameters: - merge_index (bool) – if True, merge too small contigs in index
- noise (bool) – if True, remove noise
-
static
remove_noise
(lines, noise_limit)[source]¶ Remove noise from the dot plot
Parameters: - lines (dict) – lines of the dot plot, by class
- noise_limit (float) – line length limit
Returns: kept lines, by class
Return type: dict
dgenies.lib.parsers module¶
Define tools parsers here
Each parser (main function) must take exactly 2 arguments:
- First argument: input file, which is the tool's raw output
- Second argument: final PAF file
Returns True if parsing succeeded, else False
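A parser following this two-argument convention could look like the sketch below. The tool, its output layout (whitespace-separated fields already in PAF column order, with `#` comment lines) and the function name `my_tool_parser` are all hypothetical. The only fixed fact used is that PAF has 12 mandatory columns.

```python
def my_tool_parser(in_file, out_paf):
    """Convert a hypothetical tool output into PAF. Return True on success."""
    try:
        with open(in_file) as raw, open(out_paf, "w") as paf:
            for line in raw:
                if line.startswith("#") or not line.strip():
                    continue  # skip comments and blank lines
                fields = line.split()
                if len(fields) < 12:
                    return False  # not enough columns for a PAF record
                paf.write("\t".join(fields[:12]) + "\n")
    except OSError:
        return False
    return True
```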
dgenies.lib.upload_file module¶
dgenies.lib.validators module¶
Define formats validators here (for alignment files)
Each validator (main function) has a name which is exactly the name of the format in the aln-formats.yaml file. Only 1 argument to this function: - Input file to check
Secondary functions must start with _
Returns True if file is valid, else False
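The conventions above can be sketched as follows. The format name `my_format`, its record layout (tab-separated, with an integer in the second column) and the helper `_looks_like_record` are hypothetical; in a real validator the main function name would have to match an entry of aln-formats.yaml.

```python
def _looks_like_record(line):
    """Secondary function (name starts with _): check one line."""
    fields = line.rstrip("\n").split("\t")
    return len(fields) >= 2 and fields[1].isdigit()


def my_format(in_file):
    """Main validator: single argument, returns True if the file is valid."""
    try:
        with open(in_file) as handle:
            for line in handle:
                if line.strip() and not _looks_like_record(line):
                    return False
    except (OSError, UnicodeDecodeError):
        return False
    return True
```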
-
dgenies.lib.validators.
_filter_maf
(in_file)[source]¶ Filter Maf file (remove unused lines)
Parameters: in_file – maf file to filter
-
dgenies.lib.validators.
idx
(in_file)[source]¶ Index file validator
Parameters: in_file (str) – index file to test Returns: True if valid, else False Return type: bool