dgenies.lib package
Submodules
dgenies.lib.crons module
- class dgenies.lib.crons.Crons(base_dir, debug)
Bases: object
Manage crontab jobs (webserver mode)
- Parameters
base_dir (str) – software base directory path
debug (bool) – True to enable debug mode
- clear(kill_scheduler=True, remove_pid_file=True)
Clear all crons
- Parameters
kill_scheduler (bool) – if True, kill local scheduler currently running
remove_pid_file (bool) – if True, remove pid file if local scheduler was killed successfully
dgenies.lib.decorators module
dgenies.lib.drmaasession module
dgenies.lib.fasta module
- class dgenies.lib.fasta.Fasta(name, path, type_f, example=False)
Bases: object
Defines a fasta file: name of the sample, path to the fasta file, type of file (URL or local file), …
- Parameters
name (str) – sample name
path (str) – fasta file path
type_f (str) – type of file (local file or URL)
example (bool) – True if this is an example job
dgenies.lib.functions module
- class dgenies.lib.functions.Functions
Bases: object
General functions
- static allowed_file(filename, file_formats=('fasta',))
Check whether a file has a valid format
- Parameters
filename – file path
file_formats – accepted file formats
- Returns
True if valid format, else False
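The check performed by allowed_file can be pictured as an extension match. Below is a minimal sketch, assuming each format name maps to a list of accepted extensions. The FORMAT_EXTENSIONS table is hypothetical: D-GENIES reads the real lists from its own configuration.

```python
# Hypothetical table: D-GENIES reads the real extension lists from its configuration.
FORMAT_EXTENSIONS = {
    "fasta": ("fa", "fasta", "fna", "fa.gz", "fasta.gz", "fna.gz"),
}


def allowed_file(filename, file_formats=("fasta",)):
    """Return True if filename ends with an extension of one of file_formats."""
    name = filename.lower()  # compare extensions case-insensitively
    for fmt in file_formats:
        for ext in FORMAT_EXTENSIONS.get(fmt, ()):  # unknown formats accept nothing
            if name.endswith("." + ext):
                return True
    return False
```

Matching on the full suffix (rather than os.path.splitext) lets double extensions such as ".fa.gz" be accepted.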
- static compress(filename)
Compress a file with gzip
- Parameters
filename (str) – file to compress
- Returns
path of the compressed file
- Return type
str
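A sketch of gzip compression with the standard library, streaming the file so large fasta files are not loaded into memory. Naming the output "<filename>.gz" and keeping the original file are assumptions of this sketch, not documented behavior of compress.

```python
import gzip
import shutil


def compress(filename):
    """Gzip filename into filename + '.gz' and return the new path.

    The original file is left in place in this sketch.
    """
    compressed = filename + ".gz"
    # Copy in chunks from the plain file to the gzip stream
    with open(filename, "rb") as src, gzip.open(compressed, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return compressed
```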
- static compress_and_send_mail(job_name, fasta_file, index_file, lock_file, mailer)
Compress the fasta file, then send a mail to the client with a link to it
- Parameters
job_name (str) – job id
fasta_file (str) – fasta file path
index_file (str) – index file path
lock_file (str) – lock file path
mailer (Mailer) – mailer object (to send mail)
- config = <dgenies.config_reader.AppConfigReader object>
- static get_fasta_file(res_dir, type_f, is_sorted)
Get fasta file path
- Parameters
res_dir (str) – job results directory
type_f (str) – type of file (query or target)
is_sorted (bool) – is fasta sorted
- Returns
fasta file path
- Return type
str
- static get_gallery_items()
Get list of items from the gallery
- Returns
list of gallery items. Each item is a dict with 7 keys:
name : name of the job
id_job : id of the job
picture : illustrating picture filename (located in gallery folder of the data folder)
query : query species name
target : target species name
mem_peak : max memory used for the run (human readable)
time_elapsed : time elapsed for the run (human readable)
- Return type
list of dict
- static get_list_all_jobs(mode='webserver')
Get list of all jobs
- Parameters
mode (str) – webserver or standalone
- Returns
list of all jobs in standalone mode. Empty list in webserver mode
- Return type
list
- static get_mail_for_job(id_job)
Retrieve associated mail for a job
- Parameters
id_job (int) – job id
- Returns
associated mail address
- Return type
str
- static get_readable_size(size, nb_after_coma=1, base='B')
Get a human readable size from a given size (in bytes by default)
- Parameters
size (int) – size, expressed in base units (bytes by default)
nb_after_coma (int) – number of digits after the decimal point
base – base unit of size, must be either “B”, “KiB”, “MiB” or “GiB”
- Returns
size, human readable
- Return type
str
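One way to produce such a string is to divide by 1024 until the value drops below 1024. This sketch interprets size in base units; the output layout ("1.5 KiB") and the TiB ceiling are assumptions, not the documented D-GENIES format.

```python
UNITS = ["B", "KiB", "MiB", "GiB", "TiB"]


def get_readable_size(size, nb_after_coma=1, base="B"):
    """Convert size (expressed in `base` units) to a human readable string."""
    if base not in ("B", "KiB", "MiB", "GiB"):
        raise ValueError("base must be one of B, KiB, MiB, GiB")
    unit = UNITS.index(base)
    value = float(size)
    # Move to the next binary unit while the value is at least 1024
    while value >= 1024 and unit < len(UNITS) - 1:
        value /= 1024
        unit += 1
    return f"{value:.{nb_after_coma}f} {UNITS[unit]}"
```

For example, get_readable_size(1536) gives "1.5 KiB" and get_readable_size(2048, base="MiB") gives "2.0 GiB".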
- static get_readable_time(seconds)
Get human readable time
- Parameters
seconds (int) – time in seconds
- Returns
time, human readable
- Return type
str
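A minimal sketch of the seconds-to-text conversion using divmod. The "1 h 2 min 5 s" layout is an assumption chosen for illustration; the actual format returned by get_readable_time is not specified here.

```python
def get_readable_time(seconds):
    """Format a duration in seconds as e.g. '1 h 2 min 5 s'."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    parts = []
    if hours:
        parts.append(f"{hours} h")
    if minutes:
        parts.append(f"{minutes} min")
    if secs or not parts:  # always show something, e.g. "0 s"
        parts.append(f"{secs} s")
    return " ".join(parts)
```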
- static get_valid_uploaded_filename(filename, folder)
Check whether the uploaded file already exists; if so, rename it
- Parameters
filename (str) – uploaded file
folder (str) – folder in which to save the file
- Returns
unique filename
- Return type
str
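A sketch of the renaming loop, assuming a hypothetical "<n>_" prefix scheme: the candidate name is tried, and a counter is prepended until no file with that name exists in folder. Prefixing rather than suffixing keeps double extensions such as ".fa.gz" intact. The real renaming scheme used by D-GENIES may differ.

```python
import os


def get_valid_uploaded_filename(filename, folder):
    """Return filename, or a prefixed variant if it already exists in folder.

    The "<n>_" prefix scheme is an assumption of this sketch.
    """
    candidate = filename
    counter = 2
    while os.path.exists(os.path.join(folder, candidate)):
        candidate = f"{counter}_{filename}"
        counter += 1
    return candidate
```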
- static is_in_gallery(id_job, mode='webserver')
Check whether a job is in the gallery
- Parameters
id_job (str) – job id
mode (str) – webserver or standalone
- Returns
True if job is in the gallery, else False
- Return type
bool
- static query_fasta_file_exists(res_dir)
Check if a fasta file exists
- Parameters
res_dir (str) – job result directory
- Returns
True if file exists and is a regular file, else False
- Return type
bool
- static random_string(s_len)
Generate a random string
- Parameters
s_len (int) – length of the string to generate
- Returns
the random string
- Return type
str
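A short sketch of random string generation over ASCII letters and digits. Using the secrets module is a swapped-in choice made here (it draws from a cryptographically strong source); the alphabet and source of randomness used by D-GENIES are not stated.

```python
import secrets
import string


def random_string(s_len):
    """Return a random string of s_len ASCII letters and digits."""
    alphabet = string.ascii_letters + string.digits
    # secrets.choice picks one character from a cryptographically strong source
    return "".join(secrets.choice(alphabet) for _ in range(s_len))
```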
- static read_index(index_file)
Load index of query or target
- Parameters
index_file (str) – index file path
- Returns
[0] index (size of each chromosome) {dict}
[1] sample name {str}
- Return type
(dict, str)
- static send_fasta_ready(mailer, job_name, sample_name, compressed=False, path='fasta-query', status='success', ext='fasta')
Send a link to the fasta file once processing has ended
- Parameters
mailer (Mailer) – mailer object
job_name (str) – job id
sample_name (str) – sample name
compressed (bool) – True if the fasta file is compressed
path (str) – fasta path
status (str) – processing status
ext (str) – file extension
- static sort_fasta(job_name, fasta_file, index_file, lock_file, compress=False, mailer=None, mode='webserver')
Sort fasta file according to the sorted index file
- Parameters
job_name (str) – job id
fasta_file (str) – fasta file path
index_file (str) – index file path
lock_file (str) – lock file path
compress (bool) – if True, compress the resulting fasta file
mailer (Mailer) – mailer object (to send mail)
mode (str) – webserver or standalone
dgenies.lib.job_manager module
- class dgenies.lib.job_manager.JobManager(id_job, email=None, query: Optional[dgenies.lib.fasta.Fasta] = None, target: Optional[dgenies.lib.fasta.Fasta] = None, mailer=None, tool='minimap2', align: Optional[dgenies.lib.fasta.Fasta] = None, backup: Optional[dgenies.lib.fasta.Fasta] = None, options=None)
Bases: object
Jobs management
- Parameters
id_job (str) – job id
email (str) – email from user
query (Fasta) – query fasta
target (Fasta) – target fasta
mailer (Mailer) – mailer object (to send mail through the Flask app)
tool (str) – tool to use for mapping (choice from tools config)
align (Fasta) – alignment file (PAF, MAF, …) as a fasta object
backup (Fasta) – backup TAR file
options (list) – list of str containing options for the chosen tool
- check_file(input_type, should_be_local, max_upload_size_readable)
Check if file is correct: format, size, valid gzip
- Parameters
input_type – query or target
should_be_local – True if job should be treated locally
max_upload_size_readable – max upload size human readable
- Returns
(True if correct, True if an error has been set [on failure], True if the job should be run locally)
- check_job_status_sge()
Check status of an SGE job run
- Returns
True if the job has successfully ended, else False
- check_job_status_slurm()
Check status of a SLURM job run
- Returns
True if the job has successfully ended, else False
- check_job_success()
Check whether a job succeeded
- Returns
status of a job: succeed, no-match or fail
- Return type
str
- delete()
Remove a job
- Returns
[0] Success of the deletion
[1] Error message, if any (else empty string)
- Return type
(bool, str)
- do_align()
Check if we have to make alignment
- Returns
True if the job is launched with an alignment file
- download_files_with_pending(files_to_download, should_be_local, max_upload_size_readable)
Download files from URLs, with pending (according to the max number of concurrent downloads)
- Parameters
files_to_download (list of list) – files to download. Each item is a 2-element list: the Fasta object, then the input type (query or target)
should_be_local (bool) – True if the job should be run locally (according to input file sizes), else False
max_upload_size_readable (str) – Human readable max upload size (to show on errors)
- static find_error_in_log(log_file)
Find error in log (for cluster run)
- Parameters
log_file – log file of the job
- Returns
error (empty if no error)
- Return type
str
- get_file_size(filepath: str)
Get file size
- Parameters
filepath (str) – file path
- Returns
file size (bytes)
- Return type
int
- get_mail_content(status, target_name, query_name=None)
Build mail content for status mail
- Parameters
status (str) – job status
target_name (str) – name of target
query_name (str) – name of query
- Returns
mail content
- Return type
str
- get_mail_content_html(status, target_name, query_name=None)
Build mail content as HTML
- Parameters
status (str) – job status
target_name (str) – name of target
query_name (str) – name of query
- Returns
mail content (html)
- Return type
str
- get_mail_subject(status)
Build mail subject
- Parameters
status (str) – job status
- Returns
mail subject
- Return type
str
- static get_pending_local_number()
Get the number of jobs running or waiting to run
- Returns
number of jobs
- Return type
int
- get_query_split()
Get query split fasta file
- Returns
split query fasta file
- Return type
str
- get_status_standalone(with_error=False)
Get job status in standalone mode
- Parameters
with_error – if True, also return the error
- Returns
status (and error, if with_error=True)
- Return type
str or tuple (if with_error=True)
- getting_files()
Get files for the job
- Returns
[0] True if files were retrieved successfully, else False
[1] If an error happened, True if the error has already been saved for the job, else False (the error will be saved later)
[2] True if no data must be downloaded (will be downloaded with pending if True)
- Return type
tuple
- static is_gz_file(filepath)
Check if a file is gzipped
- Parameters
filepath (str) – file to check
- Returns
True if gzipped, else False
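A common way to detect gzip data is to read the two-byte magic number 1f 8b defined by the gzip format (RFC 1952). The sketch below does that; whether is_gz_file uses this exact test is not stated in this reference.

```python
def is_gz_file(filepath):
    """Return True if filepath starts with the gzip magic number (1f 8b)."""
    with open(filepath, "rb") as handle:
        return handle.read(2) == b"\x1f\x8b"
```

Checking content rather than the ".gz" extension also catches compressed files that were uploaded under a plain name.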
- is_target_filtered()
Check if target has been filtered
- Returns
True if filtered, else False
- launch_to_cluster(step, batch_system_type, command, args, log_out, log_err)
Launch a program on the cluster
- Parameters
step (str) – step (prepare, start)
batch_system_type (str) – slurm or sge
command (str) – program to launch (without arguments)
args (list) – arguments to use for the program
log_out (str) – log file for stdout
log_err (str) – log file for stderr
- Returns
True if it succeeded, else False
- Return type
bool
- prepare_data_cluster(batch_system_type)
Launch data preparation on a cluster
- Parameters
batch_system_type (str) – slurm or sge
- Returns
True if it succeeded, else False
- Return type
bool
- prepare_data_local()
Prepare data locally. In standalone mode, the job is then launched if preparation succeeded.
- Returns
True if the job succeeded, else False
- Return type
bool
- prepare_dotplot_cluster(batch_system_type)
Prepare data when the alignment has already been done: index the fasta (if no index is given), then parse the alignment file
- Parameters
batch_system_type (str) – type of cluster (slurm or sge)
- prepare_dotplot_local()
Prepare data when the alignment has already been done: index the fasta (if no index is given), then parse the alignment file and sort it.
- run_job(batch_system_type)
Run a job (mapping step)
- Parameters
batch_system_type (str) – type of cluster (slurm or sge)
- run_job_in_thread(batch_system_type='local')
Run a job asynchronously in a new thread
- Parameters
batch_system_type (str) – local, slurm or sge
- search_error()
Search for an error in the log file (for local runs). If no error is found, return a generic error message
- Returns
error message to give to the user
- Return type
str
- set_job_status(status, error='')
Change status of a job
- Parameters
status (str) – new job status
error (str) – error description (if any)
- set_status_standalone(status, error='')
Change job status in standalone mode
- Parameters
status (str) – new status
error (str) – error description (if any)
dgenies.lib.latest module
dgenies.lib.mailer module
dgenies.lib.paf module
- class dgenies.lib.paf.Paf(paf: str, idx_q: str, idx_t: str, auto_parse: bool = True, mailer=None, id_job=None)
Bases: object
Functions applied to PAF files
- Parameters
paf (str) – PAF file path
idx_q (str) – query index file path
idx_t (str) – target index file path
auto_parse (bool) – if True, parse PAF file at initialisation
mailer (Mailer) – mailer object, to send mails
id_job (str) – job id
- build_list_no_assoc(to)
Build the list of queries that match no target, or the opposite (targets that match no query)
- Parameters
to – query or target
- Returns
content of the file
- build_query_chr_as_reference()
Assemble query contigs like reference chromosomes
- Returns
path of the fasta file
- build_query_on_target_association_file()
For each query, get the best matching chromosome and save it to a CSV file. Use the order of queries
- Returns
content of the file
- build_summary_stats(status_file)
Get summary of identity
- Returns
table with percents by category
- compute_gravity_contigs()
Compute the gravity of each contig on each chromosome (how many big matches they have). It is used to find, for each contig, the chromosome with the highest value
- Returns
- [0] gravity for each contig and each chromosome:
{contig1: {chr1: value, chr2: value, …}, contig2: …}
- [1] For each block save lines inside:
[median_on_query, squared length, median_on_target, x1, x2, y1, y2, length] (x : on target, y: on query)
- config = <dgenies.config_reader.AppConfigReader object>
- get_d3js_data()
Build data for D3.js client
- Returns
data for d3.js:
y_len: length of query (Bp)
x_len: length of target (Bp)
min_idy: minimum of identity (float)
max_idy: maximum of identity (float)
lines: matches lines, by class of identity (dict)
y_contigs: query contigs definitions (dict)
y_order: query contigs order (list)
x_contigs: target contigs definitions (dict)
x_order: target contigs order (list)
name_y: name of the query (str)
name_x: name of the target (str)
limit_idy: limit for each class of identities (list)
- Return type
dict
- get_queries_on_target_association()
For each target, get the list of queries associated with it
- Returns
list of queries associated to each target
- Return type
dict
- get_query_on_target_association(with_coords=True)
For each query, get the best matching chromosome
- Returns
query on target association
- Return type
dict
- get_summary_stats()
Load summary statistics from file
- Returns
summary object or None if summary not already built
- Return type
dict
- is_contig_well_oriented(lines, contig, chrom)
Returns True if the contig is well oriented. A well oriented contig must have y increasing when x increases. This is checked only for the largest matches (small matches are ignored)
- Parameters
lines (list) – lines inside the contig
contig (str) – query contig name
chrom (str) – target chromosome name
- Returns
True if well oriented, else False
- Return type
bool
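The orientation rule above can be illustrated with a length-weighted vote over the largest matches. This is a swapped-in technique, not the D-GENIES algorithm: the helper name is_well_oriented and the keep_fraction cut-off are hypothetical, and each line is assumed to follow the [median_on_query, squared length, median_on_target, x1, x2, y1, y2, length] layout documented for compute_gravity_contigs.

```python
def is_well_oriented(lines, keep_fraction=0.5):
    """Hypothetical helper: length-weighted vote on match direction.

    Each line is assumed to be:
    [median_on_query, squared_length, median_on_target, x1, x2, y1, y2, length]
    """
    if not lines:
        return True  # arbitrary choice: nothing contradicts the orientation
    ranked = sorted(lines, key=lambda line: line[7], reverse=True)
    # Keep only the longest matches; small matches are ignored
    kept = ranked[: max(1, int(len(ranked) * keep_fraction))]
    forward = reverse = 0
    for _, _, _, x1, x2, y1, y2, length in kept:
        # y increases with x when both deltas have the same sign
        if (x2 - x1) * (y2 - y1) >= 0:
            forward += length
        else:
            reverse += length
    return forward >= reverse
```

Weighting by length means one long forward match outweighs several short reversed ones, which matches the intent of ignoring small matches.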
- keyerror_message(exception, type_f)
Build an error message when a contig is not found in the query or target
- Parameters
exception (KeyError) – exception object
type_f (str) – type of data (query or target)
- Returns
error message
- Return type
str
- limit_idy = [0.25, 0.5, 0.75]
- max_nb_lines = 100000
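The limit_idy values split identities into four classes, and get_d3js_data returns matches grouped by identity class. A sketch of that binning with bisect, assuming identities in [0, 1]; the helper names are hypothetical, and sending a value equal to a limit into the upper class is an assumption.

```python
import bisect

LIMIT_IDY = [0.25, 0.5, 0.75]  # same values as Paf.limit_idy


def identity_class(identity, limits=LIMIT_IDY):
    """Return the class index (0 to len(limits)) of an identity in [0, 1].

    A value equal to a limit goes to the upper class (an assumption).
    """
    return bisect.bisect_right(limits, identity)


def group_by_class(identities, limits=LIMIT_IDY):
    """Group identity values by class index (hypothetical helper)."""
    groups = {i: [] for i in range(len(limits) + 1)}
    for idy in identities:
        groups[identity_class(idy, limits)].append(idy)
    return groups
```

For example, group_by_class([0.1, 0.3, 0.8, 0.55]) returns {0: [0.1], 1: [0.3], 2: [0.55], 3: [0.8]}.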
- parse_index(index_o: list, index_c: dict, full_len: int)
Parse index and merge contigs that are too small
- Parameters
index_o (list) – index contigs order
index_c (dict) – index contigs definitions
full_len (int) – length of the sequence
- Returns
(new contigs def, new contigs order)
- Return type
(dict, list)
- parse_paf(merge_index=True, noise=True)
Parse PAF file
- Parameters
merge_index (bool) – if True, merge contigs that are too small in the index
noise (bool) – if True, remove noise
- static remove_noise(lines, noise_limit)
Remove noise from the dot plot
- Parameters
lines (dict) – lines of the dot plot, by class
noise_limit (float) – line length limit
- Returns
kept lines, by class
- Return type
dict
dgenies.lib.parsers module
Define tools parsers here
Each parser (main function) must take exactly 2 arguments:
- First argument: input file (the tool's raw output)
- Second argument: final PAF file
Each parser returns True if parsing succeeded, else False
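A skeleton that follows this convention: two arguments, True on success, else False. The parser paf_passthrough is hypothetical; it assumes a tool whose raw output already uses PAF records and only checks the 12 mandatory tab-separated PAF columns before copying them. On failure it may leave a partial output file.

```python
PAF_INT_COLUMNS = (1, 2, 3, 6, 7, 8, 9, 10, 11)  # lengths, coordinates, counts, MAPQ


def paf_passthrough(in_file, out_paf):
    """Hypothetical parser: copy a tool output whose records are already PAF."""
    n_records = 0
    try:
        with open(in_file) as src, open(out_paf, "w") as dst:
            for line in src:
                cols = line.rstrip("\n").split("\t")
                if cols == [""]:  # skip blank lines
                    continue
                # PAF has 12 mandatory columns; column 5 is the strand
                if len(cols) < 12 or cols[4] not in ("+", "-"):
                    return False
                if not all(cols[i].isdigit() for i in PAF_INT_COLUMNS):
                    return False
                dst.write("\t".join(cols) + "\n")
                n_records += 1
    except (OSError, UnicodeDecodeError):
        return False
    return n_records > 0  # an empty output is treated as a failure
```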
dgenies.lib.upload_file module
dgenies.lib.validators module
Define formats validators here (for alignment files)
Each validator (main function) is named exactly like the format in the aln-formats.yaml file, and takes only 1 argument:
- Input file to check
Secondary functions must start with _
Validators for non-mapping files must start with “v_”
Each validator returns True if the file is valid, else False
- dgenies.lib.validators.maf(in_file)
MAF validator
- Parameters
in_file (str) – MAF file to test
- Returns
True if valid, else False
- Return type
bool
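A sketch of a MAF validator that follows the convention above (one argument, returns a bool). It checks only a subset of the MAF format: "#" header and comment lines, alignment blocks opened by an "a" line, and "s" lines with 7 fields, integer start, size and source size, and a "+" or "-" strand. It is not the shipped D-GENIES implementation, which may check more or less.

```python
def maf(in_file):
    """Sketch of a MAF validator: True if the file looks like MAF, else False."""
    n_blocks = 0  # alignment blocks holding at least one "s" line
    in_block = False
    block_has_seq = False
    try:
        with open(in_file) as handle:
            for raw in handle:
                fields = raw.split()
                if not fields:  # a blank line closes the current block
                    in_block = False
                    continue
                key = fields[0]
                if key.startswith("#"):  # "##maf" header or comment line
                    continue
                if key == "a":  # start of an alignment block
                    in_block, block_has_seq = True, False
                elif key == "s" and in_block:
                    # s <src> <start> <size> <strand> <srcSize> <text>
                    if len(fields) != 7 or fields[4] not in ("+", "-"):
                        return False
                    if not all(fields[i].isdigit() for i in (2, 3, 5)):
                        return False
                    if not block_has_seq:
                        block_has_seq = True
                        n_blocks += 1
                elif key in ("i", "e", "q") and in_block:
                    continue  # optional MAF line types, not checked here
                else:
                    return False
    except (OSError, UnicodeDecodeError):
        return False
    return n_blocks > 0
```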