dgenies.lib package¶

Submodules¶

dgenies.lib.crons module¶

class dgenies.lib.crons.Crons(base_dir, debug)[source]¶

Bases: object

Manage crontab jobs (webserver mode)

Parameters

base_dir (str) – software base directory path
debug (bool) – True to enable debug mode

clear(kill_scheduler=True, remove_pid_file=True)[source]¶

Clear all crons

Parameters

kill_scheduler (bool) – if True, kill local scheduler currently running
remove_pid_file (bool) – if True, remove pid file if local scheduler was killed successfully

init_clean_cron()[source]¶: Initialize the clean cron, which clears old jobs. The clean cron is launched at 1:00 am each day

init_launch_local_cron()[source]¶: Try to launch local scheduler (if not already launched)

start_all()[source]¶: Start all crons

dgenies.lib.decorators module¶

class dgenies.lib.decorators.Singleton(klass)[source]¶

Bases: object

Define a singleton (design pattern)
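
The singleton pattern can be written as a class-based decorator that takes the decorated class as its only argument, which matches the `Singleton(klass)` signature above. The sketch below illustrates the pattern only and is not the D-Genies source; the `Settings` class and the `instance` attribute are hypothetical.

```python
class Singleton:
    """Class decorator: every call returns the same instance of `klass`."""

    def __init__(self, klass):
        self.klass = klass
        self.instance = None  # created lazily on the first call

    def __call__(self, *args, **kwargs):
        if self.instance is None:
            self.instance = self.klass(*args, **kwargs)
        return self.instance


@Singleton
class Settings:  # hypothetical class, for illustration only
    def __init__(self):
        self.values = {}


first = Settings()
second = Settings()
print(first is second)  # both names refer to the same object
```

Because the decorated name now refers to the decorator instance rather than the class, `isinstance(obj, Settings)` no longer works with this variant; that is a known trade-off of the class-based form.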

dgenies.lib.drmaasession module¶

dgenies.lib.fasta module¶

class dgenies.lib.fasta.Fasta(name, path, type_f, example=False)[source]¶

Bases: object

Defines a fasta file: name of the sample, path to the fasta file, type of file (URL or local file), …

Parameters

name (str) – sample name
path (str) – fasta file path
type_f (str) – type of file (local file or URL)
example (bool) – is an example job

get_name()[source]¶

Get sample name

Returns: sample name
Return type: str

get_path()[source]¶

Get path of the fasta file

Returns: fasta path
Return type: str

get_type()[source]¶

Get type: URL or local file

Returns: type
Return type: str

is_example()[source]¶

Return whether the current sample is example data

Returns: True if the current sample is example data
Return type: bool

set_name(name)[source]¶

Set sample name

Parameters: name (str) – new sample name

set_path(path)[source]¶

Set path to the fasta file

Parameters: path (str) – new path

dgenies.lib.functions module¶

class dgenies.lib.functions.Functions[source]¶

Bases: object

General functions

static allowed_file(filename, file_formats=('fasta',))[source]¶

Check whether a file has a valid format

Parameters

filename – file path
file_formats – accepted file formats

Returns

True if valid format, else False
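
A format check of this kind is often done on the file extension. The following is a minimal sketch under that assumption; the `EXTENSIONS` table and the `has_allowed_extension` name are hypothetical and do not reflect the extensions D-Genies actually accepts.

```python
# Hypothetical mapping from a format name to accepted extensions.
EXTENSIONS = {
    "fasta": ("fa", "fasta", "fna", "fa.gz", "fasta.gz", "fna.gz"),
}


def has_allowed_extension(filename, file_formats=("fasta",)):
    """Return True if `filename` ends with an extension of one of the formats."""
    name = filename.lower()
    return any(
        name.endswith("." + ext)
        for fmt in file_formats
        for ext in EXTENSIONS.get(fmt, ())
    )


print(has_allowed_extension("genome.fa.gz"))
print(has_allowed_extension("notes.txt"))
```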

static compress(filename)[source]¶

Compress a file with gzip

Parameters: filename (str) – file to compress
Returns: path of the compressed file
Return type: str

static compress_and_send_mail(job_name, fasta_file, index_file, lock_file, mailer)[source]¶

Compress the fasta file, then send a mail with its link to the client

Parameters

job_name (str) – job id
fasta_file (str) – fasta file path
index_file (str) – index file path
lock_file (str) – lock file path
mailer (Mailer) – mailer object (to send mail)

config = <dgenies.config_reader.AppConfigReader object>¶

static get_fasta_file(res_dir, type_f, is_sorted)[source]¶

Get fasta file path

Parameters

res_dir (str) – job results directory
type_f (str) – type of file (query or target)
is_sorted (bool) – is fasta sorted

Returns

fasta file path

Return type

str

static get_gallery_items()[source]¶

Get list of items from the gallery

Returns

list of gallery items. Each item is a dict with 7 keys:

name : name of the job
id_job : id of the job
picture : illustrating picture filename (located in gallery folder of the data folder)
query : query species name
target : target species name
mem_peak : max memory used for the run (human readable)
time_elapsed : time elapsed for the run (human readable)

Return type

list of dict

static get_list_all_jobs(mode='webserver')[source]¶

Get list of all jobs

Parameters: mode (str) – webserver or standalone
Returns: list of all jobs in standalone mode. Empty list in webserver mode
Return type: list

static get_mail_for_job(id_job)[source]¶

Retrieve associated mail for a job

Parameters: id_job (int) – job id
Returns: associated mail address
Return type: str

static get_readable_size(size, nb_after_coma=1, base='B')[source]¶

Get human readable size from a given size in bytes

Parameters

size (int) – size in bytes
nb_after_coma (int) – number of digits after the decimal point
base – base unit of size, must be either “B”, “KiB”, “MiB” or “GiB”

Returns

size, human readable

Return type

str
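
The conversion can be sketched as repeated division by 1024 until the value fits the next binary unit. This is an illustration, not the library code: the `readable_size` name, the interpretation of `base` as the unit in which `size` is expressed, and the output format are all assumptions.

```python
UNITS = ["B", "KiB", "MiB", "GiB"]


def readable_size(size, nb_after_coma=1, base="B"):
    """Format `size` (expressed in `base` units) with a binary unit suffix."""
    idx = UNITS.index(base)  # raises ValueError for an unknown unit
    value = float(size)
    # Move up one unit at a time while the value is 1024 or more.
    while value >= 1024 and idx < len(UNITS) - 1:
        value /= 1024
        idx += 1
    return f"{value:.{nb_after_coma}f} {UNITS[idx]}"


print(readable_size(1536))
print(readable_size(2048, nb_after_coma=0, base="KiB"))
```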

static get_readable_time(seconds)[source]¶

Get human readable time

Parameters: seconds (int) – time in seconds
Returns: time, human readable
Return type: str
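
One possible way to render a duration, using `divmod` to split seconds into hours, minutes and seconds. The `readable_time` name and the exact output format are assumptions of this sketch; the format produced by D-Genies may differ.

```python
def readable_time(seconds):
    """Render a number of seconds as hours, minutes and seconds."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    parts = []
    if hours:
        parts.append(f"{hours} h")
    if minutes:
        parts.append(f"{minutes} min")
    if secs or not parts:  # always print something, even for 0 seconds
        parts.append(f"{secs} s")
    return " ".join(parts)


print(readable_time(3725))
```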

static get_valid_uploaded_filename(filename, folder)[source]¶

Check whether uploaded file already exists. If yes, rename it

Parameters

filename (str) – uploaded file
folder (str) – folder in which to save the file

Returns

unique filename

Return type

str
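
A common way to avoid overwriting an existing upload is to append a counter until the name is free. The sketch below assumes that strategy; the `unique_filename` name and the `_1`, `_2` suffix scheme are hypothetical and may not match the renaming D-Genies performs.

```python
import os
import tempfile


def unique_filename(filename, folder):
    """Return `filename`, or a numbered variant that does not exist in `folder`."""
    stem, ext = os.path.splitext(filename)
    candidate = filename
    counter = 1
    while os.path.exists(os.path.join(folder, candidate)):
        candidate = f"{stem}_{counter}{ext}"
        counter += 1
    return candidate


with tempfile.TemporaryDirectory() as tmp:
    # Simulate a previous upload with the same name.
    open(os.path.join(tmp, "reads.fasta"), "w").close()
    renamed = unique_filename("reads.fasta", tmp)
    untouched = unique_filename("other.fasta", tmp)
```

Note that `os.path.splitext` only splits off the last extension, so a name such as `reads.fa.gz` would become `reads.fa_1.gz` in this sketch.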

static is_in_gallery(id_job, mode='webserver')[source]¶

Check whether a job is in the gallery

Parameters

id_job (str) – job id
mode (str) – webserver or standalone

Returns

True if job is in the gallery, else False

Return type

bool

static query_fasta_file_exists(res_dir)[source]¶

Check if a fasta file exists

Parameters: res_dir (str) – job result directory
Returns: True if file exists and is a regular file, else False
Return type: bool

static random_string(s_len)[source]¶

Generate a random string

Parameters: s_len (int) – length of the string to generate
Returns: the random string
Return type: str
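
A random identifier of a given length can be drawn from letters and digits with the standard library. This is a sketch, not the library code: the `random_token` name and the alphabet are assumptions.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits


def random_token(s_len):
    """Return a random string of `s_len` letters and digits."""
    return "".join(secrets.choice(ALPHABET) for _ in range(s_len))


token = random_token(12)
```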

static read_index(index_file)[source]¶

Load index of query or target

Parameters

index_file (str) – index file path

Returns

[0] index (size of each chromosome) {dict}
[1] sample name {str}

Return type

(dict, str)
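
The `(dict, str)` return shape can be illustrated with a small reader. It assumes an index layout in which the first line holds the sample name and every following line holds a contig name and its length separated by a tab; that layout and the `load_index` name are assumptions of this sketch, not something stated above.

```python
import os
import tempfile


def load_index(index_file):
    """Return ({contig name: length}, sample name) from an index file."""
    index = {}
    with open(index_file) as handle:
        sample_name = handle.readline().rstrip("\n")
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                continue  # ignore blank lines
            name, length = line.split("\t")[:2]
            index[name] = int(length)
    return index, sample_name


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "query.idx")
    with open(path, "w") as handle:
        handle.write("my_sample\nchr1\t1500\nchr2\t800\n")
    index, sample = load_index(path)
```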

static send_fasta_ready(mailer, job_name, sample_name, compressed=False, path='fasta-query', status='success', ext='fasta')[source]¶

Send the link to the fasta file once processing has ended

Parameters

mailer (Mailer) – mailer object
job_name (str) – job id
sample_name (str) – sample name
compressed (bool) – is a compressed fasta file
path (str) – fasta path
status (str) – processing status
ext (str) – file extension

static sort_fasta(job_name, fasta_file, index_file, lock_file, compress=False, mailer=None, mode='webserver')[source]¶

Sort fasta file according to the sorted index file

Parameters

job_name (str) – job id
fasta_file (str) – fasta file path
index_file (str) – index file path
lock_file (str) – lock file path
compress (bool) – compress result fasta file
mailer (Mailer) – mailer object (to send mail)
mode (str) – webserver or standalone

static uncompress(filename)[source]¶

Uncompress a gzipped file

Parameters: filename (str) – gzipped file
Returns: path of the uncompressed file
Return type: str
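
Gzip compression and decompression of a file can be done with the standard `gzip` and `shutil` modules. The round trip below is a sketch under assumptions: the function names are hypothetical, the explicit `out_path` argument is not part of the documented signature, and whether D-Genies removes the input file afterwards is not covered here.

```python
import gzip
import os
import shutil
import tempfile


def gzip_compress(filename):
    """Write `filename`.gz next to the input and return its path."""
    gz_path = filename + ".gz"
    with open(filename, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return gz_path


def gzip_uncompress(filename, out_path):
    """Decompress the gzipped `filename` into `out_path` and return `out_path`."""
    with gzip.open(filename, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path


with tempfile.TemporaryDirectory() as tmp:
    fasta = os.path.join(tmp, "sample.fasta")
    with open(fasta, "w") as handle:
        handle.write(">chr1\nACGT\n")
    packed = gzip_compress(fasta)
    restored = gzip_uncompress(packed, os.path.join(tmp, "restored.fasta"))
    with open(restored) as handle:
        round_trip = handle.read()
```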

dgenies.lib.job_manager module¶

class dgenies.lib.job_manager.JobManager(id_job, email=None, query: Optional[dgenies.lib.fasta.Fasta] = None, target: Optional[dgenies.lib.fasta.Fasta] = None, mailer=None, tool='minimap2', align: Optional[dgenies.lib.fasta.Fasta] = None, backup: Optional[dgenies.lib.fasta.Fasta] = None, options=None)[source]¶

Bases: object

Jobs management

Parameters

id_job (str) – job id
email (str) – email from user
query (Fasta) – query fasta
target (Fasta) – target fasta
mailer (Mailer) – mailer object (to send mail through flask app)
tool (str) – tool to use for mapping (choice from tools config)
align (Fasta) – alignment file (PAF, MAF, …) as a fasta object
backup (Fasta) – backup TAR file
options (list) – list of str containing options for the chosen tool

check_file(input_type, should_be_local, max_upload_size_readable)[source]¶

Check if file is correct: format, size, valid gzip

Parameters

input_type – query or target
should_be_local – True if job should be treated locally
max_upload_size_readable – max upload size human readable

Returns

(True if correct, True if error set [for fail], True if should be local)

check_job_status_sge()[source]¶

Check status of a SGE job run

Returns: True if the job has successfully ended, else False

check_job_status_slurm()[source]¶

Check status of a SLURM job run

Returns: True if the job has successfully ended, else False

check_job_success()[source]¶

Check whether a job succeeded

Returns: status of a job: succeed, no-match or fail
Return type: str

clear()[source]¶: Remove job dir

delete()[source]¶

Remove a job

Returns

[0] Success of the deletion
[1] Error message, if any (else empty string)

Return type

(bool, str)

do_align()[source]¶

Check if we have to make alignment

Returns: True if the job is launched with an alignment file

download_files_with_pending(files_to_download, should_be_local, max_upload_size_readable)[source]¶

Download files from URLs, with pending (according to the max number of concurrent downloads)

Parameters

files_to_download (list of list) – files to download. For each item of the list, it’s a list with 2 elements: first one is the Fasta object, second one the input type (query or target)
should_be_local (bool) – True if the job should be run locally (according to input file sizes), else False
max_upload_size_readable (str) – Human readable max upload size (to show on errors)

static find_error_in_log(log_file)[source]¶

Find error in log (for cluster run)

Parameters: log_file – log file of the job
Returns: error (empty if no error)
Return type: str

get_file_size(filepath: str)[source]¶

Get file size

Parameters: filepath (str) – file path
Returns: file size (bytes)
Return type: int

get_mail_content(status, target_name, query_name=None)[source]¶

Build mail content for status mail

Parameters

status (str) – job status
target_name (str) – name of target
query_name (str) – name of query

Returns

mail content

Return type

str

get_mail_content_html(status, target_name, query_name=None)[source]¶

Build mail content as HTML

Parameters

status (str) – job status
target_name (str) – name of target
query_name (str) – name of query

Returns

mail content (html)

Return type

str

get_mail_subject(status)[source]¶

Build mail subject

Parameters: status (str) – job status
Returns: mail subject
Return type: str

static get_pending_local_number()[source]¶

Get number of jobs running or waiting for a run

Returns: number of jobs
Return type: int

get_query_split()[source]¶

Get query split fasta file

Returns: split query fasta file
Return type: str

get_status_standalone(with_error=False)[source]¶

Get job status in standalone mode

Parameters: with_error – also return the error
Returns: status (and error, if with_error=True)
Return type: str or tuple (if with_error=True)

getting_files()[source]¶

Get files for the job

Returns

[0] True if getting files succeeded, else False
[1] If an error happened, True if the error is already saved for the job, else False (the error will be saved later)
[2] True if no data must be downloaded (will be downloaded with pending if True)

Return type

tuple

static is_gz_file(filepath)[source]¶

Check if a file is gzipped

Parameters: filepath (str) – file to check
Returns: True if gzipped, else False
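
A gzip stream always starts with the two magic bytes `1f 8b`, so a quick test can read just those bytes. The sketch below shows that technique; the `looks_gzipped` name is hypothetical, and whether D-Genies uses this exact test is an assumption.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def looks_gzipped(filepath):
    """Return True if the file starts with the gzip magic bytes."""
    with open(filepath, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, "plain.txt")
    packed = os.path.join(tmp, "packed.gz")
    with open(plain, "w") as handle:
        handle.write("ACGT\n")
    with gzip.open(packed, "wt") as handle:
        handle.write("ACGT\n")
    plain_result = looks_gzipped(plain)
    packed_result = looks_gzipped(packed)
```

A matching header does not prove the whole stream is intact; checking for a valid gzip file requires actually decompressing it.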

is_query_filtered()[source]¶

Check if query has been filtered

Returns: True if filtered, else False

is_target_filtered()[source]¶

Check if target has been filtered

Returns: True if filtered, else False

launch()[source]¶: Launch a job in webserver mode (asynchronously in a new thread)

launch_standalone()[source]¶: Launch a job in standalone mode (asynchronously in a new thread)

launch_to_cluster(step, batch_system_type, command, args, log_out, log_err)[source]¶

Launch a program to the cluster

Parameters

step (str) – step (prepare, start)
batch_system_type (str) – slurm or sge
command (str) – program to launch (without arguments)
args (list) – arguments to use for the program
log_out (str) – log file for stdout
log_err (str) – log file for stderr

Returns

True if succeed, else False

Return type

bool

prepare_data()[source]¶: Launch preparation of data

prepare_data_cluster(batch_system_type)[source]¶

Launch data preparation on a cluster

Parameters: batch_system_type (str) – slurm or sge
Returns: True if succeed, else False
Return type: bool

prepare_data_in_thread()[source]¶: Prepare data in a new thread

prepare_data_local()[source]¶

Prepare data locally. In standalone mode, launch the job afterwards if preparation succeeds.

Returns: True if the job succeeded, else False
Return type: bool

prepare_dotplot_cluster(batch_system_type)[source]¶

Prepare data if alignment already done: just index the fasta (if index not given), then parse the alignment

Parameters: batch_system_type (str) – type of cluster (slurm or sge)

prepare_dotplot_local()[source]¶: Prepare data if alignment already done: just index the fasta (if index not given), then parse the alignment file and sort it.

run_job(batch_system_type)[source]¶

Run a job (mapping step)

Parameters: batch_system_type (str) – type of cluster (slurm or sge)

run_job_in_thread(batch_system_type='local')[source]¶

Run a job asynchronously in a new thread

Parameters: batch_system_type (str) – local, slurm or sge

search_error()[source]¶

Search for an error in the log file (for local runs). If no error found, returns a generic error message

Returns: error message to give to the user
Return type: str

send_mail()[source]¶: Send mail

send_mail_post()[source]¶: Send mail using POST url (if there is no access to mailer)

set_inputs_from_res_dir()[source]¶: Sets inputs (query, target, …) from job dir

set_job_status(status, error='')[source]¶

Change status of a job

Parameters

status (str) – new job status
error (str) – error description (if any)

set_status_standalone(status, error='')[source]¶

Change job status in standalone mode

Parameters

status (str) – new status
error (str) – error description (if any)

start_job()[source]¶: Start job: download, check and parse input files

status()[source]¶

Get job status and error. In webserver mode, also get the memory peak and the elapsed time

Returns: status and other information
Return type: dict

unpack_backup()[source]¶: Untar backup file

update_job_status(status, id_process=None)[source]¶

Update job status

Parameters

status – new status
id_process – system process id

dgenies.lib.latest module¶

class dgenies.lib.latest.Latest[source]¶

Bases: object

Search latest version

load()[source]¶: Load latest version: use cached version (if any) and then sync with Github

update()[source]¶: Get latest version from Github

update_async()[source]¶: Update latest version asynchronously

dgenies.lib.mailer module¶

class dgenies.lib.mailer.Mailer(app)[source]¶

Bases: object

Send mail through flask app

Parameters: app (Flask) – Flask app object

send_mail(recipients, subject, message, message_html=None)[source]¶

Send mail

Parameters

recipients (list) – list of recipients
subject (str) – mail subject
message (str) – message (text)
message_html (str) – message (html)

dgenies.lib.paf module¶

class dgenies.lib.paf.Paf(paf: str, idx_q: str, idx_t: str, auto_parse: bool = True, mailer=None, id_job=None)[source]¶

Bases: object

Functions applied to PAF files

Parameters

paf (str) – PAF file path
idx_q (str) – query index file path
idx_t (str) – target index file path
auto_parse (bool) – if True, parse PAF file at initialisation
mailer (Mailer) – mailer object, to send mails
id_job (str) – job id

build_list_no_assoc(to)[source]¶

Build the list of queries that match no target, or the opposite (targets that match no query)

Parameters: to – query or target
Returns: content of the file

build_query_chr_as_reference()[source]¶

Assemble query contigs like reference chromosomes

Returns: path of the fasta file

build_query_on_target_association_file()[source]¶

For each query, get the best matching chromosome and save it to a CSV file. Use the order of queries

Returns: content of the file

build_summary_stats(status_file)[source]¶

Get summary of identity

Returns: table with percents by category

compute_gravity_contigs()[source]¶

Compute gravity for each contig on each chromosome (how many big matches they have). Will be used to find which chromosome has the highest value for each contig

Returns

[0] gravity for each contig and each chromosome:
{contig1: {chr1: value, chr2: value, …}, contig2: …}
[1] For each block save lines inside:
[median_on_query, squared length, median_on_target, x1, x2, y1, y2, length] (x : on target, y: on query)
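
Given the nested gravity dict described in [0], picking the chromosome with the highest value for each contig takes one `max` call per contig. This is a sketch of that selection step only; the `best_chromosome_per_contig` name and the sample values are made up for illustration.

```python
def best_chromosome_per_contig(gravity):
    """Map each contig to the chromosome with the highest gravity value."""
    return {
        contig: max(per_chrom, key=per_chrom.get)
        for contig, per_chrom in gravity.items()
        if per_chrom  # skip contigs with no match at all
    }


# Made-up values, in the {contig: {chr: value}} shape described above.
gravity = {
    "contig1": {"chr1": 12.0, "chr2": 3.5},
    "contig2": {"chr1": 0.4, "chr2": 9.1},
    "contig3": {},
}
best = best_chromosome_per_contig(gravity)
```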

config = <dgenies.config_reader.AppConfigReader object>¶

get_d3js_data()[source]¶

Build data for D3.js client

Returns

data for d3.js:

y_len: length of query (bp)
x_len: length of target (bp)
min_idy: minimum of identity (float)
max_idy: maximum of identity (float)
lines: matches lines, by class of identity (dict)
y_contigs: query contigs definitions (dict)
y_order: query contigs order (list)
x_contigs: target contigs definitions (dict)
x_order: target contigs order (list)
name_y: name of the query (str)
name_x: name of the target (str)
limit_idy: limit for each class of identities (list)

Return type

dict

get_queries_on_target_association()[source]¶

For each target, get the list of queries associated to it

Returns: list of queries associated to each target
Return type: dict

get_query_on_target_association(with_coords=True)[source]¶

For each query, get the best matching chromosome

Returns: query on target association
Return type: dict

get_summary_stats()[source]¶

Load summary statistics from file

Returns: summary object or None if summary not already built
Return type: dict

is_contig_well_oriented(lines, contig, chrom)[source]¶

Return True if the contig is well oriented. In a well-oriented contig, y increases as x increases. Only the largest matches are checked (small matches are ignored)

Parameters

lines (list) – lines inside the contig
contig (str) – query contig name
chrom (str) – target chromosome name

Returns

True if well oriented, else False

Return type

bool
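
The orientation rule can be sketched as follows: keep only the longest matches, then compare the total length of matches where x and y move in the same direction with the total length of those where they move in opposite directions. This is a simplified illustration, not the D-Genies algorithm; the `(x1, x2, y1, y2)` line layout, the `keep_fraction` threshold and the function name are assumptions.

```python
def is_well_oriented(lines, keep_fraction=0.5):
    """lines: list of (x1, x2, y1, y2) matches; x on target, y on query."""
    if not lines:
        return True

    def length(line):
        return abs(line[1] - line[0])

    # Ignore matches much shorter than the longest one.
    longest = max(length(line) for line in lines)
    big = [line for line in lines if length(line) >= keep_fraction * longest]

    def same_direction(line):
        x1, x2, y1, y2 = line
        return (x2 - x1) * (y2 - y1) >= 0

    forward = sum(length(line) for line in big if same_direction(line))
    reverse = sum(length(line) for line in big if not same_direction(line))
    return forward >= reverse


forward_case = is_well_oriented([(0, 100, 0, 100), (200, 210, 50, 40)])
reverse_case = is_well_oriented([(0, 100, 100, 0), (200, 205, 0, 5)])
```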

keyerror_message(exception, type_f)[source]¶

Build message if contig not found in query or target

Parameters

exception (KeyError) – exception object
type_f (str) – type of data (query or target)

Returns

error message

Return type

str

limit_idy = [0.25, 0.5, 0.75]¶

max_nb_lines = 100000¶

parse_index(index_o: list, index_c: dict, full_len: int)[source]¶

Parse index and merge too small contigs together

Parameters

index_o (list) – index contigs order
index_c (dict) – index contigs def
full_len (int) – length of the sequence

Returns

(new contigs def, new contigs order)

Return type

(dict, list)

parse_paf(merge_index=True, noise=True)[source]¶

Parse PAF file

Parameters

merge_index (bool) – if True, merge too small contigs in index
noise (bool) – if True, remove noise

static remove_noise(lines, noise_limit)[source]¶

Remove noise from the dot plot

Parameters

lines (dict) – lines of the dot plot, by class
noise_limit (float) – line length limit

Returns

kept lines, by class

Return type

dict
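
Filtering by line length while keeping the per-class grouping can be sketched with a dict comprehension. The `(x1, x2, y1, y2)` line layout, the Euclidean length and the `drop_short_lines` name are assumptions; the actual noise criterion in D-Genies may be different.

```python
import math


def drop_short_lines(lines, noise_limit):
    """Keep, for each class, only the lines at least `noise_limit` long."""
    return {
        cls: [
            line
            for line in cls_lines
            if math.hypot(line[1] - line[0], line[3] - line[2]) >= noise_limit
        ]
        for cls, cls_lines in lines.items()
    }


# One long line (length 5) and one short line (length about 1.41).
kept = drop_short_lines({"0": [(0, 3, 0, 4), (0, 1, 0, 1)], "1": []}, 2.0)
```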

reorient_contigs_in_paf(contigs)[source]¶

Reorient contigs in the PAF file

Parameters: contigs – contigs to be reoriented

reverse_contig(contig_name)[source]¶

Reverse contig

Parameters: contig_name (str) – contig name

save_json(out)[source]¶

Save D3.js data to json

Parameters: out (str) – output file path

set_sorted(is_sorted)[source]¶

Change sorted status

Parameters: is_sorted (bool) – new sorted status

sort()[source]¶: Sort contigs according to reference target and reorient them if needed

dgenies.lib.parsers module¶

Define tools parsers here

Each parser (main function) must have exactly 2 arguments:

- First argument: input file, which is the raw output of the tool
- Second argument: final PAF file

Returns True if parsing succeeds, else False
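
A parser following this two-argument contract could look like the skeleton below. The tool name `mytool` and its output format (PAF-like lines with `#` comment lines) are hypothetical and serve only to show the contract: read the raw output, write the PAF file, and return a boolean.

```python
import os
import tempfile


def mytool(in_file, out_paf):
    """Hypothetical parser: copy alignment lines, dropping '#' comment lines."""
    try:
        with open(in_file) as src, open(out_paf, "w") as dst:
            for line in src:
                if not line.startswith("#"):
                    dst.write(line)
    except OSError:
        return False  # unreadable input or unwritable output
    return True


with tempfile.TemporaryDirectory() as tmp:
    raw = os.path.join(tmp, "raw.out")
    paf_out = os.path.join(tmp, "map.paf")
    with open(raw, "w") as handle:
        handle.write("# header\nq1\t100\t0\t50\n")
    ok = mytool(raw, paf_out)
    with open(paf_out) as handle:
        written = handle.read()
    missing = mytool(os.path.join(tmp, "absent.out"), paf_out)
```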

dgenies.lib.parsers.maf(in_maf, out_paf)[source]¶

Maf parser

Parameters

in_maf (str) – input maf file path
out_paf (str) – output paf file path

Returns

True if success, else False

dgenies.lib.parsers.mashmap2paf(in_paf, out_paf)[source]¶

dgenies.lib.upload_file module¶

class dgenies.lib.upload_file.UploadFile(name, type_f=None, size=None, not_allowed_msg='')[source]¶

Bases: object

Manage uploaded files

Parameters

name (str) – File name
type_f (str) – file MIME type
size (int) – file size in bytes
not_allowed_msg (str) – error to add for not allowed file

get_file()[source]¶

Get file object

Returns: file object
Return type: dict

dgenies.lib.validators module¶

Define formats validators here (for alignment files)

Each validator (main function) has exactly the name of the format in the aln-formats.yaml file. It takes a single argument:

- Input file to check

Secondary functions must start with _

Validators for non-mapping files must start with “v_”

Returns True if file is valid, else False

dgenies.lib.validators.maf(in_file)[source]¶

Maf validator

Parameters: in_file (str) – maf file to test
Returns: True if valid, else False
Return type: bool

dgenies.lib.validators.paf(in_file, n_max=None)[source]¶

Paf validator

Parameters

in_file (str) – paf file to test
n_max (int) – number of lines to test (default: None for all)

Returns

True if valid, else False

Return type

bool
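
A PAF line has 12 mandatory tab-separated columns: nine of them are integers and the fifth is the strand, `+` or `-`. A validator with the `n_max` behaviour described above can be sketched on that basis. The `paf_looks_valid` name is hypothetical, and the checks D-Genies performs may be stricter or looser than these.

```python
import os
import tempfile

INT_COLUMNS = (1, 2, 3, 6, 7, 8, 9, 10, 11)  # 0-based positions of integer fields


def paf_looks_valid(in_file, n_max=None):
    """Check the mandatory PAF columns on the first `n_max` lines (all if None)."""
    checked = 0
    try:
        with open(in_file) as handle:
            for line in handle:
                if n_max is not None and checked >= n_max:
                    break
                fields = line.rstrip("\n").split("\t")
                if len(fields) < 12:
                    return False
                if fields[4] not in ("+", "-"):
                    return False
                if not all(fields[i].isdigit() for i in INT_COLUMNS):
                    return False
                checked += 1
    except OSError:
        return False
    # In this sketch, a file with no checked line is treated as invalid.
    return checked > 0


GOOD = "q1\t100\t0\t50\t+\tt1\t200\t10\t60\t48\t50\t60\n"
BAD = "q1\t100\t0\t50\t*\tt1\t200\t10\t60\t48\t50\t60\n"

with tempfile.TemporaryDirectory() as tmp:
    good_path = os.path.join(tmp, "good.paf")
    mixed_path = os.path.join(tmp, "mixed.paf")
    with open(good_path, "w") as handle:
        handle.write(GOOD)
    with open(mixed_path, "w") as handle:
        handle.write(GOOD + BAD)
    good_result = paf_looks_valid(good_path)
    mixed_result = paf_looks_valid(mixed_path)
    mixed_first_line_only = paf_looks_valid(mixed_path, n_max=1)
```

Limiting the check with `n_max` trades completeness for speed: in the example, the invalid second line goes unnoticed when only one line is tested.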

dgenies.lib.validators.v_idx(in_file)[source]¶

Index file validator

Parameters: in_file (str) – index file to test
Returns: True if valid, else False
Return type: bool
