dgenies.lib package

Submodules

dgenies.lib.crons module

class dgenies.lib.crons.Crons(base_dir, debug)[source]

Bases: object

Manage crontab jobs (webserver mode)

Parameters:
  • base_dir (str) – software base directory path
  • debug (bool) – True to enable debug mode
static _get_python_exec()[source]

Get python executable path

clear(kill_scheduler=True)[source]

Clear all crons

Parameters:kill_scheduler (bool) – if True, kill local scheduler currently running
init_clean_cron()[source]

Initialize the clean cron, which clears old jobs. The clean cron is launched at 1:00 am each day

init_launch_local_cron()[source]

Try to launch local scheduler (if not already launched)

start_all()[source]

Start all crons

dgenies.lib.decorators module

class dgenies.lib.decorators.Singleton(klass)[source]

Bases: object

Define a singleton (design pattern)
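The singleton pattern can be sketched as a class-based decorator that builds the wrapped class once and then keeps returning the same object. This is a minimal illustration, not the actual dgenies code: the attribute names and the Settings class are hypothetical.

```python
class Singleton:
    """Class decorator that hands back one shared instance of the wrapped class."""

    def __init__(self, klass):
        self.klass = klass      # the decorated class
        self.instance = None    # created lazily on first call

    def __call__(self, *args, **kwargs):
        # Build the instance on the first call only; later calls reuse it.
        if self.instance is None:
            self.instance = self.klass(*args, **kwargs)
        return self.instance


@Singleton
class Settings:  # hypothetical class, for illustration only
    def __init__(self):
        self.values = {}


a = Settings()
b = Settings()
```

Here a and b refer to the same object, so a change made through one is visible through the other.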

dgenies.lib.drmaasession module

dgenies.lib.fasta module

class dgenies.lib.fasta.Fasta(name, path, type_f, example=False)[source]

Bases: object

Defines a fasta file: name of the sample, path to the fasta file, type of file (URL or local file), …

Parameters:
  • name (str) – sample name
  • path (str) – fasta file path
  • type_f (str) – type of file (local file or URL)
  • example (bool) – is an example job
get_name()[source]

Get sample name

Returns:sample name
Return type:str
get_path()[source]

Get path of the fasta file

Returns:fasta path
Return type:str
get_type()[source]

Get type: URL or local file

Returns:type
Return type:str
is_example()[source]

Return whether the current sample is example data

Returns:True if the current sample is example data
Return type:bool
set_name(name)[source]

Set sample name

Parameters:name (str) – new sample name
set_path(path)[source]

Set path to the fasta file

Parameters:path (str) – new path

dgenies.lib.functions module

class dgenies.lib.functions.Functions[source]

Bases: object

General functions

static _Functions__get_do_sort(fasta, is_sorted)

Check whether query must be sorted (False if already done)

Parameters:
  • fasta (str) – fasta file
  • is_sorted (bool) – True if it’s sorted
Returns:

do sort

Return type:

bool

static _get_jobs_list()[source]

Get list of jobs

Returns:list of valid jobs
Return type:list
static allowed_file(filename, file_formats=('fasta', ))[source]

Check whether a file has a valid format

Parameters:
  • filename – file path
  • file_formats – accepted file formats
Returns:

True if valid format, else False

static compress(filename)[source]

Compress a file with gzip

Parameters:filename (str) – file to compress
Returns:path of the compressed file
Return type:str
static compress_and_send_mail(job_name, fasta_file, index_file, lock_file, mailer)[source]

Compress fasta file and then send mail with its link to the client

Parameters:
  • job_name (str) – job id
  • fasta_file (str) – fasta file path
  • index_file (str) – index file path
  • lock_file (str) – lock file path
  • mailer (Mailer) – mailer object (to send mail)
config = <dgenies.config_reader.AppConfigReader object>
static get_fasta_file(res_dir, type_f, is_sorted)[source]

Get fasta file path

Parameters:
  • res_dir (str) – job results directory
  • type_f (str) – type of file (query or target)
  • is_sorted (bool) – is fasta sorted
Returns:

fasta file path

Return type:

str

Get list of items from the gallery

Returns:list of gallery items. Each item is a dict with 7 keys:
  • name : name of the job
  • id_job : id of the job
  • picture : illustrating picture filename (located in gallery folder of the data folder)
  • query : query species name
  • target : target species name
  • mem_peak : max memory used for the run (human readable)
  • time_elapsed : time elapsed for the run (human readable)
Return type:list of dict
static get_list_all_jobs(mode='webserver')[source]

Get list of all jobs

Parameters:mode (str) – webserver or standalone
Returns:list of all jobs in standalone mode. Empty list in webserver mode
Return type:list
static get_mail_for_job(id_job)[source]

Retrieve associated mail for a job

Parameters:id_job (int) – job id
Returns:associated mail address
Return type:str
static get_readable_size(size, nb_after_coma=1)[source]

Get human readable size from a given size in bytes

Parameters:
  • size (int) – size in bytes
  • nb_after_coma (int) – number of digits after the decimal separator
Returns:

size, human readable

Return type:

str
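A conversion of this kind can be sketched as follows. This is an illustration only: the unit labels, the base of 1024 and the exact output format are assumptions, and the real function may format sizes differently.

```python
def get_readable_size(size, nb_after_coma=1):
    """Format a byte count with a unit suffix (labels and base 1024 are assumptions)."""
    units = ["B", "KB", "MB", "GB", "TB"]
    value = float(size)
    index = 0
    # Divide by 1024 until the value fits the current unit or the units run out.
    while value >= 1024 and index < len(units) - 1:
        value /= 1024
        index += 1
    return f"{value:.{nb_after_coma}f} {units[index]}"


small = get_readable_size(1536)
large = get_readable_size(5 * 1024 ** 3, nb_after_coma=2)
```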

static get_readable_time(seconds)[source]

Get human readable time

Parameters:seconds (int) – time in seconds
Returns:time, human readable
Return type:str
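One possible way to turn a number of seconds into a readable string is shown here. The output format ("1h 02m 05s") is an assumption chosen for the sketch, not the format the library is known to produce.

```python
def get_readable_time(seconds):
    """Split a duration in seconds into hours, minutes and seconds."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    # Assumed format: hours unpadded, minutes and seconds on two digits.
    return f"{hours}h {minutes:02d}m {secs:02d}s"


elapsed = get_readable_time(3725)
```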
static get_valid_uploaded_filename(filename, folder)[source]

Check whether uploaded file already exists. If yes, rename it

Parameters:
  • filename (str) – uploaded file
  • folder (str) – folder in which to save the file
Returns:

unique filename

Return type:

str
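The renaming logic can be illustrated with a small sketch that appends a counter until the name is free. The suffix scheme ("_1", "_2", ...) is an assumption; the real function may rename files in another way.

```python
import os
import tempfile


def get_valid_uploaded_filename(filename, folder):
    """Return filename, or a numbered variant if it already exists in folder."""
    stem, ext = os.path.splitext(filename)
    candidate = filename
    counter = 1
    # Assumed scheme: genome.fasta, genome_1.fasta, genome_2.fasta, ...
    while os.path.exists(os.path.join(folder, candidate)):
        candidate = f"{stem}_{counter}{ext}"
        counter += 1
    return candidate


with tempfile.TemporaryDirectory() as folder:
    first = get_valid_uploaded_filename("genome.fasta", folder)
    open(os.path.join(folder, first), "w").close()  # simulate the saved upload
    second = get_valid_uploaded_filename("genome.fasta", folder)
```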

Check whether a job is in the gallery

Parameters:
  • id_job (str) – job id
  • mode (str) – webserver or standalone
Returns:

True if job is in the gallery, else False

Return type:

bool

static query_fasta_file_exists(res_dir)[source]

Check if a fasta file exists

Parameters:res_dir (str) – job result directory
Returns:True if file exists and is a regular file, else False
Return type:bool
static random_string(s_len)[source]

Generate a random string

Parameters:s_len (int) – length of the string to generate
Returns:the random string
Return type:str
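A random string of a given length can be generated as in the sketch below. The alphabet (ASCII letters and digits) is an assumption, and random.choice is used for simplicity; it is not suitable for security-sensitive tokens.

```python
import random
import string

# Assumed alphabet: upper and lower case ASCII letters plus digits.
ALPHABET = string.ascii_letters + string.digits


def random_string(s_len):
    """Return s_len characters drawn at random from ALPHABET."""
    return "".join(random.choice(ALPHABET) for _ in range(s_len))


token = random_string(16)
```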
static read_index(index_file)[source]

Load index of query or target

Parameters:index_file (str) – index file path
Returns:
  • [0] index (size of each chromosome) {dict}
  • [1] sample name {str}
Return type:(dict, str)
static send_fasta_ready(mailer, job_name, sample_name, compressed=False, path='fasta-query', status='success', ext='fasta')[source]

Send link to fasta file when processing has ended

Parameters:
  • mailer (Mailer) – mailer object
  • job_name (str) – job id
  • sample_name (str) – sample name
  • compressed (bool) – is a compressed fasta file
  • path (str) – fasta path
  • status (str) – processing status
  • ext (str) – file extension
static sort_fasta(job_name, fasta_file, index_file, lock_file, compress=False, mailer=None, mode='webserver')[source]

Sort fasta file according to the sorted index file

Parameters:
  • job_name (str) – job id
  • fasta_file (str) – fasta file path
  • index_file (str) – index file path
  • lock_file (str) – lock file path
  • compress (bool) – compress result fasta file
  • mailer (Mailer) – mailer object (to send mail)
  • mode (str) – webserver or standalone
static uncompress(filename)[source]

Uncompress a gzipped file

Parameters:filename (str) – gzipped file
Returns:path of the uncompressed file
Return type:str
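The compress and uncompress helpers documented above can be sketched with the standard gzip module. Two assumptions are made here and may not match the real behaviour: the compressed file is named by appending ".gz", and the source file is deleted after each conversion.

```python
import gzip
import os
import shutil
import tempfile


def compress(filename):
    """Gzip filename and return the path of the compressed file."""
    compressed = filename + ".gz"
    with open(filename, "rb") as src, gzip.open(compressed, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(filename)  # assumption: the uncompressed original is not kept
    return compressed


def uncompress(filename):
    """Gunzip filename and return the path of the uncompressed file."""
    if not filename.endswith(".gz"):
        raise ValueError("expected a file name ending with .gz")
    uncompressed = filename[:-3]
    with gzip.open(filename, "rb") as src, open(uncompressed, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(filename)  # assumption: the gzipped file is not kept
    return uncompressed


with tempfile.TemporaryDirectory() as folder:
    original = os.path.join(folder, "sample.fasta")
    with open(original, "w") as handle:
        handle.write(">chr1\nACGT\n")
    gz_name = os.path.basename(compress(original))
    restored = uncompress(original + ".gz")
    with open(restored) as handle:
        content = handle.read()
```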

dgenies.lib.job_manager module

class dgenies.lib.job_manager.JobManager(id_job, email=None, query: dgenies.lib.fasta.Fasta = None, target: dgenies.lib.fasta.Fasta = None, mailer=None, tool='minimap2', align: dgenies.lib.fasta.Fasta = None, backup: dgenies.lib.fasta.Fasta = None)[source]

Bases: object

Jobs management

Parameters:
  • id_job (str) – job id
  • email (str) – email from user
  • query (Fasta) – query fasta
  • target (Fasta) – target fasta
  • mailer (Mailer) – mailer object (to send mail through the Flask app)
  • tool (str) – tool to use for mapping (choice from tools config)
  • align (Fasta) – alignment file (PAF, MAF, …) as a fasta object
  • backup (Fasta) – backup TAR file
_after_start(success, error_set)[source]

Tasks done after input files downloaded, checked and parsed: if success, set job status to “waiting”. Else, set job error and send mail.

Parameters:
  • success (bool) – job success
  • error_set (bool) – error already set (else, set it now)
_check_url(fasta, formats)[source]

Check if a URL is valid, and if the file is valid too

Parameters:
  • fasta (Fasta) – fasta file object
  • formats (tuple) – allowed file formats
Returns:

True if URL and file are valid, else False

Return type:

bool

_download_file(url)[source]

Download a file from a URL

Parameters:url (str) – url of the file to download
Returns:absolute path of the downloaded file
Return type:str
_end_of_prepare_dotplot()[source]

Tasks done after preparing dot plot data: parse & sort of alignment file

_get_filename_from_url(url)[source]

Retrieve filename from a URL (http or ftp)

Parameters:url (str) – url of the file to download
Returns:filename
Return type:str
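Extracting a filename from a URL can be sketched with urllib.parse, taking the last path segment and ignoring the query string. This is an assumption about the approach: the real method may also inspect HTTP response headers, which this sketch does not do.

```python
import posixpath
from urllib.parse import unquote, urlparse


def get_filename_from_url(url):
    """Return the last path segment of an http or ftp URL."""
    path = urlparse(url).path           # drops scheme, host and query string
    return posixpath.basename(unquote(path))  # decode %xx escapes


http_name = get_filename_from_url("https://example.org/data/genome.fasta.gz?download=1")
ftp_name = get_filename_from_url("ftp://example.org/pub/my%20sample.fa")
```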
_getting_file_from_url(fasta, type_f)[source]

Download file from URL

Parameters:
  • fasta (Fasta) – Fasta object describing the input file
  • type_f (str) – type of the file (query or target)
Returns:

  • [0] True if no error happened, else False
  • [1] If an error happened, True if the error was saved for the job, else False (will be saved later)
  • [2] Final path of the downloaded file {str}
  • [3] Name of the downloaded file {str}

Return type:

tuple

_getting_local_file(fasta, type_f)[source]

Copy temp file to its final location

Parameters:
  • fasta (Fasta) – fasta file Object
  • type_f (str) – query or target
Returns:

final full path of the file

Return type:

str

_launch_drmaa(batch_system_type)[source]

Launch the mapping step to a cluster

Parameters:batch_system_type – slurm or sge
Returns:True if job succeeded, else False
_launch_local()[source]

Launch a job on the current machine

Returns:True if job succeeded, else False
Return type:bool
_save_analytics_data()[source]

Save analytics data into the database

_set_analytics_job_status(status)[source]

Change status for a job in analytics database

Parameters:status (str (20)) – new status
check_file(input_type, should_be_local, max_upload_size_readable)[source]

Check if file is correct: format, size, valid gzip

Parameters:
  • input_type – query or target
  • should_be_local – True if job should be treated locally
  • max_upload_size_readable – max upload size human readable
Returns:

(True if correct, True if error set [for fail], True if should be local)

check_job_status_sge()[source]

Check status of a SGE job run

Returns:True if the job has successfully ended, else False
check_job_status_slurm()[source]

Check status of a SLURM job run

Returns:True if the job has successfully ended, else False
check_job_success()[source]

Check if a job succeeded

Returns:status of a job: succeed, no-match or fail
Return type:str
clear()[source]

Remove job dir

delete()[source]

Remove a job

Returns:
  • [0] Success of the deletion
  • [1] Error message, if any (else empty string)
Return type:(bool, str)
do_align()[source]

Check if we have to make alignment

Returns:True if the job is launched with an alignment file
download_files_with_pending(files_to_download, should_be_local, max_upload_size_readable)[source]

Download files from URLs, with pending (according to the max number of concurrent downloads)

Parameters:
  • files_to_download (list of list) – files to download. For each item of the list, it’s a list with 2 elements: first one is the Fasta object, second one the input type (query or target)
  • should_be_local (bool) – True if the job should be run locally (according to input file sizes), else False
  • max_upload_size_readable (str) – Human readable max upload size (to show on errors)
static find_error_in_log(log_file)[source]

Find error in log (for cluster run)

Parameters:log_file – log file of the job
Returns:error (empty if no error)
Return type:str
get_file_size(filepath: str)[source]

Get file size

Parameters:filepath (str) – file path
Returns:file size (bytes)
Return type:int
get_mail_content(status, target_name, query_name=None)[source]

Build mail content for status mail

Parameters:
  • status (str) – job status
  • target_name (str) – name of target
  • query_name (str) – name of query
Returns:

mail content

Return type:

str

get_mail_content_html(status, target_name, query_name=None)[source]

Build mail content as HTML

Parameters:
  • status (str) – job status
  • target_name (str) – name of target
  • query_name (str) – name of query
Returns:

mail content (html)

Return type:

str

get_mail_subject(status)[source]

Build mail subject

Parameters:status (str) – job status
Returns:mail subject
Return type:str
static get_pending_local_number()[source]

Get number of jobs running or waiting for a run

Returns:number of jobs
Return type:int
get_query_split()[source]

Get query split fasta file

Returns:split query fasta file
Return type:str
get_status_standalone(with_error=False)[source]

Get job status in standalone mode

Parameters:with_error – get also the error
Returns:status (and error, if with_error=True)
Return type:str or tuple (if with_error=True)
getting_files()[source]

Get files for the job

Returns:
  • [0] True if getting files succeeded, else False
  • [1] If an error happened, True if the error was already saved for the job, else False (error will be saved later)
  • [2] True if no data must be downloaded (will be downloaded with pending if True)
Return type:tuple
static is_gz_file(filepath)[source]

Check if a file is gzipped

Parameters:filepath (str) – file to check
Returns:True if gzipped, else False
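A common way to detect a gzipped file is to read its first two bytes and compare them with the gzip magic number (0x1f 0x8b). The sketch below uses that technique; whether the real method works this way is an assumption.

```python
import gzip
import os
import tempfile


def is_gz_file(filepath):
    """Return True if the file starts with the gzip magic number."""
    with open(filepath, "rb") as handle:
        return handle.read(2) == b"\x1f\x8b"


with tempfile.TemporaryDirectory() as folder:
    plain = os.path.join(folder, "plain.fasta")
    zipped = os.path.join(folder, "zipped.fasta.gz")
    with open(plain, "wb") as handle:
        handle.write(b">chr1\nACGT\n")
    with gzip.open(zipped, "wb") as handle:
        handle.write(b">chr1\nACGT\n")
    plain_is_gz = is_gz_file(plain)
    zipped_is_gz = is_gz_file(zipped)
```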
is_query_filtered()[source]

Check if query has been filtered

Returns:True if filtered, else False
is_target_filtered()[source]

Check if target has been filtered

Returns:True if filtered, else False
launch()[source]

Launch a job in webserver mode (asynchronously in a new thread)

launch_standalone()[source]

Launch a job in standalone mode (asynchronously in a new thread)

launch_to_cluster(step, batch_system_type, command, args, log_out, log_err)[source]

Launch a program to the cluster

Parameters:
  • step (str) – step (prepare, start)
  • batch_system_type (str) – slurm or sge
  • command (str) – program to launch (without arguments)
  • args (list) – arguments to use for the program
  • log_out (str) – log file for stdout
  • log_err (str) – log file for stderr
Returns:

True if succeed, else False

Return type:

bool

prepare_data()[source]

Launch preparation of data

prepare_data_cluster(batch_system_type)[source]

Launch data preparation on a cluster

Parameters:batch_system_type (str) – slurm or sge
Returns:True if succeed, else False
Return type:bool
prepare_data_in_thread()[source]

Prepare data in a new thread

prepare_data_local()[source]

Prepare data locally. In standalone mode, launch the job afterwards if preparation succeeded.

Returns:True if job succeeded, else False
Return type:bool

prepare_dotplot_cluster(batch_system_type)[source]

Prepare data if alignment already done: just index the fasta (if index not given), then parse the alignment

Parameters:batch_system_type (str) – type of cluster (slurm or sge)
prepare_dotplot_local()[source]

Prepare data if alignment already done: just index the fasta (if index not given), then parse the alignment file and sort it.

run_job(batch_system_type)[source]

Run a job (mapping step)

Parameters:batch_system_type (str) – type of cluster (slurm or sge)
run_job_in_thread(batch_system_type='local')[source]

Run a job asynchronously into a new thread

Parameters:batch_system_type (str) – slurm or sge
search_error()[source]

Search for an error in the log file (for local runs). If no error found, returns a generic error message

Returns:error message to give to the user
Return type:str
send_mail()[source]

Send mail

send_mail_post()[source]

Send mail using POST url (if there is no access to mailer)

set_inputs_from_res_dir()[source]

Sets inputs (query, target, …) from job dir

set_job_status(status, error='')[source]

Change status of a job

Parameters:
  • status (str) – new job status
  • error (str) – error description (if any)
set_status_standalone(status, error='')[source]

Change job status in standalone mode

Parameters:
  • status (str) – new status
  • error (str) – error description (if any)
start_job()[source]

Start job: download, check and parse input files

status()[source]

Get job status and error. In webserver mode, get also mem peak and time elapsed

Returns:status and other information
Return type:dict
unpack_backup()[source]

Untar backup file

update_job_status(status, id_process=None)[source]

Update job status

Parameters:
  • status – new status
  • id_process – system process id

dgenies.lib.latest module

class dgenies.lib.latest.Latest[source]

Bases: object

Search latest version

_write_update()[source]

Save latest version to a file

load()[source]

Load latest version: use cached version (if any) and then sync with Github

update()[source]

Get latest version from Github

update_async()[source]

Update latest version asynchronously

dgenies.lib.mailer module

class dgenies.lib.mailer.Mailer(app)[source]

Bases: object

Send mail through the Flask app

Parameters:app (Flask) – Flask app object
_send_async_email(msg)[source]

Send mail asynchronously

Parameters:msg (Message) – message to send
send_mail(recipients, subject, message, message_html=None)[source]

Send mail

Parameters:
  • recipients (list) – list of recipients
  • subject (str) – mail subject
  • message (str) – message (text)
  • message_html (str) – message (html)

dgenies.lib.paf module

class dgenies.lib.paf.Paf(paf: str, idx_q: str, idx_t: str, auto_parse: bool = True, mailer=None, id_job=None)[source]

Bases: object

Functions applied to PAF files

Parameters:
  • paf (str) – PAF file path
  • idx_q (str) – query index file path
  • idx_t (str) – target index file path
  • auto_parse (bool) – if True, parse PAF file at initialisation
  • mailer (Mailer) – mailer object, to send mails
  • id_job (str) – job id
_add_percents(percents, item)[source]

Update percents with interval

Parameters:
  • percents (dict) – initial percents
  • item (Interval) – interval from IntervalTree
Returns:

new percents

Return type:

dict

static _flush_blocks(index_c, new_index_c, new_index_o, current_block)[source]

When parsing the index, merge sequential contigs that are too small into a single mix (if their number exceeds 5), else just add them to the new index

Parameters:
  • index_c (dict) – current index contigs def
  • new_index_o (list) – new index contigs order
  • new_index_c (dict) – new index contigs def
  • current_block (list) – contigs in the current analyzed block
Returns:

(new index contigs defs, new index contigs order)

Return type:

(dict, list)

_remove_overlaps(position_idy, percents)[source]

Remove overlaps between matches on the diagonal

Parameters:
  • position_idy (IntervalTree) – matches intervals with associated identity category
  • percents (dict) – Percent of matches for each identity category
Returns:

new percents (updated after overlap removal)

Return type:

dict

_update_query_index(contigs_reoriented)[source]

Write new query index file (including new reoriented contigs info)

Parameters:contigs_reoriented (list) – reoriented contigs list
build_list_no_assoc(to)[source]

Build list of queries that match no target, or the opposite

Parameters:to – query or target
Returns:content of the file
build_query_chr_as_reference()[source]

Assemble query contigs like reference chromosomes

Returns:path of the fasta file
build_query_on_target_association_file()[source]

For each query, get the best matching chromosome and save it to a CSV file. Use the order of queries

Returns:content of the file
build_summary_stats(status_file)[source]

Get summary of identity

Returns:table with percents by category
compute_gravity_contigs()[source]

Compute gravity for each contig on each chromosome (how many big matches they have). Will be used to find which chromosome has the highest value for each contig

Returns:
  • [0] gravity for each contig and each chromosome:
    {contig1: {chr1: value, chr2: value, …}, contig2: …}
  • [1] For each block save lines inside:
    [median_on_query, squared length, median_on_target, x1, x2, y1, y2, length] (x : on target, y: on query)
get_d3js_data()[source]

Build data for D3.js client

Returns:data for d3.js:
  • y_len: length of query (bp)
  • x_len: length of target (bp)
  • min_idy: minimum of identity (float)
  • max_idy: maximum of identity (float)
  • lines: matches lines, by class of identity (dict)
  • y_contigs: query contigs definitions (dict)
  • y_order: query contigs order (list)
  • x_contigs: target contigs definitions (dict)
  • x_order: target contigs order (list)
  • name_y: name of the query (str)
  • name_x: name of the target (str)
  • limit_idy: limit for each class of identities (list)
Return type:dict
get_queries_on_target_association()[source]

For each target, get the list of queries associated to it

Returns:list of queries associated to each target
Return type:dict
get_query_on_target_association(with_coords=True)[source]

For each query, get the best matching chromosome

Returns:query on target association
Return type:dict
get_summary_stats()[source]

Load summary statistics from file

Returns:summary object or None if summary not already built
Return type:dict
is_contig_well_oriented(lines, contig, chrom)[source]

Returns True if the contig is well oriented. In a well oriented contig, y must increase when x increases. This is checked only for the highest matches (small matches are ignored)

Parameters:
  • lines (list) – lines inside the contig
  • contig (str) – query contig name
  • chrom (str) – target chromosome name
Returns:

True if well oriented, else False

Return type:

bool

keyerror_message(exception, type_f)[source]

Build message if contig not found in query or target

Parameters:
  • exception (KeyError) – exception object
  • type_f (str) – type of data (query or target)
Returns:

error message

Return type:

str

limit_idy = [0.25, 0.5, 0.75]
max_nb_lines = 100000
parse_index(index_o: list, index_c: dict, full_len: int)[source]

Parse index and merge too small contigs together

Parameters:
  • index_o (list) – index contigs order
  • index_c (dict) – index contigs def
  • full_len (int) – length of the sequence
Returns:

(new contigs def, new contigs order)

Return type:

(dict, list)

parse_paf(merge_index=True, noise=True)[source]

Parse PAF file

Parameters:
  • merge_index (bool) – if True, merge too small contigs in index
  • noise (bool) – if True, remove noise
static remove_noise(lines, noise_limit)[source]

Remove noise from the dot plot

Parameters:
  • lines (dict) – lines of the dot plot, by class
  • noise_limit (float) – line length limit
Returns:

kept lines, by class

Return type:

dict

reorient_contigs_in_paf(contigs)[source]

Reorient contigs in the PAF file

Parameters:contigs – contigs to be reoriented
reverse_contig(contig_name)[source]

Reverse contig

Parameters:contig_name (str) – contig name
save_json(out)[source]

Save D3.js data to json

Parameters:out (str) – output file path
set_sorted(is_sorted)[source]

Change sorted status

Parameters:is_sorted (bool) – new sorted status
sort()[source]

Sort contigs according to reference target and reorient them if needed

dgenies.lib.parsers module

Define tools parsers here

Each parser (main function) must have 2 and only 2 arguments:
  • First argument: input file, which is the raw output of the tool
  • Second argument: final PAF file

Returns True if parsing succeeds, else False
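A parser following this two-argument convention could look like the sketch below. The function name my_tool and its behaviour (copying every non-comment line unchanged) are hypothetical and only illustrate the expected signature and boolean return value; a real parser would convert the tool's output into PAF records.

```python
import os
import tempfile


def my_tool(in_file, out_paf):
    """Hypothetical parser: copy non-comment lines of in_file to out_paf."""
    try:
        with open(in_file) as src, open(out_paf, "w") as dst:
            for line in src:
                if not line.startswith("#"):
                    dst.write(line)
    except OSError:
        # Any read or write problem is reported as a failed parse.
        return False
    return True


with tempfile.TemporaryDirectory() as folder:
    raw = os.path.join(folder, "raw.out")
    paf = os.path.join(folder, "map.paf")
    with open(raw, "w") as handle:
        handle.write("# header\nq1\t100\n")
    ok = my_tool(raw, paf)
    with open(paf) as handle:
        kept = handle.read()
    missing = my_tool(os.path.join(folder, "absent.out"), paf)
```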

dgenies.lib.parsers.maf(in_maf, out_paf)[source]

Maf parser

Parameters:
  • in_maf (str) – input maf file path
  • out_paf (str) – output paf file path
Returns:

True if success, else False

dgenies.lib.parsers.mashmap2paf(in_paf, out_paf)[source]

dgenies.lib.upload_file module

class dgenies.lib.upload_file.UploadFile(name, type_f=None, size=None, not_allowed_msg='')[source]

Bases: object

Manage uploaded files

Parameters:
  • name (str) – File name
  • type_f (str) – file MIME type
  • size (int) – file size in bytes
  • not_allowed_msg (str) – error to add for not allowed file
get_file()[source]

Get file object

Returns:file object
Return type:dict

dgenies.lib.validators module

Define formats validators here (for alignment files)

Each validator (main function) has a name which is exactly the name of the format in the aln-formats.yaml file. It takes only 1 argument:
  • Input file to check

Secondary functions must start with _

Returns True if file is valid, else False
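A validator in this style might be sketched as follows for a PAF-like file. The name paf_like, the max_lines parameter and the exact checks are assumptions made for illustration; the checks rely only on the 12 mandatory tab-separated PAF columns, and the real validators may be stricter or more lenient.

```python
import os
import tempfile


def paf_like(in_file, max_lines=100):
    """Hypothetical validator: check the first lines for 12 well-typed PAF columns."""
    # 0-based positions of the integer columns among the 12 mandatory PAF fields.
    int_columns = (1, 2, 3, 6, 7, 8, 9, 10, 11)
    checked = 0
    try:
        with open(in_file) as handle:
            for line in handle:
                fields = line.rstrip("\n").split("\t")
                if len(fields) < 12 or fields[4] not in ("+", "-"):
                    return False
                for col in int_columns:
                    int(fields[col])  # raises ValueError on a non-integer value
                checked += 1
                if checked >= max_lines:
                    break
    except (OSError, ValueError):
        return False
    # Assumption: an empty file is treated as invalid.
    return checked > 0


with tempfile.TemporaryDirectory() as folder:
    good = os.path.join(folder, "good.paf")
    bad = os.path.join(folder, "bad.paf")
    with open(good, "w") as handle:
        handle.write("q1\t1000\t0\t500\t+\tchr1\t5000\t100\t600\t480\t500\t60\n")
    with open(bad, "w") as handle:
        handle.write("q1\t1000\t0\n")
    good_ok = paf_like(good)
    bad_ok = paf_like(bad)
```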

dgenies.lib.validators._filter_maf(in_file)[source]

Filter Maf file (remove unused lines)

Parameters:in_file – maf file to filter
dgenies.lib.validators.idx(in_file)[source]

Index file validator

Parameters:in_file (str) – index file to test
Returns:True if valid, else False
Return type:bool
dgenies.lib.validators.maf(in_file)[source]

Maf validator

Parameters:in_file (str) – maf file to test
Returns:True if valid, else False
Return type:bool
dgenies.lib.validators.paf(in_file)[source]

Paf validator

Parameters:in_file (str) – paf file to test
Returns:True if valid, else False
Return type:bool

Module contents