dgenies.lib package¶

Submodules¶

dgenies.lib.crons module¶

class dgenies.lib.crons.Crons(base_dir, debug)[source]¶

Bases: object

Manage crontab jobs (webserver mode)

Parameters:

base_dir (str) – software base directory path
debug (bool) – True to enable debug mode

static _get_python_exec()[source]¶: Get python executable path

clear(kill_scheduler=True)[source]¶

Clear all crons

Parameters:	kill_scheduler (bool) – if True, kill local scheduler currently running

init_clean_cron()[source]¶: Initialize the clean cron, which clears old jobs. The clean cron is launched at 1:00 am each day

init_launch_local_cron()[source]¶: Try to launch local scheduler (if not already launched)

start_all()[source]¶: Start all crons

dgenies.lib.decorators module¶

class dgenies.lib.decorators.Singleton(klass)[source]¶

Bases: object

Define a singleton (design pattern)
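The signature `Singleton(klass)` suggests a class-based decorator that wraps the decorated class. The following is a minimal sketch of that pattern, assuming the wrapper caches the first instance and returns it on every later call. The internals and the `Settings` class are hypothetical and are not taken from the dgenies source.

```python
class Singleton:
    """Class decorator: the wrapped class is instantiated at most once."""

    def __init__(self, klass):
        self.klass = klass      # the decorated class
        self.instance = None    # created lazily on the first call

    def __call__(self, *args, **kwargs):
        # Arguments are only used for the very first instantiation.
        if self.instance is None:
            self.instance = self.klass(*args, **kwargs)
        return self.instance


@Singleton
class Settings:  # hypothetical class, for illustration only
    def __init__(self, debug=False):
        self.debug = debug


first = Settings(debug=True)
second = Settings()  # returns the instance created above
```

With this design, arguments passed after the first call are silently ignored, which is a common trade-off of the pattern.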

dgenies.lib.drmaasession module¶

dgenies.lib.fasta module¶

class dgenies.lib.fasta.Fasta(name, path, type_f, example=False)[source]¶

Bases: object

Defines a fasta file: name of the sample, path to the fasta file, type of file (URL or local file), …

Parameters:

name (str) – sample name
path (str) – fasta file path
type_f (str) – type of file (local file or URL)
example (bool) – is an example job

get_name()[source]¶

Get sample name

Returns:	sample name
Return type:	str

get_path()[source]¶

Get path of the fasta file

Returns:	fasta path
Return type:	str

get_type()[source]¶

Get type: URL or local file

Returns:	type
Return type:	str

is_example()[source]¶

Return whether the current sample is example data

Returns:	True if the current sample is example data
Return type:	bool

set_name(name)[source]¶

Set sample name

Parameters:	name (str) – new sample name

set_path(path)[source]¶

Set path to the fasta file

Parameters:	path (str) – new path

dgenies.lib.functions module¶

class dgenies.lib.functions.Functions[source]¶

Bases: object

General functions

static _Functions__get_do_sort(fasta, is_sorted)¶

Check whether query must be sorted (False if already done)

Parameters:

fasta (str) – fasta file
is_sorted (bool) – True if it’s sorted

Returns:	do sort
Return type:	bool

static _get_jobs_list()[source]¶

Get list of jobs

Returns:	list of valid jobs
Return type:	list

static allowed_file(filename, file_formats=('fasta', ))[source]¶

Check whether a file has a valid format

Parameters:

filename – file path
file_formats – accepted file formats

Returns:	True if valid format, else False
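As an illustration of this kind of check, here is a minimal sketch that validates a file name by its extension. The extension table and the handling of a trailing `.gz` are assumptions made for the example; the formats actually accepted by dgenies come from its configuration.

```python
# Hypothetical mapping from format name to accepted extensions.
ALLOWED_EXTENSIONS = {
    "fasta": ("fa", "fasta", "fna"),
    "paf": ("paf",),
}


def allowed_file(filename, file_formats=("fasta",)):
    """Return True if the extension of `filename` matches one of the formats."""
    name = filename.lower()
    # Assumption: a gzip suffix is ignored when checking the format.
    if name.endswith(".gz"):
        name = name[:-3]
    if "." not in name:
        return False
    ext = name.rsplit(".", 1)[1]
    return any(ext in ALLOWED_EXTENSIONS.get(fmt, ()) for fmt in file_formats)
```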

static compress(filename)[source]¶

Compress a file with gzip

Parameters:	filename (str) – file to compress
Returns:	path of the compressed file
Return type:	str

static compress_and_send_mail(job_name, fasta_file, index_file, lock_file, mailer)[source]¶

Compress the fasta file, then send a mail with its link to the client

Parameters:

job_name (str) – job id
fasta_file (str) – fasta file path
index_file (str) – index file path
lock_file (str) – lock file path
mailer (Mailer) – mailer object (to send mail)

config = <dgenies.config_reader.AppConfigReader object>¶

static get_fasta_file(res_dir, type_f, is_sorted)[source]¶

Get fasta file path

Parameters:

res_dir (str) – job results directory
type_f (str) – type of file (query or target)
is_sorted (bool) – is fasta sorted

Returns:	fasta file path
Return type:	str

static get_gallery_items()[source]¶

Get list of items from the gallery

Returns:

list of gallery items. Each item is a dict with 7 keys:

name: name of the job
id_job: id of the job
picture: illustrating picture filename (located in the gallery folder of the data folder)
query: query species name
target: target species name
mem_peak: max memory used for the run (human readable)
time_elapsed: time elapsed for the run (human readable)

Return type:	list of dict

static get_list_all_jobs(mode='webserver')[source]¶

Get list of all jobs

Parameters:	mode (str) – webserver or standalone
Returns:	list of all jobs in standalone mode. Empty list in webserver mode
Return type:	list

static get_mail_for_job(id_job)[source]¶

Retrieve associated mail for a job

Parameters:	id_job (int) – job id
Returns:	associated mail address
Return type:	str

static get_readable_size(size, nb_after_coma=1)[source]¶

Get human readable size from a given size in bytes

Parameters:

size (int) – size in bytes
nb_after_coma (int) – number of digits after the decimal separator

Returns:	size, human readable
Return type:	str
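The conversion can be sketched as repeated division by 1024 until the value fits the current unit. The unit labels, the 1024 base and the output format below are assumptions for the example; the real function may format its result differently.

```python
def get_readable_size(size, nb_after_coma=1):
    """Format a size in bytes using the largest unit that keeps the value >= 1."""
    units = ["B", "KB", "MB", "GB", "TB"]  # assumed labels
    value = float(size)
    unit_index = 0
    while value >= 1024 and unit_index < len(units) - 1:
        value /= 1024
        unit_index += 1
    # nb_after_coma controls the number of decimal digits shown.
    return f"{value:.{nb_after_coma}f} {units[unit_index]}"
```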

static get_readable_time(seconds)[source]¶

Get human readable time

Parameters:	seconds (int) – time in seconds
Returns:	time, human readable
Return type:	str
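A possible way to build such a string is to split the number of seconds with `divmod`. The output layout used here is an assumption chosen for the example, not the format produced by dgenies.

```python
def get_readable_time(seconds):
    """Split a duration in seconds into hours, minutes and seconds."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    # Assumed layout: "<h> h <mm> min <ss> s".
    return f"{hours} h {minutes:02d} min {secs:02d} s"
```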

static get_valid_uploaded_filename(filename, folder)[source]¶

Check whether uploaded file already exists. If yes, rename it

Parameters:

filename (str) – uploaded file
folder (str) – folder in which to save the file

Returns:	unique filename
Return type:	str
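One common way to obtain a unique name is to append a counter until no file with that name exists in the folder. The sketch below assumes a `_<n>` suffix inserted before the last extension; the renaming scheme used by dgenies may differ.

```python
import os
import tempfile


def get_valid_uploaded_filename(filename, folder):
    """Return `filename`, or a numbered variant if it already exists in `folder`."""
    base, ext = os.path.splitext(filename)
    candidate = filename
    counter = 1
    # Assumption: a "_<n>" suffix is inserted before the last extension.
    while os.path.exists(os.path.join(folder, candidate)):
        candidate = f"{base}_{counter}{ext}"
        counter += 1
    return candidate


# Usage: the second upload with the same name gets a new name.
with tempfile.TemporaryDirectory() as folder:
    first = get_valid_uploaded_filename("query.fasta", folder)
    open(os.path.join(folder, first), "w").close()
    second = get_valid_uploaded_filename("query.fasta", folder)
```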

static is_in_gallery(id_job, mode='webserver')[source]¶

Check whether a job is in the gallery

Parameters:

id_job (str) – job id
mode (str) – webserver or standalone

Returns:	True if job is in the gallery, else False
Return type:	bool

static query_fasta_file_exists(res_dir)[source]¶

Check if a fasta file exists

Parameters:	res_dir (str) – job result directory
Returns:	True if file exists and is a regular file, else False
Return type:	bool

static random_string(s_len)[source]¶

Generate a random string

Parameters:	s_len (int) – length of the string to generate
Returns:	the random string
Return type:	str
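A minimal sketch of such a generator, assuming the string is drawn from ASCII letters and digits (the alphabet is an assumption). It relies on the `random` module, so it is not suitable for security-sensitive tokens.

```python
import random
import string


def random_string(s_len):
    """Return a random string of `s_len` ASCII letters and digits."""
    alphabet = string.ascii_letters + string.digits  # assumed alphabet
    return "".join(random.choice(alphabet) for _ in range(s_len))


token = random_string(12)
```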

static read_index(index_file)[source]¶

Load index of query or target

Parameters:	index_file (str) – index file path
Returns:

[0] index (size of each chromosome) {dict}
[1] sample name {str}

Return type:	(dict, str)
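To make the return value concrete, here is a sketch of a reader for a simple index layout. The layout itself is an assumption made for this example (first line: sample name; each following line: contig name, a tab, then its length); check the actual index files before relying on it.

```python
import os
import tempfile


def read_index(index_file):
    """Return (contig sizes, sample name) from an index file."""
    index = {}
    with open(index_file) as handle:
        # Assumed layout: the first line holds the sample name ...
        sample_name = handle.readline().rstrip("\n")
        # ... and each following line is "<contig name>\t<length>".
        for line in handle:
            if line.strip():
                name, length = line.rstrip("\n").split("\t")
                index[name] = int(length)
    return index, sample_name


# Usage on a small hand-written index file.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "query.idx")
    with open(path, "w") as handle:
        handle.write("my_sample\nchr1\t1500\nchr2\t800\n")
    index, sample_name = read_index(path)
```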

static send_fasta_ready(mailer, job_name, sample_name, compressed=False, path='fasta-query', status='success', ext='fasta')[source]¶

Send the link to the fasta file once processing has ended

Parameters:

mailer (Mailer) – mailer object
job_name (str) – job id
sample_name (str) – sample name
compressed (bool) – is a compressed fasta file
path (str) – fasta path
status (str) – processing status
ext (str) – file extension

static sort_fasta(job_name, fasta_file, index_file, lock_file, compress=False, mailer=None, mode='webserver')[source]¶

Sort fasta file according to the sorted index file

Parameters:

job_name (str) – job id
fasta_file (str) – fasta file path
index_file (str) – index file path
lock_file (str) – lock file path
compress (bool) – compress result fasta file
mailer (Mailer) – mailer object (to send mail)
mode (str) – webserver or standalone

static uncompress(filename)[source]¶

Uncompress a gzipped file

Parameters:	filename (str) – gzipped file
Returns:	path of the uncompressed file
Return type:	str
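The `compress` and `uncompress` helpers can be pictured as a gzip round trip. This sketch uses only the standard library and assumes that the source file is removed once the converted copy is written and that compressed files carry a `.gz` suffix; neither detail is confirmed by this reference.

```python
import gzip
import os
import shutil
import tempfile


def compress(filename):
    """Gzip `filename`, remove the original, return the .gz path."""
    compressed = filename + ".gz"
    with open(filename, "rb") as src, gzip.open(compressed, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(filename)  # assumption: only the compressed copy is kept
    return compressed


def uncompress(filename):
    """Gunzip `filename` (must end in .gz), remove it, return the new path."""
    if not filename.endswith(".gz"):
        raise ValueError("expected a .gz file")
    uncompressed = filename[:-3]
    with gzip.open(filename, "rb") as src, open(uncompressed, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(filename)
    return uncompressed


# Usage: compress a small fasta file, then restore it.
with tempfile.TemporaryDirectory() as folder:
    fasta = os.path.join(folder, "sample.fasta")
    with open(fasta, "w") as handle:
        handle.write(">chr1\nACGT\n")
    gz_path = compress(fasta)
    restored = uncompress(gz_path)
    with open(restored) as handle:
        content = handle.read()
```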

dgenies.lib.job_manager module¶

class dgenies.lib.job_manager.JobManager(id_job, email=None, query: dgenies.lib.fasta.Fasta = None, target: dgenies.lib.fasta.Fasta = None, mailer=None, tool='minimap2', align: dgenies.lib.fasta.Fasta = None, backup: dgenies.lib.fasta.Fasta = None)[source]¶

Bases: object

Jobs management

Parameters:

id_job (str) – job id
email (str) – email from user
query (Fasta) – query fasta
target (Fasta) – target fasta
mailer (Mailer) – mailer object (to send mail through flask app)
tool (str) – tool to use for mapping (choice from tools config)
align (Fasta) – alignment file (PAF, MAF, …) as a fasta object
backup (Fasta) – backup TAR file

_after_start(success, error_set)[source]¶

Tasks done after input files downloaded, checked and parsed: if success, set job status to “waiting”. Else, set job error and send mail.

Parameters:

success (bool) – job success
error_set (bool) – error already set (else, set it now)

_check_url(fasta, formats)[source]¶

Check whether a URL is valid, and whether the file is valid too

Parameters:

fasta (Fasta) – fasta file object
formats (tuple) – allowed file formats

Returns:	True if URL and file are valid, else False
Return type:	bool

_download_file(url)[source]¶

Download a file from a URL

Parameters:	url (str) – url of the file to download
Returns:	absolute path of the downloaded file
Return type:	str

_end_of_prepare_dotplot()[source]¶: Tasks done after preparing dot plot data: parse & sort of alignment file

_get_filename_from_url(url)[source]¶

Retrieve filename from a URL (http or ftp)

Parameters:	url (str) – url of the file to download
Returns:	filename
Return type:	str
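For a URL whose last path segment is the file name, the extraction can be sketched with `urllib.parse`. This is an illustration under that assumption only: the real method may also query the server (for example for a `Content-Disposition` header), which this sketch does not attempt. The function name without a leading underscore marks it as a stand-in.

```python
import posixpath
from urllib.parse import unquote, urlparse


def get_filename_from_url(url):
    """Return the last path segment of `url`, with percent-escapes decoded."""
    path = urlparse(url).path  # drops scheme, host, query string and fragment
    return posixpath.basename(unquote(path))
```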

_getting_file_from_url(fasta, type_f)[source]¶

Download file from URL

Parameters:

fasta (Fasta) – Fasta object describing the input file
type_f (str) – type of the file (query or target)

Returns:

[0] True if no error happened, else False
[1] If an error happened, True if the error was saved for the job, else False (will be saved later)
[2] Final path of the downloaded file {str}
[3] Name of the downloaded file {str}

Return type:

tuple

_getting_local_file(fasta, type_f)[source]¶

Copy temp file to its final location

Parameters:

fasta (Fasta) – fasta file object
type_f (str) – query or target

Returns:	final full path of the file
Return type:	str

_launch_drmaa(batch_system_type)[source]¶

Launch the mapping step on a cluster

Parameters:	batch_system_type – slurm or sge
Returns:	True if job succeeds, else False

_launch_local()[source]¶

Launch a job on the current machine

Returns:	True if job succeeds, else False
Return type:	bool

_save_analytics_data()[source]¶: Save analytics data into the database

_set_analytics_job_status(status)[source]¶

Change status for a job in analytics database

Parameters:	status (str (20)) – new status

check_file(input_type, should_be_local, max_upload_size_readable)[source]¶

Check if file is correct: format, size, valid gzip

Parameters:

input_type – query or target
should_be_local – True if job should be treated locally
max_upload_size_readable – max upload size human readable

Returns:	(True if correct, True if error set [for fail], True if should be local)

check_job_status_sge()[source]¶

Check status of a SGE job run

Returns:	True if the job has successfully ended, else False

check_job_status_slurm()[source]¶

Check status of a SLURM job run

Returns:	True if the job has successfully ended, else False

check_job_success()[source]¶

Check if a job succeeded

Returns:	status of a job: succeed, no-match or fail
Return type:	str

clear()[source]¶: Remove job dir

delete()[source]¶

Remove a job

Returns:

[0] Success of the deletion
[1] Error message, if any (else empty string)

Return type:	(bool, str)

do_align()[source]¶

Check if we have to make alignment

Returns:	True if the job is launched with an alignment file

download_files_with_pending(files_to_download, should_be_local, max_upload_size_readable)[source]¶

Download files from URLs, with pending (according to the max number of concurrent downloads)

Parameters:

files_to_download (list of list) – files to download. For each item of the list, it’s a list with 2 elements: first one is the Fasta object, second one the input type (query or target)
should_be_local (bool) – True if the job should be run locally (according to input file sizes), else False
max_upload_size_readable (str) – Human readable max upload size (to show on errors)

static find_error_in_log(log_file)[source]¶

Find error in log (for cluster run)

Parameters:	log_file – log file of the job
Returns:	error (empty if no error)
Return type:	str

get_file_size(filepath: str)[source]¶

Get file size

Parameters:	filepath (str) – file path
Returns:	file size (bytes)
Return type:	int

get_mail_content(status, target_name, query_name=None)[source]¶

Build mail content for status mail

Parameters:

status (str) – job status
target_name (str) – name of target
query_name (str) – name of query

Returns:	mail content
Return type:	str

get_mail_content_html(status, target_name, query_name=None)[source]¶

Build mail content as HTML

Parameters:

status (str) – job status
target_name (str) – name of target
query_name (str) – name of query

Returns:	mail content (html)
Return type:	str

get_mail_subject(status)[source]¶

Build mail subject

Parameters:	status (str) – job status
Returns:	mail subject
Return type:	str

static get_pending_local_number()[source]¶

Get number of jobs running or waiting for a run

Returns:	number of jobs
Return type:	int

get_query_split()[source]¶

Get query split fasta file

Returns:	split query fasta file
Return type:	str

get_status_standalone(with_error=False)[source]¶

Get job status in standalone mode

Parameters:	with_error – get also the error
Returns:	status (and error, if with_error=True)
Return type:	str or tuple (if with_error=True)

getting_files()[source]¶

Get files for the job

Returns:

[0] True if getting files succeeded, else False
[1] If an error happened, True if the error was already saved for the job, else False (error will be saved later)
[2] True if no data must be downloaded (will be downloaded with pending if True)

Return type:	tuple

static is_gz_file(filepath)[source]¶

Check if a file is gzipped

Parameters:	filepath (str) – file to check
Returns:	True if gzipped, else False
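A gzip stream always starts with the two magic bytes `0x1f 0x8b`, so one way to perform this check is to read the first two bytes of the file. The sketch below takes that approach; whether dgenies uses the same test is an assumption.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gz_file(filepath):
    """Return True if `filepath` starts with the gzip magic number."""
    with open(filepath, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


# Usage: compare a plain fasta file with a gzipped one.
with tempfile.TemporaryDirectory() as folder:
    plain = os.path.join(folder, "a.fasta")
    zipped = os.path.join(folder, "a.fasta.gz")
    with open(plain, "w") as handle:
        handle.write(">chr1\nACGT\n")
    with gzip.open(zipped, "wt") as handle:
        handle.write(">chr1\nACGT\n")
    plain_is_gz = is_gz_file(plain)
    zipped_is_gz = is_gz_file(zipped)
```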

is_query_filtered()[source]¶

Check if query has been filtered

Returns:	True if filtered, else False

is_target_filtered()[source]¶

Check if target has been filtered

Returns:	True if filtered, else False

launch()[source]¶: Launch a job in webserver mode (asynchronously in a new thread)

launch_standalone()[source]¶: Launch a job in standalone mode (asynchronously in a new thread)
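Both launch methods are described as asynchronous, running in a new thread. The general pattern can be sketched with the standard `threading` module; `JobSketch` and its status values are hypothetical stand-ins and do not reflect the internals of `JobManager`.

```python
import threading


class JobSketch:
    """Hypothetical stand-in for a job; not the dgenies JobManager."""

    def __init__(self):
        self.status = "submitted"

    def _run(self):
        # Placeholder for the long-running work (download, mapping, ...).
        self.status = "success"

    def launch(self):
        # Start the work in a background thread and return immediately.
        thread = threading.Thread(target=self._run)
        thread.start()
        return thread


job = JobSketch()
worker = job.launch()
worker.join()  # only needed here so the result can be inspected
```

The caller gets control back right after `launch()`; joining the thread is only done in this example to make the final status observable.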

launch_to_cluster(step, batch_system_type, command, args, log_out, log_err)[source]¶

Launch a program to the cluster

Parameters:

step (str) – step (prepare, start)
batch_system_type (str) – slurm or sge
command (str) – program to launch (without arguments)
args (list) – arguments to use for the program
log_out (str) – log file for stdout
log_err (str) – log file for stderr

Returns:	True if succeed, else False
Return type:	bool

prepare_data()[source]¶: Launch preparation of data

prepare_data_cluster(batch_system_type)[source]¶

Launch data preparation on a cluster

Parameters:	batch_system_type (str) – slurm or sge
Returns:	True if succeed, else False
Return type:	bool

prepare_data_in_thread()[source]¶: Prepare data in a new thread

prepare_data_local()[source]¶

Prepare data locally. In standalone mode, launch the job afterwards if preparation succeeds.

Returns:	True if job succeeds, else False
Return type:	bool

prepare_dotplot_cluster(batch_system_type)[source]¶

Prepare data if alignment already done: just index the fasta (if index not given), then parse the alignment

Parameters:	batch_system_type (str) – type of cluster (slurm or sge)

prepare_dotplot_local()[source]¶: Prepare data if alignment already done: just index the fasta (if index not given), then parse the alignment file and sort it.

run_job(batch_system_type)[source]¶

Run of a job (mapping step)

Parameters:	batch_system_type (str) – type of cluster (slurm or sge)

run_job_in_thread(batch_system_type='local')[source]¶

Run a job asynchronously into a new thread

Parameters:	batch_system_type (str) – slurm or sge

search_error()[source]¶

Search for an error in the log file (for local runs). If no error found, returns a generic error message

Returns:	error message to give to the user
Return type:	str

send_mail()[source]¶: Send mail

send_mail_post()[source]¶: Send mail using POST url (if there is no access to mailer)

set_inputs_from_res_dir()[source]¶: Sets inputs (query, target, …) from job dir

set_job_status(status, error='')[source]¶

Change status of a job

Parameters:

status (str) – new job status
error (str) – error description (if any)

set_status_standalone(status, error='')[source]¶

Change job status in standalone mode

Parameters:

status (str) – new status
error (str) – error description (if any)

start_job()[source]¶: Start job: download, check and parse input files

status()[source]¶

Get job status and error. In webserver mode, get also mem peak and time elapsed

Returns:	status and other information
Return type:	dict

unpack_backup()[source]¶: Untar backup file

update_job_status(status, id_process=None)[source]¶

Update job status

Parameters:

status – new status
id_process – system process id

dgenies.lib.latest module¶

class dgenies.lib.latest.Latest[source]¶

Bases: object

Search latest version

_write_update()[source]¶: Save latest version to a file

load()[source]¶: Load latest version: use cached version (if any) and then sync with Github

update()[source]¶: Get latest version from Github

update_async()[source]¶: Update latest version asynchronously

dgenies.lib.mailer module¶

class dgenies.lib.mailer.Mailer(app)[source]¶

Bases: object

Send mail through flask app

Parameters:	app (Flask) – Flask app object

_send_async_email(msg)[source]¶

Send mail asynchronously

Parameters:	msg (Message) – message to send

send_mail(recipients, subject, message, message_html=None)[source]¶

Send mail

Parameters:

recipients (list) – list of recipients
subject (str) – mail subject
message (str) – message (text)
message_html (str) – message (html)

dgenies.lib.paf module¶

class dgenies.lib.paf.Paf(paf: str, idx_q: str, idx_t: str, auto_parse: bool = True, mailer=None, id_job=None)[source]¶

Bases: object

Functions applied to PAF files

Parameters:

paf (str) – PAF file path
idx_q (str) – query index file path
idx_t (str) – target index file path
auto_parse (bool) – if True, parse PAF file at initialisation
mailer (Mailer) – mailer object, to send mails
id_job (str) – job id

_add_percents(percents, item)[source]¶

Update percents with interval

Parameters:

percents (dict) – initial percents
item (Interval) – interval from IntervalTree

Returns:	new percents
Return type:	dict

static _flush_blocks(index_c, new_index_c, new_index_o, current_block)[source]¶

When parsing index, build a mix of too small sequential contigs (if their number exceed 5), else just add co to the new index

Parameters:

index_c (dict) – current index contigs def
new_index_c (dict) – new index contigs def
new_index_o (list) – new index contigs order
current_block (list) – contigs in the current analyzed block

Returns:	(new index contigs defs, new index contigs order)
Return type:	(dict, list)

_remove_overlaps(position_idy, percents)[source]¶

Remove overlaps between matches on the diagonal

Parameters:

position_idy (IntervalTree) – matches intervals with associated identity category
percents (dict) – Percent of matches for each identity category

Returns:	new percents (updated after overlap removing)
Return type:	dict

_update_query_index(contigs_reoriented)[source]¶

Write new query index file (including new reoriented contigs info)

Parameters:	contigs_reoriented (list) – reoriented contigs list

build_list_no_assoc(to)[source]¶

Build the list of queries that match no target, or of targets that match no query

Parameters:	to – query or target
Returns:	content of the file

build_query_chr_as_reference()[source]¶

Assemble query contigs like reference chromosomes

Returns:	path of the fasta file

build_query_on_target_association_file()[source]¶

For each query, get the best matching chromosome and save it to a CSV file. Use the order of queries

Returns:	content of the file

build_summary_stats(status_file)[source]¶

Get summary of identity

Returns:	table with percents by category

compute_gravity_contigs()[source]¶

Compute gravity for each contig on each chromosome (how many big matches they have). Will be used to find which chromosome has the highest value for each contig

Returns:

[0] gravity for each contig and each chromosome: {contig1: {chr1: value, chr2: value, …}, contig2: …}
[1] For each block save lines inside: [median_on_query, squared length, median_on_target, x1, x2, y1, y2, length] (x: on target, y: on query)

get_d3js_data()[source]¶

Build data for D3.js client

Returns:

data for d3.js:

y_len: length of query (Bp)
x_len: length of target (Bp)
min_idy: minimum of identity (float)
max_idy: maximum of identity (float)
lines: matches lines, by class of identity (dict)
y_contigs: query contigs definitions (dict)
y_order: query contigs order (list)
x_contigs: target contigs definitions (dict)
x_order: target contigs order (list)
name_y: name of the query (str)
name_x: name of the target (str)
limit_idy: limit for each class of identities (list)

Return type: dict

get_queries_on_target_association()[source]¶

For each target, get the list of queries associated to it

Returns:	list of queries associated to each target
Return type:	dict

get_query_on_target_association(with_coords=True)[source]¶

For each query, get the best matching chromosome

Returns:	query on target association
Return type:	dict

get_summary_stats()[source]¶

Load summary statistics from file

Returns:	summary object or None if summary not already built
Return type:	dict

is_contig_well_oriented(lines, contig, chrom)[source]¶

Return True if the contig is well oriented. In a well-oriented contig, y increases as x increases. Only the largest matches are checked (small matches are ignored)

Parameters:

lines (list) – lines inside the contig
contig (str) – query contig name
chrom (str) – target chromosome name

Returns:	True if well oriented, else False
Return type:	bool

keyerror_message(exception, type_f)[source]¶

Build message if contig not found in query or target

Parameters:

exception (KeyError) – exception object
type_f (str) – type of data (query or target)

Returns:	error message
Return type:	str

limit_idy = [0.25, 0.5, 0.75]¶
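Three thresholds split the identity range into four classes, which is how `limit_idy` relates to the "class of identity" mentioned for `get_d3js_data()`. A minimal sketch of such a classification with `bisect` follows; the class numbering (0 to 3) and the choice to put a value equal to a threshold into the upper class are assumptions, not documented behaviour.

```python
from bisect import bisect_right

LIMIT_IDY = [0.25, 0.5, 0.75]  # same values as the class attribute


def identity_class(identity, limits=LIMIT_IDY):
    """Return the index (0..len(limits)) of the class containing `identity`."""
    # bisect_right counts how many thresholds are <= identity.
    return bisect_right(limits, identity)
```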

max_nb_lines = 100000¶

parse_index(index_o: list, index_c: dict, full_len: int)[source]¶

Parse index and merge too small contigs together

Parameters:

index_o (list) – index contigs order
index_c (dict) – index contigs def
full_len (int) – length of the sequence

Returns:	(new contigs def, new contigs order)
Return type:	(dict, list)

parse_paf(merge_index=True, noise=True)[source]¶

Parse PAF file

Parameters:

merge_index (bool) – if True, merge too small contigs in index
noise (bool) – if True, remove noise

static remove_noise(lines, noise_limit)[source]¶

Remove noise from the dot plot

Parameters:

lines (dict) – lines of the dot plot, by class
noise_limit (float) – line length limit

Returns:	kept lines, by class
Return type:	dict

reorient_contigs_in_paf(contigs)[source]¶

Reorient contigs in the PAF file

Parameters:	contigs – contigs to be reoriented

reverse_contig(contig_name)[source]¶

Reverse contig

Parameters:	contig_name (str) – contig name

save_json(out)[source]¶

Save D3.js data to json

Parameters:	out (str) – output file path

set_sorted(is_sorted)[source]¶

Change sorted status

Parameters:	is_sorted (bool) – new sorted status

sort()[source]¶: Sort contigs according to reference target and reorient them if needed

dgenies.lib.parsers module¶

Define tools parsers here

Each parser (main function) must take exactly 2 arguments:

- First argument: input file, which is the raw output of the tool
- Second argument: final PAF file

Returns True if parsing succeeds, else False
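The contract above (two arguments, boolean result) can be sketched as follows. The tool name `mytool` and its output format are hypothetical: the example assumes a tool whose output is already PAF apart from `#` comment lines, purely to show the shape of a parser.

```python
import os
import tempfile


def mytool(in_file, out_paf):
    """Hypothetical parser: drop '#' comment lines, copy the rest as PAF."""
    try:
        with open(in_file) as src, open(out_paf, "w") as dst:
            for line in src:
                if not line.startswith("#"):
                    dst.write(line)
    except OSError:
        # Report failure through the return value, as the contract requires.
        return False
    return True


# Usage: one successful conversion, one failure on a missing input file.
with tempfile.TemporaryDirectory() as folder:
    raw = os.path.join(folder, "raw.txt")
    paf = os.path.join(folder, "out.paf")
    with open(raw, "w") as handle:
        handle.write("# header\nq1\t100\n")
    ok = mytool(raw, paf)
    with open(paf) as handle:
        written = handle.read()
    missing = mytool(os.path.join(folder, "absent.txt"), paf)
```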

dgenies.lib.parsers.maf(in_maf, out_paf)[source]¶

Maf parser

Parameters:

in_maf (str) – input maf file path
out_paf (str) – output paf file path

Returns:	True if success, else False

dgenies.lib.parsers.mashmap2paf(in_paf, out_paf)[source]¶

dgenies.lib.upload_file module¶

class dgenies.lib.upload_file.UploadFile(name, type_f=None, size=None, not_allowed_msg='')[source]¶

Bases: object

Manage uploaded files

Parameters:

name (str) – File name
type_f (str) – file MIME type
size (int) – file size in bytes
not_allowed_msg (str) – error to add for not allowed file

get_file()[source]¶

Get file object

Returns:	file object
Return type:	dict

dgenies.lib.validators module¶

Define formats validators here (for alignment files)

Each validator (main function) has a name which is exactly the name of the format in the aln-formats.yaml file. It takes exactly 1 argument:

- Input file to check

Secondary functions must start with _

Returns True if file is valid, else False
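As an illustration of these conventions, here is a sketch of a validator for PAF-like input. It relies on the PAF layout of 12 mandatory tab-separated columns, with the strand in column 5 and integers in columns 2 to 4 and 7 to 12. The function name `paf_sketch`, the helper `_is_int` and the line limit are hypothetical, and the checks are not those of the actual `paf` validator.

```python
import os
import tempfile

MAX_LINES = 100  # assumption: checking the first lines is enough


def _is_int(value):
    # Secondary function, hence the leading underscore.
    return value.isdigit()


def paf_sketch(in_file):
    """Return True if the first lines of `in_file` look like PAF records."""
    try:
        with open(in_file) as handle:
            for line_number, line in enumerate(handle):
                if line_number >= MAX_LINES:
                    break
                cols = line.rstrip("\n").split("\t")
                if len(cols) < 12 or cols[4] not in ("+", "-"):
                    return False
                # Lengths, coordinates, match counts and mapping quality.
                if not all(_is_int(c) for c in cols[1:4] + cols[6:12]):
                    return False
    except OSError:
        return False
    return True


# Usage: one well-formed record and one truncated record.
with tempfile.TemporaryDirectory() as folder:
    good = os.path.join(folder, "good.paf")
    bad = os.path.join(folder, "bad.paf")
    with open(good, "w") as handle:
        handle.write("q1\t1000\t0\t500\t+\tt1\t2000\t100\t600\t480\t500\t60\n")
    with open(bad, "w") as handle:
        handle.write("q1\t1000\n")
    good_is_valid = paf_sketch(good)
    bad_is_valid = paf_sketch(bad)
```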

dgenies.lib.validators._filter_maf(in_file)[source]¶

Filter Maf file (remove unused lines)

Parameters:	in_file – maf file to filter

dgenies.lib.validators.idx(in_file)[source]¶

Index file validator

Parameters:	in_file (str) – index file to test
Returns:	True if valid, else False
Return type:	bool

dgenies.lib.validators.maf(in_file)[source]¶

Maf validator

Parameters:	in_file (str) – maf file to test
Returns:	True if valid, else False
Return type:	bool

dgenies.lib.validators.paf(in_file)[source]¶

Paf validator

Parameters:	in_file (str) – paf file to test
Returns:	True if valid, else False
Return type:	bool
