dgenies.lib package

Submodules

dgenies.lib.crons module

class dgenies.lib.crons.Crons(base_dir, debug)[source]

Bases: object

Manage crontab jobs (webserver mode)

Parameters
  • base_dir (str) – software base directory path

  • debug (bool) – True to enable debug mode

clear(kill_scheduler=True, remove_pid_file=True)[source]

Clear all crons

Parameters
  • kill_scheduler (bool) – if True, kill local scheduler currently running

  • remove_pid_file (bool) – if True, remove pid file if local scheduler was killed successfully

init_clean_cron()[source]

Initialize the clean cron, which clears old jobs. The clean cron is launched at 1:00 a.m. each day

init_launch_local_cron()[source]

Try to launch local scheduler (if not already launched)

start_all()[source]

Start all crons

dgenies.lib.decorators module

class dgenies.lib.decorators.Singleton(klass)[source]

Bases: object

Define a singleton (design pattern)
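As an illustration of the pattern, a class-based singleton decorator that takes the decorated class as its only argument can be sketched as follows. This is a minimal sketch, not necessarily the code in dgenies.lib.decorators; the AppSettings class is hypothetical.

```python
class Singleton:
    """Class decorator: every call returns the same instance."""

    def __init__(self, klass):
        self.klass = klass      # the decorated class
        self.instance = None    # created lazily on the first call

    def __call__(self, *args, **kwargs):
        # Build the instance once, then always hand back the same object.
        if self.instance is None:
            self.instance = self.klass(*args, **kwargs)
        return self.instance


@Singleton
class AppSettings:  # hypothetical class, for illustration only
    def __init__(self):
        self.values = {}


first = AppSettings()
second = AppSettings()
```

Here first and second refer to the same object, so state stored through one name is visible through the other.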

dgenies.lib.drmaasession module

dgenies.lib.fasta module

class dgenies.lib.fasta.Fasta(name, path, type_f, example=False)[source]

Bases: object

Defines a fasta file: name of the sample, path to the fasta file, type of file (URL or local file), …

Parameters
  • name (str) – sample name

  • path (str) – fasta file path

  • type_f (str) – type of file (local file or URL)

  • example (bool) – is an example job

get_name()[source]

Get sample name

Returns

sample name

Return type

str

get_path()[source]

Get path of the fasta file

Returns

fasta path

Return type

str

get_type()[source]

Get type: URL or local file

Returns

type

Return type

str

is_example()[source]

Check whether the current sample is example data

Returns

True if the current sample is example data, else False

Return type

bool

set_name(name)[source]

Set sample name

Parameters

name (str) – new sample name

set_path(path)[source]

Set path to the fasta file

Parameters

path (str) – new path

dgenies.lib.functions module

class dgenies.lib.functions.Functions[source]

Bases: object

General functions

static allowed_file(filename, file_formats=('fasta',))[source]

Check whether a file has a valid format

Parameters
  • filename – file path

  • file_formats – accepted file formats

Returns

True if valid format, else False
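A check of this kind is usually extension-based. The sketch below assumes a hypothetical mapping from format names to accepted extensions and assumes that a trailing .gz is ignored; the real list of formats and extensions comes from the application configuration and is not reproduced here.

```python
import os

# Hypothetical mapping, for illustration only.
ALLOWED_EXTENSIONS = {
    "fasta": {"fa", "fasta", "fna"},
    "idx": {"idx"},
}


def allowed_file(filename, file_formats=("fasta",)):
    """Return True if the file extension matches one of the given formats."""
    name = os.path.basename(filename).lower()
    if name.endswith(".gz"):
        name = name[:-3]          # assumption: gzipped files are accepted
    if "." not in name:
        return False
    ext = name.rsplit(".", 1)[1]
    return any(ext in ALLOWED_EXTENSIONS.get(fmt, set()) for fmt in file_formats)
```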

static compress(filename)[source]

Compress a file with gzip

Parameters

filename (str) – file to compress

Returns

path of the compressed file

Return type

str
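For reference, gzip compression of a file can be sketched with the standard library alone. The ".gz" suffix and the choice to keep the original file are assumptions of this sketch; the documentation above does not state either.

```python
import gzip
import os
import shutil


def compress(filename):
    """Gzip a file and return the path of the compressed copy."""
    compressed = filename + ".gz"   # assumed naming convention
    with open(filename, "rb") as src, gzip.open(compressed, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream, so large fasta files fit in memory
    return compressed
```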

static compress_and_send_mail(job_name, fasta_file, index_file, lock_file, mailer)[source]

Compress the fasta file, then send a mail with its link to the client

Parameters
  • job_name (str) – job id

  • fasta_file (str) – fasta file path

  • index_file (str) – index file path

  • lock_file (str) – lock file path

  • mailer (Mailer) – mailer object (to send mail)

config = <dgenies.config_reader.AppConfigReader object>

static get_fasta_file(res_dir, type_f, is_sorted)[source]

Get fasta file path

Parameters
  • res_dir (str) – job results directory

  • type_f (str) – type of file (query or target)

  • is_sorted (bool) – is fasta sorted

Returns

fasta file path

Return type

str

Get list of items from the gallery

Returns

list of gallery items. Each item is a dict with 7 keys:

  • name : name of the job

  • id_job : id of the job

  • picture : illustrating picture filename (located in gallery folder of the data folder)

  • query : query species name

  • target : target species name

  • mem_peak : max memory used for the run (human readable)

  • time_elapsed : time elapsed for the run (human readable)

Return type

list of dict

static get_list_all_jobs(mode='webserver')[source]

Get list of all jobs

Parameters

mode (str) – webserver or standalone

Returns

list of all jobs in standalone mode. Empty list in webserver mode

Return type

list

static get_mail_for_job(id_job)[source]

Retrieve associated mail for a job

Parameters

id_job (int) – job id

Returns

associated mail address

Return type

str

static get_readable_size(size, nb_after_coma=1, base='B')[source]

Get human readable size from a given size in bytes

Parameters
  • size (int) – size in bytes

  • nb_after_coma (str) – number of digits after the decimal point

  • base – base unit of size, must be either “B”, “KiB”, “MiB” or “GiB”

Returns

size, human readable

Return type

str
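The conversion can be sketched as repeated division by 1024 starting from the given base unit. The output format ("1.5 KiB") and the extra "TiB" step are assumptions of this sketch, not documented behaviour.

```python
UNITS = ["B", "KiB", "MiB", "GiB", "TiB"]


def get_readable_size(size, nb_after_coma=1, base="B"):
    """Format a size expressed in `base` units as a human readable string."""
    if base not in UNITS[:4]:
        raise ValueError('base must be "B", "KiB", "MiB" or "GiB"')
    index = UNITS.index(base)
    value = float(size)
    # Move to the next unit while the value is still 1024 or more.
    while value >= 1024 and index < len(UNITS) - 1:
        value /= 1024
        index += 1
    return f"{value:.{nb_after_coma}f} {UNITS[index]}"
```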

static get_readable_time(seconds)[source]

Get human readable time

Parameters

seconds (int) – time in seconds

Returns

time, human readable

Return type

str
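A duration in seconds can be broken down with divmod, as in the sketch below. The exact wording of the output ("1 h 2 min 5 s") is an assumption; the string produced by the real function may differ.

```python
def get_readable_time(seconds):
    """Format a duration in seconds as hours, minutes and seconds."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    parts = []
    if hours:
        parts.append(f"{hours} h")
    if minutes:
        parts.append(f"{minutes} min")
    if secs or not parts:   # always show something, even for 0 seconds
        parts.append(f"{secs} s")
    return " ".join(parts)
```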

static get_valid_uploaded_filename(filename, folder)[source]

Check whether uploaded file already exists. If yes, rename it

Parameters
  • filename (str) – uploaded file

  • folder (str) – folder in which to save the file

Returns

unique filename

Return type

str
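One common way to obtain a unique name is to append a counter until no file with that name exists in the folder. The "_1", "_2" suffix scheme below is an assumption of this sketch, not the documented renaming rule.

```python
import os


def get_valid_uploaded_filename(filename, folder):
    """Return a file name that does not already exist in `folder`."""
    stem, ext = os.path.splitext(filename)
    candidate = filename
    counter = 1
    while os.path.exists(os.path.join(folder, candidate)):
        candidate = f"{stem}_{counter}{ext}"  # assumed suffix scheme
        counter += 1
    return candidate
```

Note that os.path.splitext only splits off the last extension, so "a.fa.gz" would become "a.fa_1.gz" in this sketch.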

Check whether a job is in the gallery

Parameters
  • id_job (str) – job id

  • mode (str) – webserver or standalone

Returns

True if job is in the gallery, else False

Return type

bool

static query_fasta_file_exists(res_dir)[source]

Check if a fasta file exists

Parameters

res_dir (str) – job result directory

Returns

True if file exists and is a regular file, else False

Return type

bool

static random_string(s_len)[source]

Generate a random string

Parameters

s_len (int) – length of the string to generate

Returns

the random string

Return type

str
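A fixed-length random string can be generated as sketched below. The alphabet (ASCII letters and digits) is an assumption; the random module is not suitable for security tokens, for which the standard secrets module is the usual choice.

```python
import random
import string


def random_string(s_len):
    """Return a random string of `s_len` letters and digits."""
    alphabet = string.ascii_letters + string.digits  # assumed alphabet
    return "".join(random.choice(alphabet) for _ in range(s_len))
```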

static read_index(index_file)[source]

Load index of query or target

Parameters

index_file (str) – index file path

Returns

  • [0] index (size of each chromosome) {dict}

  • [1] sample name {str}

Return type

(dict, str)
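The returned tuple can be illustrated with a small reader. This sketch assumes an index layout in which the first line holds the sample name and every following line holds a contig name and its length separated by a tab; that layout is an assumption and should be checked against real index files.

```python
def read_index(index_file):
    """Return ({contig name: length}, sample name) from an index file."""
    index = {}
    with open(index_file, "r") as handle:
        sample_name = handle.readline().strip("\n")  # assumed: name on line 1
        for line in handle:
            line = line.strip("\n")
            if not line:
                continue
            name, length = line.split("\t")[:2]
            index[name] = int(length)
    return index, sample_name
```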

static send_fasta_ready(mailer, job_name, sample_name, compressed=False, path='fasta-query', status='success', ext='fasta')[source]

Send the link to the fasta file once processing has finished

Parameters
  • mailer (Mailer) – mailer object

  • job_name (str) – job id

  • sample_name (str) – sample name

  • compressed (bool) – is a compressed fasta file

  • path (str) – fasta path

  • status (str) – treatment status

  • ext (str) – file extension

static sort_fasta(job_name, fasta_file, index_file, lock_file, compress=False, mailer=None, mode='webserver')[source]

Sort fasta file according to the sorted index file

Parameters
  • job_name (str) – job id

  • fasta_file (str) – fasta file path

  • index_file (str) – index file path

  • lock_file (str) – lock file path

  • compress (bool) – compress result fasta file

  • mailer (Mailer) – mailer object (to send mail)

  • mode (str) – webserver or standalone

static uncompress(filename)[source]

Uncompress a gzipped file

Parameters

filename (str) – gzipped file

Returns

path of the uncompressed file

Return type

str
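The reverse operation can be sketched in the same way. This sketch assumes the input name ends with ".gz", derives the output name by dropping that suffix, and leaves the archive in place; none of these details are stated in the documentation above.

```python
import gzip
import shutil


def uncompress(filename):
    """Gunzip a file and return the path of the uncompressed copy."""
    if not filename.endswith(".gz"):
        raise ValueError("expected a file name ending with .gz")
    target = filename[:-3]  # assumed naming convention
    with gzip.open(filename, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```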

dgenies.lib.job_manager module

class dgenies.lib.job_manager.JobManager(id_job, email=None, query: Optional[dgenies.lib.fasta.Fasta] = None, target: Optional[dgenies.lib.fasta.Fasta] = None, mailer=None, tool='minimap2', align: Optional[dgenies.lib.fasta.Fasta] = None, backup: Optional[dgenies.lib.fasta.Fasta] = None, options=None)[source]

Bases: object

Jobs management

Parameters
  • id_job (str) – job id

  • email (str) – email from user

  • query (Fasta) – query fasta

  • target (Fasta) – target fasta

  • mailer (Mailer) – mailer object (to send mail through the Flask app)

  • tool (str) – tool to use for mapping (choice from tools config)

  • align (Fasta) – alignment file (PAF, MAF, …) as a fasta object

  • backup (Fasta) – backup TAR file

  • options (list) – list of str containing options for the chosen tool

check_file(input_type, should_be_local, max_upload_size_readable)[source]

Check if file is correct: format, size, valid gzip

Parameters
  • input_type – query or target

  • should_be_local – True if job should be treated locally

  • max_upload_size_readable – max upload size human readable

Returns

(True if correct, True if error set [for fail], True if should be local)

check_job_status_sge()[source]

Check status of a SGE job run

Returns

True if the job has successfully ended, else False

check_job_status_slurm()[source]

Check status of a SLURM job run

Returns

True if the job has successfully ended, else False

check_job_success()[source]

Check whether a job succeeded

Returns

status of a job: succeed, no-match or fail

Return type

str

clear()[source]

Remove job dir

delete()[source]

Remove a job

Returns

  • [0] Success of the deletion

  • [1] Error message, if any (else empty string)

Return type

(bool, str)

do_align()[source]

Check if we have to make alignment

Returns

True if the job is launched with an alignment file

download_files_with_pending(files_to_download, should_be_local, max_upload_size_readable)[source]

Download files from URLs, with pending (according to the max number of concurrent downloads)

Parameters
  • files_to_download (list of list) – files to download. For each item of the list, it’s a list with 2 elements: first one is the Fasta object, second one the input type (query or target)

  • should_be_local (bool) – True if the job should be run locally (according to input file sizes), else False

  • max_upload_size_readable (str) – Human readable max upload size (to show on errors)

static find_error_in_log(log_file)[source]

Find error in log (for cluster run)

Parameters

log_file – log file of the job

Returns

error (empty if no error)

Return type

str

get_file_size(filepath: str)[source]

Get file size

Parameters

filepath (str) – file path

Returns

file size (bytes)

Return type

int

get_mail_content(status, target_name, query_name=None)[source]

Build mail content for status mail

Parameters
  • status (str) – job status

  • target_name (str) – name of target

  • query_name (str) – name of query

Returns

mail content

Return type

str

get_mail_content_html(status, target_name, query_name=None)[source]

Build mail content as HTML

Parameters
  • status (str) – job status

  • target_name (str) – name of target

  • query_name (str) – name of query

Returns

mail content (html)

Return type

str

get_mail_subject(status)[source]

Build mail subject

Parameters

status (str) – job status

Returns

mail subject

Return type

str

static get_pending_local_number()[source]

Get the number of jobs running or waiting for a run

Returns

number of jobs

Return type

int

get_query_split()[source]

Get query split fasta file

Returns

split query fasta file

Return type

str

get_status_standalone(with_error=False)[source]

Get job status in standalone mode

Parameters

with_error – if True, also return the error

Returns

status (and error, if with_error=True)

Return type

str or tuple (if with_error=True)

getting_files()[source]

Get files for the job

Returns

  • [0] True if getting files succeeded, else False

  • [1] If an error happened, True if the error is already saved for the job, else False (the error will be saved later)

  • [2] True if no data must be downloaded (will be downloaded with pending if True)

Return type

tuple

static is_gz_file(filepath)[source]

Check if a file is gzipped

Parameters

filepath (str) – file to check

Returns

True if gzipped, else False
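Gzip files start with the two magic bytes 0x1f 0x8b, so a check can be sketched by reading the first two bytes of the file. This is one possible approach and not necessarily the one used by JobManager.

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gz_file(filepath):
    """Return True if the file starts with the gzip magic number."""
    with open(filepath, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC
```

A file that merely has a .gz extension but plain text content is reported as not gzipped by this check.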

is_query_filtered()[source]

Check if query has been filtered

Returns

True if filtered, else False

is_target_filtered()[source]

Check if target has been filtered

Returns

True if filtered, else False

launch()[source]

Launch a job in webserver mode (asynchronously in a new thread)

launch_standalone()[source]

Launch a job in standalone mode (asynchronously in a new thread)

launch_to_cluster(step, batch_system_type, command, args, log_out, log_err)[source]

Launch a program to the cluster

Parameters
  • step (str) – step (prepare, start)

  • batch_system_type (str) – slurm or sge

  • command (str) – program to launch (without arguments)

  • args (list) – arguments to use for the program

  • log_out (str) – log file for stdout

  • log_err (str) – log file for stderr

Returns

True if succeed, else False

Return type

bool

prepare_data()[source]

Launch preparation of data

prepare_data_cluster(batch_system_type)[source]

Launch data preparation on a cluster

Parameters

batch_system_type (str) – slurm or sge

Returns

True if succeed, else False

Return type

bool

prepare_data_in_thread()[source]

Prepare data in a new thread

prepare_data_local()[source]

Prepare data locally. In standalone mode, launch the job afterwards if preparation succeeds.

Returns

True if the job succeeded, else False

Return type

bool

prepare_dotplot_cluster(batch_system_type)[source]

Prepare data if alignment already done: just index the fasta (if index not given), then parse the alignment

Parameters

batch_system_type (str) – type of cluster (slurm or sge)

prepare_dotplot_local()[source]

Prepare data if alignment already done: just index the fasta (if index not given), then parse the alignment file and sort it.

run_job(batch_system_type)[source]

Run a job (mapping step)

Parameters

batch_system_type (str) – type of cluster (slurm or sge)

run_job_in_thread(batch_system_type='local')[source]

Run a job asynchronously in a new thread

Parameters

batch_system_type (str) – slurm or sge

search_error()[source]

Search for an error in the log file (for local runs). If no error found, returns a generic error message

Returns

error message to give to the user

Return type

str

send_mail()[source]

Send mail

send_mail_post()[source]

Send mail using POST url (if there is no access to mailer)

set_inputs_from_res_dir()[source]

Sets inputs (query, target, …) from job dir

set_job_status(status, error='')[source]

Change status of a job

Parameters
  • status (str) – new job status

  • error (str) – error description (if any)

set_status_standalone(status, error='')[source]

Change job status in standalone mode

Parameters
  • status (str) – new status

  • error (str) – error description (if any)

start_job()[source]

Start job: download, check and parse input files

status()[source]

Get job status and error. In webserver mode, also get peak memory and elapsed time

Returns

status and other information

Return type

dict

unpack_backup()[source]

Untar backup file

update_job_status(status, id_process=None)[source]

Update job status

Parameters
  • status – new status

  • id_process – system process id

dgenies.lib.latest module

class dgenies.lib.latest.Latest[source]

Bases: object

Search latest version

load()[source]

Load latest version: use cached version (if any) and then sync with GitHub

update()[source]

Get latest version from GitHub

update_async()[source]

Update latest version asynchronously

dgenies.lib.mailer module

class dgenies.lib.mailer.Mailer(app)[source]

Bases: object

Send mail through the Flask app

Parameters

app (Flask) – Flask app object

send_mail(recipients, subject, message, message_html=None)[source]

Send mail

Parameters
  • recipients (list) – list of recipients

  • subject (str) – mail subject

  • message (str) – message (text)

  • message_html (str) – message (html)

dgenies.lib.paf module

class dgenies.lib.paf.Paf(paf: str, idx_q: str, idx_t: str, auto_parse: bool = True, mailer=None, id_job=None)[source]

Bases: object

Functions applied to PAF files

Parameters
  • paf (str) – PAF file path

  • idx_q (str) – query index file path

  • idx_t (str) – target index file path

  • auto_parse (bool) – if True, parse PAF file at initialisation

  • mailer (Mailer) – mailer object, to send mails

  • id_job (str) – job id

build_list_no_assoc(to)[source]

Build the list of queries that match no target, or the opposite (targets that match no query)

Parameters

to – query or target

Returns

content of the file

build_query_chr_as_reference()[source]

Assemble query contigs like reference chromosomes

Returns

path of the fasta file

build_query_on_target_association_file()[source]

For each query, get the best matching chromosome and save it to a CSV file. Use the order of queries

Returns

content of the file

build_summary_stats(status_file)[source]

Get summary of identity

Returns

table with percents by category

compute_gravity_contigs()[source]

Compute gravity for each contig on each chromosome (how many big matches they have). Will be used to find which chromosome has the highest value for each contig

Returns

  • [0] gravity for each contig and each chromosome:

    {contig1: {chr1: value, chr2: value, …}, contig2: …}

  • [1] For each block save lines inside:

    [median_on_query, squared length, median_on_target, x1, x2, y1, y2, length] (x : on target, y: on query)

config = <dgenies.config_reader.AppConfigReader object>

get_d3js_data()[source]

Build data for D3.js client

Returns

data for d3.js:

  • y_len: length of query (bp)

  • x_len: length of target (bp)

  • min_idy: minimum of identity (float)

  • max_idy: maximum of identity (float)

  • lines: matches lines, by class of identity (dict)

  • y_contigs: query contigs definitions (dict)

  • y_order: query contigs order (list)

  • x_contigs: target contigs definitions (dict)

  • x_order: target contigs order (list)

  • name_y: name of the query (str)

  • name_x: name of the target (str)

  • limit_idy: limit for each class of identities (list)

Return type

dict
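To illustrate how a list of limits such as limit_idy = [0.25, 0.5, 0.75] can split matches into classes of identity, the sketch below maps an identity value to a class number with bisect. The class numbering (0 to 3), the handling of values equal to a limit, and the (identity, label) tuples are assumptions of this sketch.

```python
from bisect import bisect_right

LIMIT_IDY = [0.25, 0.5, 0.75]


def identity_class(identity, limits=LIMIT_IDY):
    """Return the index of the identity class (0 = lowest identities)."""
    # Assumption: a value equal to a limit falls into the upper class.
    return bisect_right(limits, identity)


# Group hypothetical matches (identity, label) by class, as a dict of lists.
matches = [(0.10, "m1"), (0.25, "m2"), (0.60, "m3"), (0.90, "m4")]
lines = {cls: [] for cls in range(len(LIMIT_IDY) + 1)}
for identity, label in matches:
    lines[identity_class(identity)].append(label)
```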

get_queries_on_target_association()[source]

For each target, get the list of queries associated to it

Returns

list of queries associated to each target

Return type

dict

get_query_on_target_association(with_coords=True)[source]

For each query, get the best matching chromosome

Returns

query on target association

Return type

dict

get_summary_stats()[source]

Load summary statistics from file

Returns

summary object or None if summary not already built

Return type

dict

is_contig_well_oriented(lines, contig, chrom)[source]

Returns True if the contig is well oriented. In a well oriented contig, y increases as x increases. This is checked only for the largest matches (small matches are ignored)

Parameters
  • lines (list) – lines inside the contig

  • contig (str) – query contig name

  • chrom (str) – target chromosome name

Returns

True if well oriented, else False

Return type

bool

keyerror_message(exception, type_f)[source]

Build message if contig not found in query or target

Parameters
  • exception (KeyError) – exception object

  • type_f (str) – type of data (query or target)

Returns

error message

Return type

str

limit_idy = [0.25, 0.5, 0.75]

max_nb_lines = 100000

parse_index(index_o: list, index_c: dict, full_len: int)[source]

Parse index and merge too small contigs together

Parameters
  • index_o (list) – index contigs order

  • index_c (dict) – index contigs def

  • full_len (int) – length of the sequence

Returns

(new contigs def, new contigs order)

Return type

(dict, list)

parse_paf(merge_index=True, noise=True)[source]

Parse PAF file

Parameters
  • merge_index (bool) – if True, merge too small contigs in index

  • noise (bool) – if True, remove noise

static remove_noise(lines, noise_limit)[source]

Remove noise from the dot plot

Parameters
  • lines (dict) – lines of the dot plot, by class

  • noise_limit (float) – line length limit

Returns

kept lines, by class

Return type

dict

reorient_contigs_in_paf(contigs)[source]

Reorient contigs in the PAF file

Parameters

contigs – contigs to be reoriented

reverse_contig(contig_name)[source]

Reverse contig

Parameters

contig_name (str) – contig name

save_json(out)[source]

Save D3.js data to json

Parameters

out (str) – output file path

set_sorted(is_sorted)[source]

Change sorted status

Parameters

is_sorted (bool) – new sorted status

sort()[source]

Sort contigs according to reference target and reorient them if needed

dgenies.lib.parsers module

Define tools parsers here

Each parser (main function) must have 2 and only 2 arguments:

  • First argument: input file, which is the tool's raw output

  • Second argument: final PAF file

Returns True if parsing succeeds, else False
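A function respecting this contract could look like the sketch below. The parser name mytool and its input format (PAF-like lines mixed with "#" comment lines) are hypothetical and only serve to show the two-argument signature and the boolean return value.

```python
def mytool(in_file, out_paf):
    """Hypothetical parser: copy alignment lines, dropping '#' comments."""
    try:
        with open(in_file, "r") as src, open(out_paf, "w") as dst:
            for line in src:
                if line.startswith("#") or not line.strip():
                    continue  # skip comments and blank lines
                dst.write(line)
    except OSError:
        return False  # unreadable input or unwritable output
    return True
```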

dgenies.lib.parsers.maf(in_maf, out_paf)[source]

Maf parser

Parameters
  • in_maf (str) – input maf file path

  • out_paf (str) – output paf file path

Returns

True if success, else False

dgenies.lib.parsers.mashmap2paf(in_paf, out_paf)[source]

dgenies.lib.upload_file module

class dgenies.lib.upload_file.UploadFile(name, type_f=None, size=None, not_allowed_msg='')[source]

Bases: object

Manage uploaded files

Parameters
  • name (str) – File name

  • type_f (str) – file MIME type

  • size (int) – file size in bytes

  • not_allowed_msg (str) – error to add for not allowed file

get_file()[source]

Get file object

Returns

file object

Return type

dict

dgenies.lib.validators module

Define formats validators here (for alignment files)

Each validator (main function) has a name which is exactly the name of the format in the aln-formats.yaml file. The function takes only 1 argument:

  • Input file to check

Secondary functions must start with _

Validators for non-mapping files must start with “v_”

Returns True if file is valid, else False

dgenies.lib.validators.maf(in_file)[source]

Maf validator

Parameters

in_file (str) – maf file to test

Returns

True if valid, else False

Return type

bool

dgenies.lib.validators.paf(in_file, n_max=None)[source]

Paf validator

Parameters
  • in_file (str) – paf file to test

  • n_max (int) – number of lines to test (default: None for all)

Returns

True if valid, else False

Return type

bool
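As an illustration of what such a check may involve, the sketch below tests the 12 mandatory PAF columns: a strand of "+" or "-" in column 5 and non-negative integers in the length, coordinate, match count and mapping quality columns. It is a simplified stand-in under those assumptions, not the validator shipped in dgenies.lib.validators.

```python
def paf(in_file, n_max=None):
    """Return True if the first n_max lines (all if None) look like PAF."""
    int_columns = (1, 2, 3, 6, 7, 8, 9, 10, 11)  # 0-based integer columns
    try:
        with open(in_file, "r") as handle:
            for n, line in enumerate(handle):
                if n_max is not None and n >= n_max:
                    break
                fields = line.rstrip("\n").split("\t")
                if len(fields) < 12:
                    return False
                if fields[4] not in ("+", "-"):
                    return False
                if not all(fields[col].isdigit() for col in int_columns):
                    return False
    except OSError:
        return False
    return True
```

In this sketch an empty file is reported as valid, since no line fails a check; a stricter validator might reject it.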

dgenies.lib.validators.v_idx(in_file)[source]

Index file validator

Parameters

in_file (str) – index file to test

Returns

True if valid, else False

Return type

bool

Module contents