dgenies.bin package

Submodules

dgenies.bin.clean_jobs module

dgenies.bin.clean_jobs.parse_data_folders(app_data, gallery_jobs, now, max_age, fake=False)[source]

Parse the data folder and remove jobs that are too old

Parameters:
  • app_data (str) – folder where jobs are stored
  • gallery_jobs (list) – ids of jobs which are inside the gallery
  • now (float) – current timestamp
  • max_age (dict) – remove all files & folders older than this age, defined for each category (uploads, data, error, …)
  • fake (bool) – if True, just print files to delete, without deleting them

dgenies.bin.clean_jobs.parse_database(app_data, max_age, fake=False)[source]

Parse the database and remove jobs that are too old (from the database and from disk)

Parameters:
  • app_data (str) – folder where jobs are stored
  • max_age (dict) – remove all files & folders older than this age, defined for each category (uploads, data, error, …)
  • fake (bool) – if True, just print files to delete, without deleting them
Returns:

ids of jobs which are in the gallery (never removed, whatever their age)

Return type:

list

dgenies.bin.clean_jobs.parse_upload_folders(upload_folder, now, max_age, fake=False)[source]

Parse upload folders and remove files and folders that are too old

Parameters:
  • upload_folder (str) – upload folder path
  • now (float) – current timestamp
  • max_age (dict) – remove all files & folders older than this age, defined for each category (uploads, data, error, …)
  • fake (bool) – if True, just print files to delete, without deleting them
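
A minimal usage sketch of the three helpers above (a hypothetical clean-up run; the folder layout, max_age keys and age units are assumptions, not taken from this reference):

    import os
    import time

    from dgenies.bin.clean_jobs import (
        parse_data_folders,
        parse_database,
        parse_upload_folders,
    )

    app_data = "/path/to/dgenies/data"               # assumed data folder
    max_age = {"uploads": 1, "data": 7, "error": 1}  # one entry per category (units assumed)
    now = time.time()

    # parse_database returns the ids of gallery jobs, which are never removed.
    gallery_jobs = parse_database(app_data, max_age, fake=True)

    # With fake=True, only list the job folders that would be removed.
    parse_data_folders(app_data, gallery_jobs, now, max_age, fake=True)

    # Clean the upload folder the same way (its path is an assumption).
    parse_upload_folders(os.path.join(app_data, "upload"), now, max_age, fake=True)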

dgenies.bin.filter_contigs module

class dgenies.bin.filter_contigs.Filter(fasta, index_file, type_f, min_filtered=0, split=False, out_fasta=None, replace_fa=False)[source]

Bases: object

Filter a fasta file: remove contigs that are too small

Parameters:
  • fasta (str) – fasta file path
  • index_file (str) – index file path
  • type_f (str) – type of sample (query or target)
  • min_filtered (int) – minimum number of large contigs to allow filtering
  • split (bool) – whether contigs are split
  • out_fasta (str) – output fasta file path
  • replace_fa (bool) – if True, replace fasta file
_check_filter()[source]

Load the index of the fasta file and determine which contigs must be removed. Remove them only from the index

Returns:list of contigs which must be removed
Return type:list
_filter_out(f_outs)[source]

Remove contigs that are too small from the Fasta file

Parameters:f_outs (list) – contigs which must be filtered out
filter()[source]

Run the contig filtering

Returns:True if success, else False
Return type:bool
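
A usage sketch of the Filter class above; the file paths are placeholders and type_f takes one of the values given in the parameter list ("query" or "target"):

    from dgenies.bin.filter_contigs import Filter

    flt = Filter(
        fasta="target.fasta",              # fasta file to filter (placeholder path)
        index_file="target.idx",           # matching index file
        type_f="target",                   # "query" or "target"
        min_filtered=0,
        split=False,
        out_fasta="target.filtered.fasta",
        replace_fa=False,
    )

    if flt.filter():
        print("filter succeeded")
    else:
        print("filter did not succeed")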

dgenies.bin.index module

class dgenies.bin.index.Index[source]

Bases: object

Manage Fasta Index

static load(index_file, merge_splits=False)[source]

Load index

Parameters:
  • index_file (str) – index file path
  • merge_splits (bool) – if True, merge split contigs together
Returns:

  • [0] sample name
  • [1] contigs order
  • [2] contigs size
  • [3] reversed status for each contig
  • [4] absolute start position for each contig
  • [5] total length of the sample

Return type:

(str, list, dict, dict, dict, int)

static save(index_file, name, contigs, order, reversed_c)[source]

Save index

Parameters:
  • index_file (str) – index file path
  • name (str) – sample name
  • contigs (dict) – contigs size
  • order (list) – contigs order
  • reversed_c (dict) – reversed status for each contig
dgenies.bin.index.index_file(fasta_path, fasta_name, out, write_fa=None)[source]

Index fasta file

Parameters:
  • fasta_path (str) – fasta file path
  • fasta_name (str) – sample name
  • out (str) – output index file
  • write_fa (str) – file path of the new fasta file to write, or None to not save the fasta in a new file
Returns:

  • [0] True if success, else False
  • [1] Number of contigs
  • [2] Error message

Return type:

(bool, int, str)
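
A sketch combining index_file, Index.load and Index.save; the paths and sample name are placeholders:

    from dgenies.bin.index import Index, index_file

    # Build an index from a fasta file (write_fa=None: do not rewrite the fasta).
    success, nb_contigs, error = index_file("sample.fasta", "my_sample", "sample.idx", write_fa=None)
    if not success:
        raise RuntimeError(error)

    # Load it back, merging split contigs together.
    name, order, contigs, reversed_c, abs_start, total_len = Index.load("sample.idx", merge_splits=True)

    # Save a (possibly modified) index to a new file.
    Index.save("sample.merged.idx", name, contigs, order, reversed_c)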

dgenies.bin.local_scheduler module

dgenies.bin.local_scheduler._printer(*messages)[source]

Print messages to stdout or to a file (according to the LOG_FILE global constant)

Parameters:messages – messages to print
dgenies.bin.local_scheduler.cleaner()[source]

Exit DRMAA session at program exit

dgenies.bin.local_scheduler.get_prep_scheduled_jobs()[source]

Get list of jobs ready to be prepared (all data is downloaded and parsed)

Returns:list of jobs
Return type:list
dgenies.bin.local_scheduler.get_preparing_jobs_cluster_nb()[source]

Get number of jobs in preparation step (for cluster runs)

Returns:number of jobs
Return type:int
dgenies.bin.local_scheduler.get_preparing_jobs_nb()[source]

Get number of jobs in preparation step (for local runs)

Returns:number of jobs
Return type:int
dgenies.bin.local_scheduler.get_scheduled_cluster_jobs()[source]

Get list of jobs ready to be started (for cluster runs)

Returns:list of jobs
Return type:list
dgenies.bin.local_scheduler.get_scheduled_local_jobs()[source]

Get list of jobs ready to be started (for local runs)

Returns:list of jobs
Return type:list
dgenies.bin.local_scheduler.move_job_to_cluster(id_job)[source]

Change a local job to be run on the cluster

Parameters:id_job (str) – job id
dgenies.bin.local_scheduler.parse_args()[source]

Parse command line arguments and define DEBUG and LOG_FILE constants

dgenies.bin.local_scheduler.parse_started_jobs()[source]

Parse all started jobs: check that everything is OK, change job statuses if needed. Look for dead jobs

Returns:(list of id of jobs started locally, list of id of jobs started on cluster)
Return type:(list, list)
dgenies.bin.local_scheduler.parse_uploads_asks()[source]

Parse upload requests: allow new uploads when others have ended, remove expired sessions, …

dgenies.bin.local_scheduler.prepare_job(id_job)[source]

Launch data preparation for a job

Parameters:id_job (str) – job id
dgenies.bin.local_scheduler.start_job(id_job, batch_system_type='local')[source]

Start a job (mapping step)

Parameters:
  • id_job (str) – job id
  • batch_system_type (str) – local, slurm or sge
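
A simplified illustration of how these helpers fit together; this is not the scheduler's actual main loop, and it assumes the get_* functions return job ids usable by prepare_job and start_job:

    from dgenies.bin.local_scheduler import (
        get_prep_scheduled_jobs,
        get_scheduled_cluster_jobs,
        get_scheduled_local_jobs,
        parse_started_jobs,
        prepare_job,
        start_job,
    )

    # Prepare jobs whose data is downloaded and parsed.
    for id_job in get_prep_scheduled_jobs():
        prepare_job(id_job)

    # Start scheduled jobs, locally or on the cluster.
    for id_job in get_scheduled_local_jobs():
        start_job(id_job, batch_system_type="local")
    for id_job in get_scheduled_cluster_jobs():
        start_job(id_job, batch_system_type="slurm")  # or "sge"

    # Check running jobs and update their statuses.
    local_started, cluster_started = parse_started_jobs()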

dgenies.bin.merge_splitted_chrms module

class dgenies.bin.merge_splitted_chrms.Merger(paf_in, paf_out, query_in, query_out, debug=False)[source]

Bases: object

Merge split contigs back together in the PAF file

Parameters:
  • paf_in (str) – input PAF file path
  • paf_out (str) – output PAF file path
  • query_in (str) – input query index file path
  • query_out (str) – output query index file path
  • debug (bool) – True to enable debug mode
static _get_sorted_splits(contigs_split, all_contigs)[source]

For each split contig, save how many bases must be added to each of its lines in the PAF file. Also save the final merged contig size in the all-contigs dict

Parameters:
  • contigs_split (dict) – split contigs
  • all_contigs (dict) – all and final contigs
Returns:

all contigs, and the new split contigs with the start of each split contig set

Return type:

(dict, dict)

_printer(message)[source]

Print debug messages if debug mode is enabled

Parameters:message (str) – message to print
load_query_index(index)[source]

Load query index

Parameters:index (str) – index file path
Returns:
  • [0] contigs length
  • [1] split contigs length
  • [2] sample name
Return type:(dict, dict, str)
merge()[source]

Launch the merge

static merge_paf(paf_in, paf_out, contigs, contigs_split)[source]

Do the PAF merging work

Parameters:
  • paf_in (str) – path of input PAF with split contigs
  • paf_out (str) – path of output PAF where split contigs are now merged together
  • contigs (dict) – contigs size
  • contigs_split (dict) – split contigs size
static write_query_index(index, contigs, q_name)[source]

Save new query index

Parameters:
  • index (str) – index file path
  • contigs (dict) – contigs size
  • q_name (str) – sample name
dgenies.bin.merge_splitted_chrms.parse_args()[source]

Parse command line arguments

Returns:arguments
Return type:argparse.Namespace
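
A usage sketch of the Merger class above; file paths are placeholders:

    from dgenies.bin.merge_splitted_chrms import Merger

    merger = Merger(
        paf_in="map_split.paf",      # PAF produced from the split query
        paf_out="map.paf",           # PAF with split contigs merged back together
        query_in="query_split.idx",  # index of the split query
        query_out="query.idx",       # index written with merged contigs
        debug=False,
    )
    merger.merge()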

dgenies.bin.sort_paf module

class dgenies.bin.sort_paf.Sorter(input_f, output_f)[source]

Bases: object

Sort PAF file by match size

Parameters:
  • input_f (str) – input PAF file path
  • output_f (str) – output PAF file path
_get_sorted_paf_lines()[source]

Get sorted PAF

Returns:sorted PAF lines
_sort_lines(lines)[source]

Sort the lines

Parameters:lines (_io.TextIO) – lines of PAF file to be sorted
Returns:sorted lines
Return type:list
sort()[source]

Launch the sort
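
A usage sketch of the Sorter class above; file paths are placeholders:

    from dgenies.bin.sort_paf import Sorter

    Sorter(input_f="map.paf", output_f="map_sorted.paf").sort()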

dgenies.bin.split_fa module

class dgenies.bin.split_fa.Splitter(input_f, name_f, output_f, size_c=10000000, query_index='query_split.idx', debug=False)[source]

Bases: object

Split large contigs into smaller ones

Parameters:
  • input_f (str) – input fasta file path
  • name_f (str) – sample name
  • output_f (str) – output fasta file path
  • size_c (int) – size of split contigs
  • query_index (str) – index file path for query
  • debug (bool) – True to enable debug mode
flush_contig(fasta_str, chr_name, size_c, enc, index_f)[source]
split()[source]

Split contigs into smaller ones

Returns:True if the input Fasta is correct, else False
static split_contig(name, sequence, block_sizes)[source]
static write_contig(name, fasta, o_file)[source]
dgenies.bin.split_fa.parse_args()[source]

Module contents