Data structures

Last updated: 2020-10-10

The table below lists the previous versions of the repository in which changes were made to the R Markdown (analysis/data_struct.Rmd) and HTML (docs/data_struct.html) files.

File	Version	Author	Date	Message
Rmd	8520292	Dave Tang	2020-10-10	Start page on data structures

Introduction

Learn about data structures.

Lists

Arrays

Stacks

The structure of a stack can be imagined as a pile of objects stacked vertically. Objects are always added to, and removed from, the top of the pile. Adding data to a stack places it on top of the data already there; this is called a “push”. Extracting from a stack removes the most recently added data first; this is called a “pop”. This method of extracting the most recently added data first is called “Last In First Out” (LIFO).
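The push and pop operations can be sketched with a plain Python list, treating the end of the list as the top of the pile. This is only a minimal illustration; the variable names are made up for the example.

```python
# A Python list used as a stack: the end of the list is the top of the pile.
stack = []

# Push: each new item goes on top of the items already there.
stack.append("A")
stack.append("B")
stack.append("C")

# Pop: the most recently added item comes off first (LIFO).
first_out = stack.pop()
second_out = stack.pop()

print(first_out, second_out)  # C B
print(stack)                  # ['A']
```

"C" was pushed last and is popped first, while "A", the first item pushed, is still at the bottom of the stack.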

Queues

Hash tables

Heaps

Heaps are one type of tree data structure and are commonly used to implement a priority queue, which is another type of data structure. In a priority queue, data can be added in any order. When extracting data, the value with the highest priority is chosen first; in the description here, that is the smallest value. This property of being able to freely add data and then extract the smallest values first defines a priority queue.
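As a rough illustration of this behaviour, Python's standard `heapq` module can act as a priority queue on top of an ordinary list. The values and variable names below are made up for the example.

```python
import heapq

# heapq keeps a plain list arranged as a min-heap.
queue = []

# Data can be added in any order.
for value in [5, 1, 8, 3, 2]:
    heapq.heappush(queue, value)

# Extraction always returns the smallest remaining value.
extracted = [heapq.heappop(queue) for _ in range(len(queue))]
print(extracted)  # [1, 2, 3, 5, 8]
```

The values come out in ascending order regardless of the order in which they were added.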

As a rule of heaps, a child number is never smaller than its parent number, so the smallest value is held at the top of the tree. A new number is added at the end of the bottom level of the tree. If it is smaller than its parent, the child and parent swap. This operation repeats until no additional swaps occur.

When extracting a number from a heap, the number at the top of the tree is removed and the heap’s structure needs to be reorganised. The last number in the bottom level moves to the top, and if either of its child numbers is smaller than it, it swaps with the smaller of the two children. This repeats until no additional swaps occur.
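The swapping rules above can be sketched as follows. This sketch assumes the common array layout for a binary heap, which the text does not specify: the children of the node at index `i` are stored at `2 * i + 1` and `2 * i + 2`. The function names `heap_push` and `heap_pop` are hypothetical.

```python
def heap_push(heap, value):
    """Add value at the end, then swap upwards while it is smaller than its parent."""
    heap.append(value)
    child = len(heap) - 1
    while child > 0:
        parent = (child - 1) // 2
        if heap[child] >= heap[parent]:
            break  # heap rule holds, no more swaps needed
        heap[child], heap[parent] = heap[parent], heap[child]
        child = parent


def heap_pop(heap):
    """Remove and return the smallest value (the top of the tree)."""
    top = heap[0]
    last = heap.pop()  # the last number in the bottom level
    if heap:
        heap[0] = last  # move it to the top, then swap downwards
        parent = 0
        while True:
            left, right = 2 * parent + 1, 2 * parent + 2
            smallest = parent
            if left < len(heap) and heap[left] < heap[smallest]:
                smallest = left
            if right < len(heap) and heap[right] < heap[smallest]:
                smallest = right
            if smallest == parent:
                break  # neither child is smaller, so we are done
            heap[parent], heap[smallest] = heap[smallest], heap[parent]
            parent = smallest
    return top


heap = []
for number in [6, 3, 9, 1, 4]:
    heap_push(heap, number)

print(heap[0])  # 1, the smallest value sits at the top

drained = [heap_pop(heap) for _ in range(len(heap))]
print(drained)  # [1, 3, 4, 6, 9]
```

Calling `heap_pop` on an empty list raises an `IndexError`; a fuller implementation would handle that case explicitly.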

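Heaps can be used to quickly extract the smallest value, but they are not suited to extracting a value from the middle of the tree: a heap only orders parents relative to their children, so finding an arbitrary value means checking the nodes one by one.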

Binary search trees

Binary search trees have two properties:

All nodes are greater than the nodes in their left sub-tree
All nodes are smaller than the nodes in their right sub-tree

Due to these two properties the following are true:

A binary search tree’s smallest node is found by starting at the top-most node and following the left branches until there are no more
A binary search tree’s largest node is found by starting at the top-most node and following the right branches until there are no more

To add a number to a binary search tree, we start at the top-most node and compare the number with it. If the number to be added is smaller than the current node, it proceeds to the left; if it is larger, it proceeds to the right. This comparison is repeated at each node reached while traversing down the tree, and once there is no node in the direction the number needs to go, it is added there as a new node.
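The insertion procedure, together with the rules for locating the smallest and largest nodes, can be sketched as below. The `Node` class and the function names are hypothetical, and the sketch assumes duplicate numbers are simply ignored, which the text does not address.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None


def insert(root, value):
    """Insert value and return the top-most node of the tree."""
    if root is None:
        return Node(value)
    current = root
    while True:
        if value < current.value:
            if current.left is None:
                current.left = Node(value)  # free spot on the left
                break
            current = current.left
        elif value > current.value:
            if current.right is None:
                current.right = Node(value)  # free spot on the right
                break
            current = current.right
        else:
            break  # assumption: duplicates are ignored
    return root


def minimum(root):
    """Follow left branches from the top-most node until there are none."""
    node = root
    while node.left is not None:
        node = node.left
    return node.value


def maximum(root):
    """Follow right branches from the top-most node until there are none."""
    node = root
    while node.right is not None:
        node = node.right
    return node.value


root = None
for number in [15, 9, 23, 3, 12, 17, 28]:
    root = insert(root, number)

print(minimum(root))  # 3
print(maximum(root))  # 28
```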

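Binary search trees are used to efficiently search for numbers, because each comparison rules out one whole sub-tree. This efficiency depends on the shape of the tree: if numbers are added in sorted order, every node has a single child and a search may have to visit every node. Self-balancing binary search trees rearrange themselves as numbers are added or removed so that the tree stays shallow and search remains efficient.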
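To illustrate why the shape of the tree matters, the sketch below searches two trees holding the same numbers and counts the nodes visited. It does not implement self-balancing; the well-balanced tree is obtained simply by choosing the insertion order by hand. All names are hypothetical.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None


def insert(node, value):
    """Insert value into the sub-tree rooted at node and return that root."""
    if node is None:
        return Node(value)
    if value < node.value:
        node.left = insert(node.left, value)
    elif value > node.value:
        node.right = insert(node.right, value)
    return node


def build(values):
    """Build a tree by inserting the values in the given order."""
    root = None
    for value in values:
        root = insert(root, value)
    return root


def search(root, target):
    """Return (found, number of nodes visited)."""
    visited = 0
    node = root
    while node is not None:
        visited += 1
        if target == node.value:
            return True, visited
        node = node.left if target < node.value else node.right
    return False, visited


balanced = build([4, 2, 6, 1, 3, 5, 7])  # insertion order gives a balanced tree
chain = build([1, 2, 3, 4, 5, 6, 7])     # sorted order gives a one-sided chain

print(search(balanced, 7))  # (True, 3)
print(search(chain, 7))     # (True, 7)
print(search(balanced, 8))  # (False, 3)
```

Both trees contain the same seven numbers, but finding 7 takes three comparisons in the balanced tree and seven in the chain.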
