Content
Part1: Some useful R commands
1.1 Command Warm Up
1.2 Data objects: Vectors
1.3 Data objects: Matrices
1.4 Data objects: Data frame
1.5 Data sorting
1.6 Matrix calculations
1.7 Reading and writing data
1.8 Re-directing output and file management
Part1: Some useful R commands
1.1 Command Warm Up
- To open the help pages in a web browser, type:
help.start()
- To obtain help on a particular topic (e.g. ar, which fits an autoregressive time series model to the data), type:
?ar # or help(ar)
- The assignment operator is <- (or, less commonly, ->). To type <- in RStudio, use Alt + - on Windows or Option + - on macOS:
x <- 10
x
5 -> x
x
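- The difference between = and <-: inside a function call, = only passes a value to the named argument, whereas <- also creates the variable in the calling environment; see below:
rm(x) # remove the x assigned above so that the difference is visible
median(x = 1:5)
x ## Error: object 'x' not found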
median(x <- 1:5)
x
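Extended: <<- is another assignment operator. Instead of creating a local variable, it modifies a variable of the same name in an enclosing (parent) environment, which makes it useful for keeping state in closures and in R's object-oriented programming (OOP) systems.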
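The behaviour of <<- described above can be sketched with a small closure. The name make_counter is hypothetical and chosen only for this illustration; it is not part of base R.

```r
# Hypothetical helper: returns a function that remembers how often it was called.
make_counter <- function() {
  count <- 0              # lives in the environment created by make_counter()
  function() {
    count <<- count + 1   # updates count in the enclosing environment
    count
  }
}

counter <- make_counter()
counter()                 # first call
counter()                 # second call
third <- counter()        # third call; the state survived between calls
third
```

With <- instead of <<-, each call would create a fresh local count and the function would always return 1.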
- R is an object-oriented language that handles many types of objects. Objects are created and stored by name. To display the names of (most of) the objects currently stored in the workspace, use:
objects()
- Remove variables:
rm(x) # removes the object x
rm(list=ls(all=TRUE)) # removes all objects
objects() # the workspace is now empty: character(0)
- To use commands (e.g. functions) stored in an external file, e.g. commands.R in the current working directory, type:
source("commands.R")
abcfunction() # here we assume abcfunction() is defined in commands.R
- Wrapping an assignment in parentheses () also prints the assigned value:
(daisy <- "42029ef215256f8fa9fedb53542ee6553eef76027b116f8fac5346211b1e473c")
- Quit R:
q() # no need to try :)
1.2 Data objects: Vectors
- Use c() to create a vector:
value.num <- c(3,4,2,6,20)
value.char <- c("math","cs","finance")
value.logical <- c(F,F,T,T)
- The rep function replicates the elements of a vector:
(value <- rep(5,6))
- The seq function creates a regular sequence of values to form a vector:
seq(from=2,to=12,by=2)
- The functions can be used in combination:
value <- c(1,2,5,rep(3,4),seq(from=1,to=6,by=3)); value
- The scan function is used to enter data at the terminal:
value <- scan() # type the values, then press Enter on an empty line to finish
- Vector operations
x <- runif(10) # generates 10 independent random numbers, uniformly distributed on [0, 1]
x
y <- 10*x + 1
y
z <- (x-mean(x))/sd(x)
z
mean(x)
sd(x)
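As a quick check of the standardisation above, the following sketch confirms that z has mean 0 and standard deviation 1. The seed value is arbitrary and only makes the run reproducible.

```r
set.seed(1)                    # arbitrary seed, for reproducibility only
x <- runif(10)
z <- (x - mean(x)) / sd(x)     # same standardisation as above

abs(mean(z)) < 1e-12           # mean is zero up to rounding error
all.equal(sd(z), 1)            # standard deviation is one

z2 <- as.vector(scale(x))      # base R's scale() performs the same operation
all.equal(z, z2)
```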
1.3 Data objects: Matrices
- A matrix may be created from a vector by using dim:
value <- rnorm(10) # generates 10 independent standard normal random numbers
dim(value) <- c(2,5) # 2 × 5 matrix
value
dim(value) <- NULL # back to vector
value
- It may also be created from a vector by using matrix:
value1 <- matrix(value,2,5); value1 #2,5 is the dimension of the matrix
matrix(value,2,5,byrow=T) #type ?matrix to see the difference
- To bind a row onto an already existing matrix, the rbind function can be used:
value2 <- rbind(value1,c(1,1,2,2,3)) # add one row
- To bind a column onto an already existing matrix, the cbind function can be used:
value3 <- cbind(value2,c(1,1,2)) # add one column
value3
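Because the matrices above are filled with random numbers, the effect of byrow is hard to see. A minimal sketch with a fixed vector (the object names are illustrative) makes the fill order and the effect of rbind and cbind visible:

```r
v <- 1:6
by_col <- matrix(v, 2, 3)                 # default: filled column by column
by_row <- matrix(v, 2, 3, byrow = TRUE)   # filled row by row
by_col
by_row

with_row <- rbind(by_col, c(7, 8, 9))     # adds a third row: 3 x 3
with_col <- cbind(with_row, c(0, 0, 0))   # adds a fourth column: 3 x 4
dim(with_col)
```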
1.4 Data objects: Data frame
- The function data.frame converts a matrix or collection of vectors into a data frame:
value3 <- data.frame(value3)
value3
value4 <- data.frame(rnorm(3),runif(3))
value4
- To view the row and column names of a data frame:
names(value4)
row.names(value4)
- Alternative labels can be assigned by doing the following:
names(value4) <- c("C1","C2")
row.names(value3) <- c("R1","R2","R3")
- Names can also be specified within the data.frame function itself:
data.frame(C1=rnorm(3),C2=runif(3),row.names=c("R1","R2","R3"))
- The following example is to show how to access elements of a vector or matrix:
x <- sample(1:5, 10, replace=TRUE) # draws ten values from 1 to 5 with replacement
x
ones <- (x == 1); ones # logical vector: TRUE where the element of x equals 1
x[ones] <- 0
x
others <- (x > 1)
y <- x[others] #stores the values greater than 1 into y.
y
which(x > 1) #finds indices of elements bigger than 1
y <- x[-(1:5)] #copies x without the first 5 elements. To exclude values, negative index vectors are used
y
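The same indexing ideas carry over to matrices and data frames, which take a row index and a column index. A minimal sketch with made-up values:

```r
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("R1", "R2"), c("C1", "C2", "C3")))
m[2, 3]            # single element: row 2, column 3
m["R1", ]          # a whole row, selected by name
m[, -1]            # all rows, without the first column
m[m > 4]           # logical indexing returns a plain vector

df <- data.frame(C1 = c(10, 20, 30), C2 = c("a", "b", "c"))
df$C1              # one column as a vector
big <- df[df$C1 > 15, ]   # rows where C1 exceeds 15
big
```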
1.5 Data sorting
The command order allows sorting with tie-breaking. It returns an index vector that arranges its first argument in increasing order; ties are broken by the second argument, and any remaining ties by the third.
x <- sample(1:5, 20, rep=T)
y <- sample(1:5, 20, rep=T)
z <- sample(1:5, 20, rep=T)
xyz <- rbind(x, y, z)
dimnames(xyz)[[2]] <- letters[1:20] #names the columns by the first 20 letters
xyz
o <- order(x, y, z) # index vector that sorts by x, then by y, and finally by z
xyz[, o] # reorder the columns of xyz accordingly
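With random input the tie-breaking is hard to follow, so here is a small deterministic sketch (the values are made up for illustration):

```r
x <- c(2, 1, 2, 1)
y <- c(1, 2, 0, 1)

o <- order(x, y)   # sort by x; ties in x are broken by y
o                  # positions 4 and 2 have x = 1; among them y = 1 comes before y = 2
rbind(x, y)[, o]   # the columns rearranged into sorted order
```

Note that order returns positions, not sorted values; the sorted data is obtained by indexing with the result.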
1.6 Matrix calculations
A*B #is the matrix of element by element products
A %*% B #is the matrix product.
x %*% A %*% x #is a quadratic form, if x is a vector
(mat1 <- matrix(c(1,0,1,1), nrow=2))
(mat2 <- matrix(c(1,1,0,1), nrow=2))
solve(mat1) # inverts the matrix
#Matrix operation
mat1 %*% mat2 # product
mat1 + mat2 # Matrix addition
t(mat1) # Matrix transposition
det(mat1) # Matrix determinant
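# diag() behaves differently depending on its input
(A<-diag(c(1,2))) # vector input: diagonal matrix with these entries
(diag(A)) # matrix input: extracts the diagonal as a vector
(diag(4)) # single number: 4 x 4 identity matrix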
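The operators listed at the start of this section can be tried on concrete values. The following is a minimal sketch with a small symmetric matrix chosen for illustration:

```r
A <- matrix(c(2, 1, 1, 3), nrow = 2)   # symmetric 2 x 2 matrix
B <- diag(2)                           # 2 x 2 identity
x <- c(1, 2)

A * B                       # element-wise product: only the diagonal of A survives
A %*% B                     # matrix product: equals A
q <- drop(x %*% A %*% x)    # quadratic form x'Ax, reduced to a plain number
q

A_inv <- solve(A)
A %*% A_inv                 # identity matrix, up to rounding error
```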
1.7 Reading and writing data
R reads and writes files relative to the working directory, so either give the full path to the data file or put the file in your working directory.
getwd() # check the current working directory
setwd("your working directory path") # set the working directory
There are several ways to read data into R, depending on the data format. For simple text data, the command is read.table. For .csv files, the command is read.csv. The file name is given in either single or double quotes; see the examples below and the R commands of Lecture 1 available on IVLE.
R treats the data as an object and refers to it by the assigned name. Both loading commands return a data frame, so one can use the command dim (i.e., dimension) to see the size of the data.
read.table("http://www.stats.ox.ac.uk/pub/datasets/csb/ch11b.dat") # read via http
read.table("AAPL.txt") # read via local file; make sure AAPL.txt is under your work space
read.csv("AAPL.csv")
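The examples above need files that may not be on your machine. A self-contained sketch, using a made-up data frame and temporary files, shows the full write-then-read cycle and the use of dim:

```r
prices <- data.frame(day = 1:3, close = c(10.5, 11.0, 10.8))  # made-up values

csv_path <- tempfile(fileext = ".csv")
write.csv(prices, csv_path, row.names = FALSE)
loaded <- read.csv(csv_path)         # assign the result to keep the data
dim(loaded)                          # number of rows and columns
str(loaded)                          # column names and types

txt_path <- tempfile(fileext = ".txt")
write.table(prices, txt_path, row.names = FALSE)
loaded_txt <- read.table(txt_path, header = TRUE)  # header = TRUE keeps column names

file.remove(csv_path, txt_path)      # clean up the temporary files
```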
As datasets grow, the above commands become slow, for example when the data is larger than about 500MB. We recommend using fread instead. fread is a function in the data.table package.
Extension: both fread and read.csv are implemented in C, but fread memory-maps the file and then iterates through it using pointers, whereas read.csv reads the file into a buffer via a connection.
install.packages("data.table")
require(data.table)
system.time(fread("AAPL.csv")) # use system.time to check the performance of a function
fread("AAPL.txt")
Likewise, fwrite is faster than write.csv, so use fwrite instead. Use ?fread and ?fwrite to see more info.
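A minimal round trip with fwrite and fread might look as follows. The data is made up, and the code is guarded so that it is skipped when data.table is not installed.

```r
roundtrip <- NULL   # stays NULL if data.table is unavailable
if (requireNamespace("data.table", quietly = TRUE)) {
  prices <- data.frame(day = 1:3, close = c(10.5, 11.0, 10.8))  # made-up values
  path <- tempfile(fileext = ".csv")
  data.table::fwrite(prices, path)       # write the csv file
  roundtrip <- data.table::fread(path)   # read it back
  print(class(roundtrip))                # fread returns a data.table, which is also a data.frame
  file.remove(path)
}
```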
1.8 Re-directing output and file management
By default, output is shown in the R console. However, you can redirect the output to a file in your current working directory. See the following example:
print("hello")
sink("out.file")
print('hello')
sink()
file.show('out.file') # shows the file
file.remove('out.file') # removes out.file
list.files() # no out.file any more
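To confirm what sink actually wrote, the file can be read back with readLines. This sketch uses a temporary file instead of out.file so that nothing is left in the working directory:

```r
out_path <- tempfile(fileext = ".txt")

sink(out_path)              # from here on, output goes to the file
print("hello")
cat("value:", 42, "\n")
sink()                      # restore output to the console

captured <- readLines(out_path)
captured                    # one element per line written to the file
file.remove(out_path)
```

Always pair sink(file) with a closing sink(); otherwise later output keeps going to the file.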
Part2: Financial data analysis
Download the financial data of Lecture 1 from IVLE, then analyse it (mean, variance, test, plot, etc.) as shown in the lecture, using the R commands below.
#2.1 load data
require(data.table)
rate <- fread("EURUSD.csv")
#2.2 calculation
mean(rate$rate)
var(rate$rate)
var(rate$return[-1]) # omit NA in first row
hist(rate$rate)
hist(rate$return)
summary(rate)
fBasics::basicStats(rate$rate) ## need to install.packages("fBasics")
#2.3 plot
require(ggplot2)
qplot(rate$time, rate$rate, data = rate,
colour = I("darkblue"),
xlab = "time",
ylab = "EU/USD Rate",
geom = "line")
qplot(rate$time, rate$return, data = rate,
colour = I("darkred"),
xlab = "time",
ylab = "EU/USD return",
geom = "line")
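The task above also mentions a test, which the commands do not show. As a sketch only, the following uses a simulated rate series (not the EURUSD.csv data; the names sim_rate and sim_return, the seed and all parameters are assumptions made for illustration) to compute log returns and run a one-sample t-test of zero mean return:

```r
set.seed(42)                                          # arbitrary seed
sim_rate <- 1.10 * exp(cumsum(rnorm(250, 0, 0.005)))  # simulated rates, not real data
sim_return <- c(NA, diff(log(sim_rate)))              # log returns; the first one is undefined

mean(sim_return[-1])             # drop the leading NA, as in the commands above
var(sim_return[-1])

test <- t.test(sim_return[-1])   # H0: the mean return is zero
test$p.value
```

With the real data, the same calls would be applied to rate$return[-1].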
# for digital asset
require(coinmarketcapr) ## install.packages("coinmarketcapr")
require(treemap) ## install.packages("treemap")
plot_top_5_currencies()
market_today <- get_marketcap_ticker_all()
head(market_today[,1:8])
##             id         name symbol rank     price_usd  price_btc X24h_volume_usd market_cap_usd
## 1      bitcoin      Bitcoin    BTC    1 4068.53744485        1.0   9992201807.77  71670997597.0
## 2     ethereum     Ethereum    ETH    2 139.334327768 0.03427599   4499761131.45  14690390321.0
## 3       ripple          XRP    XRP    3  0.3094146194 0.00007612   759836701.765  12904620810.0
## 4          eos          EOS    EOS    4  4.2554984289 0.00104684   2190870174.15   3856524674.0
## 5     litecoin     Litecoin    LTC    5 60.9539009492 0.01499455   1867734118.86   3723986898.0
## 6 bitcoin-cash Bitcoin Cash    BCH    6 169.678170527 0.04174052   562465066.029   3003023649.0
df1 <- na.omit(market_today[,c('id','market_cap_usd')])
df1$market_cap_usd <- as.numeric(df1$market_cap_usd)
df1$formatted_market_cap <- paste0(df1$id,'\n','$',format(df1$market_cap_usd,big.mark = ',',scientific = F, trim = T))
treemap(df1, index = 'formatted_market_cap', vSize = 'market_cap_usd', title = 'Cryptocurrency Market Cap', fontsize.labels=c(12, 8), palette='RdYlGn')