Download Files

  download.file(url="someurl", destfile="somename", method="curl")
  datedownloaded = date()
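
A common refinement is to skip the download when the file is already on disk and to read the timestamp from the file itself. The sketch below is one way to do that; `fetch_once` and its arguments are hypothetical names, and it assumes the default `method="auto"` is acceptable on your platform.

```r
# Download a file only if it is not already present, creating the target
# directory when needed. `fetch_once` is an illustrative helper, not part of R.
fetch_once <- function(url, destfile, method = "auto") {
  dir.create(dirname(destfile), showWarnings = FALSE, recursive = TRUE)
  if (!file.exists(destfile)) {
    download.file(url = url, destfile = destfile, method = method)
  }
  # The file's modification time stands in for the download date
  file.info(destfile)$mtime
}

# Example call (not run here; "someurl" and "somename" are the placeholders used above):
# dateDownloaded <- fetch_once("someurl", file.path("data", "somename"))
```

Returning the modification time rather than calling date() keeps the recorded date stable when the function is run again on an existing file.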

Flat Files

Loading flat files:

    read.table(file="somefile", sep=",", header=TRUE)

read.csv sets sep="," and header=TRUE

Other parameters:

quote -> set of quoting characters. quote="" means no quoting.

na.strings -> character strings that represent missing values

nrows -> how many rows to read from file

skip -> number of lines to skip before starting to read
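
As a small illustration of how these parameters interact, the sketch below reads made-up data supplied inline through the `text=` argument instead of a file. The contents of `raw` and the column names are invented for the example.

```r
# Hypothetical file contents: one junk line, a header, then four data rows
raw <- "exported by a hypothetical tool
id,name,score
1,O'Brien,3.5
2,Smith,NA
3,Jones,-
4,Lee,4.0"

df <- read.table(text = raw, sep = ",", header = TRUE,
                 skip = 1,                   # ignore the first line
                 nrows = 3,                  # read only three data rows
                 quote = "",                 # the apostrophe is an ordinary character
                 na.strings = c("NA", "-"),  # both spellings mean "missing"
                 stringsAsFactors = FALSE)
str(df)
```

Without quote="" the apostrophe in O'Brien would be taken as the start of a quoted string, which is a frequent cause of rows going missing when reading real files.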

Fixed Width Text

x <- read.fwf(file="somefile", skip= 4, widths = c(1,3,5,6,12))
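
To make the widths argument concrete, here is a minimal sketch that writes two invented fixed-width records to a temporary file and reads them back. The layout (a 1-character code, a 3-character id, one ignored column, a 4-character value) and the column names are assumptions made for the example; a negative width tells read.fwf to skip that many characters.

```r
# Write two hypothetical fixed-width records to a scratch file
tmp <- tempfile(fileext = ".txt")
writeLines(c("A001 12.5",
             "B002  7.0"), tmp)

# widths: 1 char, 3 chars, skip 1 char (negative width), 4 chars
fw <- read.fwf(tmp, widths = c(1, 3, -1, 4),
               col.names = c("code", "id", "value"))
fw
```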

Excel

library(xlsx)
xData <- read.xlsx("filename", sheetIndex=1, header=TRUE)

Specific rows and columns:

library(xlsx)
xData <- read.xlsx("filename", sheetIndex=1, colIndex=2:3, rowIndex=1:4)

The write.xlsx function writes a data frame out to an Excel file.

XML

JSON

MySQL

HDF5

Hierarchical data format.

Groups contain zero or more datasets and metadata

  • Have a group header with group name and list of attributes
  • Have a group symbol table with a list of objects in group

Datasets are multidimensional arrays of data elements with metadata

  • Have a header with name, datatype, dataspace, and storage layout
  • Have a data array with the data

## rhdf5 is distributed through Bioconductor
install.packages("BiocManager")
BiocManager::install("rhdf5")

library(rhdf5)
h5file = h5createFile("example.h5")

Create groups

h5createGroup("example.h5","g")

Write to groups

A = matrix(1:10, nrow=5, ncol=2)
h5write(A, "example.h5", "g/a")
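
Reading the data back follows the same pattern. The sketch below assumes the rhdf5 package is installed and uses a temporary file, so the path and the dataset name "g/A" are illustrative rather than taken from the notes above. h5ls lists the groups and datasets in a file, and h5read loads one dataset by its path.

```r
library(rhdf5)

# Scratch file so the example does not collide with an existing example.h5
path <- tempfile(fileext = ".h5")
h5createFile(path)
h5createGroup(path, "g")

A <- matrix(1:10, nrow = 5, ncol = 2)
h5write(A, path, "g/A")

h5ls(path)                 # list the contents: group "g" and dataset "A"
B <- h5read(path, "g/A")   # read the dataset back into R
dim(B)
```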

WEB

APIs

Read from Twitter

library(httr)
myapp = oauth_app("twitter", key="yourConsumerKeyHere",
        secret="yourConsumerSecretHere")
sig = sign_oauth1.0(myapp, token = "yourTokenHere",
      token_secret = "yourTokenSecretHere")
homeTL = GET("https://api.twitter.com/1.1/statuses/home_timeline.json", sig)

Read from GitHub

library(httr)
myapp = oauth_app("github", key="yourConsumerKeyHere",
    secret="yourConsumerSecretHere")
## GitHub's API uses OAuth 2.0, so request a token rather than signing with OAuth 1.0
github_token = oauth2.0_token(oauth_endpoints("github"), myapp)
homeTL = GET("somegithubaddress", config(token = github_token))

Converting the JSON object

library(jsonlite)
json1 = content(homeTL)
json2 = jsonlite::fromJSON(toJSON(json1)) ## Convert to dataframe
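
The round trip through toJSON and fromJSON works because content() returns a nested list, while fromJSON simplifies an array of records into a data frame. The sketch below shows the effect on a hand-built list standing in for content(homeTL); the field names and values are invented, and auto_unbox = TRUE is an added assumption so that length-one values are written as JSON scalars.

```r
library(jsonlite)

# Hypothetical stand-in for content(homeTL): a list of records as nested lists
json1 <- list(
  list(id = 1, name = "repoA", owner = list(login = "alice")),
  list(id = 2, name = "repoB", owner = list(login = "bob"))
)

# Serialize to JSON text, then parse it back with simplification
json2 <- jsonlite::fromJSON(jsonlite::toJSON(json1, auto_unbox = TRUE))

class(json2)          # a data frame, one row per record
json2$name
json2$owner$login     # nested objects become nested data frames
```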

Other sources