talk 1 (62 min): Data Science in the Tidyverse – Hadley Wickham link
- goal: solve complex problems by combining simple, uniform pieces
- consistent functions
- a command function performs an action (print, plot, write_csv, <-)
- a query function computes a value (summarise, mutate, geom_line)
- pipe code
- first argument is the data
- the data is the same type across a family of functions
Tidy data is a consistent way of storing data
- Each dataset goes in a data frame
- Each variable goes in a column
- for list columns use tidy tibbles instead of tidy data frames
- biggest difference: tibbles are data frames that are lazy and surly (they do less and complain more)
- no character to factor,
- no partial matching,
- better support for lists (defining in the creator for example)
applications: tidytext, sf (successor of sp, uses list columns), cross validation (with list columns), tidyquant (tidy financial time series), maybe ml with caret (or mlr), pipelearner, …
- 4 principles:
- each function encapsulates one task
- and is either a query or a command
- Functions are composed with %>%
- and use tidy tibbles as primary data structure
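The tibble differences noted above (no partial matching, list columns defined in the constructor) can be sketched as follows. This is only an illustration; the column names `abc` and `stuff` are made up.

```r
library(tibble)

# A plain data frame allows partial matching with `$`.
df <- data.frame(abc = 1:2)
partial_df <- df$a                      # silently matches column `abc`

# A tibble refuses partial matching and returns NULL (with a warning).
tb <- tibble(abc = 1:2, stuff = list(1:3, letters))
partial_tb <- suppressWarnings(tb$a)

# List columns can be defined directly in the constructor.
lengths(tb$stuff)
```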
talk 2 (117 min): Building Dashboards with Shiny Tutorial – Joe Cheng & Winston Chang link
- user vs data driven events
- invalidateLater() is ok but has overhead
- reactiveFileReader just checks the timestamp of a file at a fixed interval (given in milliseconds), and the read function is only called when the timestamp changes.
- the first argument of the read function must be the path to the data. but it works just for data in files on disk, not for databases or apis
- reactivePoll takes 2 functions: the first is checkFunc (e.g. httr::HEAD(api), a head request for apis to see if anything changed), the 2nd is valueFunc. Both are necessary for api or db data. reactivePoll does not return the data itself, but a reactive expression that returns the data (e.g. a data frame).
- how do we filter arrange etc while data changes underneath… just to understand…it is done more or less automatically
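A minimal sketch of the reactivePoll pattern described above. The file path and the output name `tbl` are assumptions for illustration; for an api, checkFunc would be a cheap request instead of a timestamp check.

```r
library(shiny)

# Hypothetical data source: a CSV file in the temp directory.
path <- file.path(tempdir(), "measurements.csv")
write.csv(head(cars), path, row.names = FALSE)

server <- function(input, output, session) {
  # checkFunc is cheap and runs every second; valueFunc is expensive and
  # only runs when the value returned by checkFunc has changed.
  measurements <- reactivePoll(
    intervalMillis = 1000,
    session = session,
    checkFunc = function() file.mtime(path),
    valueFunc = function() read.csv(path)
  )

  # `measurements` is a reactive expression, so it is called like a function.
  output$tbl <- renderTable(head(measurements()))
}
```

Downstream filtering or arranging just goes into further reactive expressions that call `measurements()`; they are re-run automatically when the data changes.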
- performance
- Cache Results
- speed up with logic, parallel or rcpp
- scale hardware up/out
- remove functionality
- dynamic (in clientside with html widgets) vs stable dashboards
- flexdashboard (publish on server, RPubs, RStudio Connect; works with htmlwidgets, crosstalk and shiny, so also with reactive inputs in the latter case, I think…)
- shinydashboard
(of course one can build a shiny dashboard without these packages)
- deploy to server, shinyapps.io, RStudio Connect
talk 3 (130 min): happy R users purrr - Tutorial (charlotte wickham) link
lapply(people, function(x) length(x$starships))
is equivalent to:
map(people, ~ length(.x$starships)) # map(.x, .f)
- other types of output
- map always returns a list
- some helpers for atomic returns use: map_lgl, map_int, map_dbl, map_chr (return has the same length as x). use them to have typestable output.
- u can use readr::parse_number to also tell which values are NAs when converting to numeric
- when you want nothing at all and use a function just for its side effects: walk()
- set names with set_names instead of names() or setNames()
- other ways of specifying .f
- .f can be an integer or string, i.e. .f = "some_name" instead of ~ .x[["some_name"]]
- go in little steps: map(people, "starships") %>% map_int(length)
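The equivalences noted above can be tried on a tiny stand-in for the `people` list. The two entries below are invented for illustration, not the data used in the tutorial.

```r
library(purrr)

# Hypothetical miniature version of the `people` list.
people <- list(
  list(name = "Luke", starships = c("X-wing", "Imperial shuttle")),
  list(name = "Leia", starships = character(0))
)

n_lapply <- lapply(people, function(x) length(x$starships))  # base R, list
n_map    <- map(people, ~ length(.x$starships))              # purrr, list
n_int    <- map_int(people, ~ length(.x$starships))          # integer vector

# String shortcut for .f, in little steps.
n_steps <- people %>% map("starships") %>% map_int(length)
```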
other iteration functions …
- when you have more than one information for a cell. try to use a list column.
- try to use it inside a tibble instead in a data.frame.
- when building the tibble from scratch with map, there is no nice functionality to turn the list into a tibble; you really have to build every column from scratch independently… you can use transpose to try it, but at the moment there is no “safe” way to do this in short
convenient arguments for missing values: map_chr(.x, "Species", .null = NA_character_) (in newer purrr versions the argument is called .default)
- use map inside mutate to manipulate list columns inside a tibble/data.frame
…sometimes it’s nice to use lookup tables inside map…
- collapse a character list column, so that each element becomes a single pasted string with an appropriate separator (use collapse) -> sth like
mutate(new_col = map_chr(old_col, paste, collapse = ", "))
- so the following should work:
map_chr(old_column, ~ paste(.x, collapse = ", "))
- before pasting one can sort and so on of course.
walk is like map, but you get nothing back (you get the input back invisibly, so you can do some side effect inside a pipe and continue the pipeline in the same go).
side effects = “printing to screen, plotting to gr dev, file manipulation (save, write, move, etc), system calls”
map2 iterates over two lists -> map2(.x, .y, .f)
- there are also
- walk2, map2_lgl, map2_int, map2_dbl, map2_chr
- use map2 for writing, downloading etc different objects to different files…
- use pmap, it is like map and map2, just for 3 or more arguments
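A small sketch of walk2 and pmap as described above. The file names and the example data sets are assumptions chosen for illustration.

```r
library(purrr)

# Write different objects to different files (side effect only).
dfs   <- list(cars = head(mtcars), flowers = head(iris))
paths <- file.path(tempdir(), paste0(names(dfs), ".csv"))
walk2(dfs, paths, ~ write.csv(.x, .y, row.names = FALSE))

# pmap iterates over three (or more) inputs in parallel.
sums <- pmap_dbl(list(1:3, 4:6, 7:9), function(x, y, z) x + y + z)
```

walk2 returns its first input invisibly, so it can sit in the middle of a pipeline.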
use invoke_map to apply many different functions to one argument
- many helpers for lists and functions..
- safely takes a function and returns a function
- transpose takes and returns a list…
the combination of the last two is very good for dealing with errorcatching…
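How safely and transpose work together for error catching can be sketched like this. The inputs are made up; `log("a")` is used only because it reliably fails.

```r
library(purrr)

safe_log <- safely(log)          # returns a function that never throws

# Each element of `res` is a list with `result` and `error`.
res <- map(list(1, "a", 10), safe_log)

# transpose turns the list inside out: one list of results, one of errors.
out <- transpose(res)
ok  <- map_lgl(out$error, is.null)

good_results <- out$result[ok]   # results of the calls that succeeded
```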
talk 4 (24 min): What’s new with Shiny – Joe Cheng link
near future:
- automated testing (shiny tests)
- write tests
- run tests
- update in a sandbox and run tests
- compare tests
- decide if you can take the update or have to debug
- test event recorder to record named tests for the user interface -> json of the shiny states during the recording/snapshot, and also png to get the overall picture of the bug
- api endpoints
after that:
- asynchronous tasks
limitations:
- not suitable for all cases, for example not for random actions (maybe set.seed)
- api might change (it's not on cran yet)
- some dependencies (cran again)
- possibility to catch values with api that were created by interactive use of shiny
- usage /api/… you get the data as csv, json or whatever you define
- readable from python, r, c#, whatever
- possibility for api to activate the interactive widgets
- at the moment only data retrieval and no authentication
talk 5 (22 min): Database Best Practices – Jim Hester link
shows latest work on DBI, odbc, pool
- DBI is a unified interface for databases
- (was also available for s+)
- rstudio took over maintenance
- will be in next dplyr release
- databasebackends just have to work with dbi interface,
- no need anymore to customize dplyr backends specifically
dbitest for tests…
- odbc will be on cran
- is dbi compatible
- is a rewrite of the rodbc package
- odbc standard is separate from this “odbc” package
- every odbc connectable database backend might be used with this package
- native support for timestamps and raw binary formats (no coercing date-> string necessary …)
- supports batch queries, which makes it a bit (~ 2times) faster than rodbc
- supports parameterized queries
- includes some wrappers for quoting strings to avoid sql injection (e.g. dropping tables in the db backend)
insert into xyz values (?, ?) with dbBind() (so you can set r objects as parameters, “?” are placeholders)
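A minimal sketch of a parameterized insert with dbBind(). As an assumption, an in-memory SQLite database (via RSQLite) stands in for an odbc connection, since odbc needs a configured driver; the table `xyz` and its columns are made up.

```r
library(DBI)

# Assumption: in-memory SQLite as a stand-in for an odbc connection.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbExecute(con, "CREATE TABLE xyz (id INTEGER, name TEXT)")

# Placeholders keep the values separate from the SQL text.
stmt <- dbSendStatement(con, "INSERT INTO xyz VALUES (?, ?)")
dbBind(stmt, list(1:2, c("a", "b")))    # one vector per placeholder
dbClearResult(stmt)

# The same mechanism works for queries via the `params` argument.
res <- dbGetQuery(con, "SELECT name FROM xyz WHERE id = ?", params = list(2L))
dbDisconnect(con)
```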
- u can set a knitr option (connection = con)
- u can use sql code chunks in knitr, it will know the connection if set in options
- u can write queries as functions and give arguments to interact with the result… (maybe it would be good to catch the result, will have to look at the memory issues)
u can build a shiny app to interact with the database
- pool makes it possible to let many people interact with a database (maybe interacting with the db)
- issues before, r is single threaded, so everybody has to wait. other solution: new connection for each person, but there is a limit for maximal open connection.
- pool opens some connections and gives connections that are already open.
- pool is faster, because …
- pool tries to reestablish broken connections (so you don't have to care about too many details when failures occur)
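The pool idea can be sketched in a few lines. Again an in-memory SQLite database is only an assumption to keep the example self-contained; in a shiny app the pool would be created once, outside the server function.

```r
library(DBI)
library(pool)

# The pool opens connections and hands out ones that are already open.
pool <- dbPool(drv = RSQLite::SQLite(), dbname = ":memory:")

# DBI functions accept the pool directly; checkout and return happen internally.
answer <- dbGetQuery(pool, "SELECT 1 + 1 AS two")

poolClose(pool)   # close all pooled connections when the app stops
```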
future:
- generic connection tab in rstudio (like the one for sparklyr)
- reestablish previous connections
- viewer for available drivers
- view sources
- view tables
- view schemas
-> within rstudio v 1.1 <3
rstudio server pro:
- easy setup for a wide variety of db (50+, including hive, impala, postgresql, mysql)
- improved performance
- improved error messaging
- kerberos support
talk 7 (23 min): R and Spark – Javier Luraschi link
- test locally, spark_install(), spark_connect(master = "local")
- copy data to spark, use dplyr, …
- for modelling use sparkfunctions
- dont need dplyr, you can also use dbi package and use sql statements
- supports extensions: run scala from sparklyr, via invoke(context, version), runs on the cluster
- use scalafiles, get results to r interface (advanced)
- more functions
- more r behaviour, df, and NAs
- experimental livy support (to connect remote from rstudio)
- rstudio and shiny server pro certified with cloudera
- config[[spark-shell-memory]] or so to give more memory than the 500mb default or so
talk 8 (24 min): Dynamic Shiny Interfaces – Bárbara Borges Ribeiro link
- dynamic ui means that you return a plot, a summary or a table depending on the choice of a user
- there are not too many use cases, since normally one works only on a plot or only on a table…
- should one use renderUI or insertUI
- renderUI generates a slot for the output, and whatever we choose as input, we just get the corresponding output
- basically you use just regular shiny objects
- there are limitations for renderUI. if you have 30 inputs and maybe want to see some of them at the same time, it's possible, but then you have to provide output slots for them, which you would like to do independently.
- example: add independently different functions on different datasets
- insertUI: works like renderUI. Needs an action button. you have to define where new output is added (usually starting with some placeholder). it doesn't look like usual shiny. no ui-server pairs. Wrap things into a tagList and div.
- summary:
- renderUI, feels more like “shiny” and is a bit safer, brings less trouble, doesn’t need events (has reactivity by default), easily bookmarkable
- insertUI is a lot trickier, you have to know a bit what you are doing. Is a lot more flexible. You can do everything like with renderUI, but also much more, longer code, harder to debug and to bookmark, it is nice to compare different plots or other things side by side.
- there are also other ways for dynamic shiny, like conditional panels or any javascript stuff.
- you should use renderUI, when reactive ui is enough.
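The renderUI case from the notes (plot, summary or table depending on the user's choice) could look roughly like this. The input and output ids and the `cars` data are assumptions for illustration.

```r
library(shiny)

ui <- fluidPage(
  selectInput("kind", "Show", choices = c("plot", "summary", "table")),
  uiOutput("slot")                       # the slot that renderUI fills
)

server <- function(input, output, session) {
  # Reactive by default: re-runs whenever input$kind changes.
  output$slot <- renderUI({
    switch(input$kind,
      plot    = plotOutput("plot"),
      summary = verbatimTextOutput("summary"),
      table   = tableOutput("table")
    )
  })

  output$plot    <- renderPlot(plot(cars))
  output$summary <- renderPrint(summary(cars))
  output$table   <- renderTable(head(cars))
}

app <- shinyApp(ui, server)   # start with runApp(app) in an interactive session
```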
talk 9 (24 min): Using Web APIs from R – Amanda Gadrow link
talk 10 (25 min): Bookmarking Shiny State (Finally!) – Winston Chang link
- it is about sharing results of a shiny app. you can already share the app. but the results of an app couldn’t be shared before bookmarkable state. You always had to write the way to the results.
- requires shiny 0.14 (september 16). current version (1/17) is 1.0.
- ui has to be a function with argument request, you need to include bookmarkButton(). Within shinyApp you can set enableBookmarking = "url" (you can set this to sth else also).
- you should add a global.r to enable bookmarking
- there is also the possibility to store the bookmarked state on the server via enableBookmarking = "server"
- short url on server, url might be very long if not on server (this can lead to problems in (maybe older versions) of internet explorer).
- cant save data on url encoded app, but you can if you host it on a server (which needs shiny server).
- the url encoded version does not need shiny server.
- bookmarking via the 3 lines mentioned above doesnt work for all shiny apps. sometimes you have to do more work.
- specifically, it’s easy, if the output just depends on the current settings of the app and not on the past settings during the session (“partly input dependent applications”).
- also dependence on external data sources might need a little bit more work to bookmark.
- to solve these issues, there are callbackfunctions: onBookmark() and onRestore(). details are explained in the talk…
- documented on the shiny website -> bookmarkable state articles…
- there might be ways to go back from the bookmarked app to the start of the app, even this is not intended at the moment…
- for the “saving to a file with the shiny server” way it would be nice to give the file with the bookmark to the user, but that is not possible, at least at the moment.
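The three ingredients for basic bookmarking mentioned in the notes (ui as a function of request, bookmarkButton(), enableBookmarking) can be sketched as follows. The text input is just a placeholder.

```r
library(shiny)

# 1. The ui is a function that takes `request`.
ui <- function(request) {
  fluidPage(
    textInput("txt", "Some text"),   # hypothetical input whose state is saved
    bookmarkButton()                 # 2. button that generates the bookmark
  )
}

server <- function(input, output, session) {}

# 3. "url" encodes the state in the url; "server" stores it on the server.
app <- shinyApp(ui, server, enableBookmarking = "url")
```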
talk 11 (17 min): Competitive Modeling of Outcomes for Prediction – Max Kuhn link
talk 13 (26 min): Writing Readable Code with Pipes – Bob Rudis link
- 81 packages on CRAN that export the pipe operator
- use ndjson for faster json parsing
- use anytime for faster date time stuff
- use httr::stop_for_status() to fail early on the risky steps (e.g. http requests) within the pipeline
- use list.files %>% map_df
talk 14 (20 min): Mapping in R with Leaflet – Bhaskar Karambelkar link
- based on version 0.7 of the leaflet javascript library. The most current one is version 1.0
- packages of the presentation: sp, rgdal, rgeos, raster, rmapshaper, tigris, acs, sf, mapview, geojson, geojsonio
- the whole graphic is rendered as an svg
- many controls, buttons and plugins (the latter is outstanding and special within leaflet).
Adv. Concepts
- various panes with different visibility. Order of mapping is not important, because this is done internally.
- u can catch events for html stuff or shiny
- u can use groups and layers
- huge mass of plugins, a whole ecosystem (like d3 also has).
- leafletProxy is used within shiny to add sth to already existing maps
- leaflet() and the addXxx() functions have options arguments, which can be quite nice
- you can use tiles from cartodb
- addLayersControl(baseGroups = c("Dark", "Light"))
new markers: addAwesomeMarkers, addLabelOnlyMarkers
- more hover options, also for polygons…
- better and more projections like mercator…
use shiny to capture events,..
- leaflet.extras, for added plugins…
- at the moment on github. on cran soon…
- leaflet will be stable, leaflet.extras will be dynamic
- topojson for dynamic choropleth maps is awesome…
for now everything thrown into leaflet has to be in lat/lng. however this is discussed and might be more flexible someday
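A small sketch combining the cartodb tiles, the new marker function and the layers control mentioned above. The coordinates and the label are made up; the group names follow the notes.

```r
library(leaflet)

m <- leaflet() %>%
  # Two CartoDB base maps, each assigned to a named group.
  addProviderTiles("CartoDB.DarkMatter", group = "Dark") %>%
  addProviderTiles("CartoDB.Positron", group = "Light") %>%
  # Coordinates must be given as lng/lat.
  addAwesomeMarkers(lng = 13.4, lat = 52.5, label = "Berlin") %>%
  # Radio buttons to switch between the base groups.
  addLayersControl(baseGroups = c("Dark", "Light"))
```

Printing `m` in an interactive session shows the map; inside shiny, leafletProxy would be used to modify it after rendering.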
talk 15 (19 min): Putting square pegs in round holes: Using list-cols in your dataframe – Jenny Bryan link
- feels awkward, but you feel like a programming god in the end.
- you need purrr (by hadley and lionel henry)
- works great with dplyr, broom, tibble,…
- vectors don’t have to be atomic, so you can also put lists into a data.frame
- why?
- regex with sometimes more than one match
- some json or xml from an api which can be a nested list
- split apply combine problems (lapply,…)
- in a data frame it's cooler, since you think about filtering, arranging,…
- you can use the existing toolkit (dplyr, …)
- keep multiple vectors intact and in sync
- purrr replaces plyr in some way
- how to index, inspect, compute, simplify these awkward structures
- links for 2-3 examples are in the slides, but they were cut from the talk because of time constraints
- -> api example
- repurrrsive package on github, which has some nice list examples to practice
- usual to call list columns “stuff”
- it is nicer to have a tibble instead of a data.frame, cause these can be easier created with lists, worked with and printed.
- dont forget [ and [[ to figure out list elements, also the listviewer package has a nice widget to explore lists.
- nice way to use map variants in mutate to get some useful informations from a list columns
- really nice example with a template for strings (in the talk)
- use unnest() to go from list columns to data.frame without listcolums
- use group_by %>% nest, to get a list data.frame within a pipeline
- of course you can get list columns from list columns, depending on the operations you do…
- if you have many models in your columns, just use the broom package (tidy, augment, …)
- inspections: View (sometimes), listviewer, str with max.level = 1, or max.level = 10 (for example) and maybe a new tool in RStudio (not yet announced) can help to view list columns
- work with listcolumns: use map in dplyr functions
- when nested data.frames, when listed columns? -> nested data.frames are special cases of list columns.
- they can replace ddply and are cool for split apply combine
- when list columns, when group_by %>% do()
- jenny bryan thinks do() is complicated somehow and she just likes the other workflow more
- might this approach be a possibility to reduce the complexity in many bioconductor packages?
- there are many ideas, but at the moment they dont seem to converge directly into this direction.
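The nest / map-inside-mutate / unnest workflow from the notes can be sketched with a built-in dataset. Using `mtcars` and a linear model of `mpg` on `wt` is an assumption for illustration, not an example from the talk.

```r
library(dplyr)
library(tidyr)
library(purrr)

by_cyl <- mtcars %>%
  group_by(cyl) %>%
  nest() %>%                       # one row per group, data in a list column
  ungroup() %>%
  mutate(
    model = map(data, ~ lm(mpg ~ wt, data = .x)),   # list column of models
    slope = map_dbl(model, ~ coef(.x)[["wt"]])      # simplified to numeric
  )

# Going back from the list column to a regular data frame.
flat <- by_cyl %>%
  select(cyl, data) %>%
  unnest(data)
```

With many models in a column, broom::tidy() or broom::augment() could be mapped over `model` in the same way.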
talk 17 (23 min): Text Mining the tidy Way – Julia Silge link
- get a text, save the line number, use unnest_tokens(word, text) and get the format of one word per row.
- stop words: tidy_books %>% dplyr::anti_join(stop_words)
- counts: tidy_books %>% dplyr::count(word, sort = TRUE)
- sentiment analysis: tidy_books %>% inner_join(get_sentiments("bing"))
- what is a document about: inverse document frequency (better than removing stop words): calculate for each book how often each word occurs (in percent). tf_idf: …
- if you have tagged articles it is nice to sort the included words by the size of their tf_idf’s
- ngrams, networks, negations (skipped)
- convert between tidy and non tidy formats. allows you to use operations on classical text mining data structures with the classical implemented methods. then you can switch back to the tidy format.
- book: tidy text mining (includes case studies)
- if you have different speakers, extract them into a column before unnesting into the tidy format
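The tidy text steps above, run on two invented one-line "books". The texts are placeholders; with only two documents the tf-idf values are not meaningful, the sketch only shows the mechanics.

```r
library(dplyr)
library(tidytext)

# Hypothetical mini corpus: one row per document.
docs <- tibble(
  book = c("a", "b"),
  text = c("the cat sat on the mat", "the dog chased the cat")
)

# One word per row.
tidy_books <- docs %>% unnest_tokens(word, text)

# Remove stop words, then count words per book.
counts <- tidy_books %>%
  anti_join(stop_words, by = "word") %>%
  count(book, word, sort = TRUE)

# Term frequency and inverse document frequency per word and book.
tfidf <- counts %>% bind_tf_idf(word, book, n)
```

A word that occurs in every document (here "cat") gets an idf of zero, which is why tf-idf can replace an explicit stop word list.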
talk 18 (25 min): TrelliscopeJS – Ryan Hafen link
- interactive displays of small multiples
- based on the javascript package from the same author
- facet_trelliscope fits in the ggplot workflow, but gives you pages of facets, which can lead to a better overview.
- it also gives you filters and sorts
- facets can include plotly graphics (so widgets inside a widget)
- plots list columns of plots (interactive)
- fits well with tidyverse workflow
- works well with sparklyr
- “kind of a database of images that you can query”
- a lot more to come, especially filters for more datatypes
- should work with base, lattice, ggplot2 and any html widget, bokeh, plotly, etc
- uses crossfilter behind the scenes (should work for 1 million rows (or more?))
- works in browser, so there are limitations…
- fit in one markdown vis
- has bookmarkable/sharable state
talk 19 (19 min): Teaching Introductory Statistics Using the tidyverse via bookdown – Chester Ismay link
- bringing R to different subjects in school
- writing a book to lower the barrier to return to the language
- writing a book to use the interactive and build up a formula paradigm
talk 20 (48 min): Lightning Talks – User Submitted Talks link
- easyMake (package)
- tries to build dependency graphs of analysis
- tries to build make files upon
- also has an RStudio plugin
- gives a working makefile, not a perfect one
- rOpenSci packages every muggle should have heard about (Karthik Ram)
- magick package: helps you to transform, read and write images, some magic stuff, …
- perfect with gganimate
- pipe friendly
- hunspell
- spellchecking in R for text and text analysis
- lots of advanced functionality
- Tesseract
- Travis + tic
- add tic to your travis yaml
- Scalable Data Science with R and Spark – Best Practices and Lessons Learned
- Rdd
- DataFrames
- Transformers - Actions
ggedit: interactive ggplot aesthetic and theme editor
- Exploring correlations in a tidy R framework with corrr
- correlations in data frames instead of a matrix
- use stretch() for long format
- everything pipeable
- fcts for printing, plotting, clustering
- recommends widyr for structures that are not perfect for tidy format
- Exploration of Literature Databases with Shiny
- Business intelligence with R: bi platform ending in R; backend + data (googlesheets, mysql, python, crm, …) -> google bigquery -> flexdashboard (incl. shiny, used a child rmd file -> every page has its own file) + highcharter
- bsplus: Using Bootstrap to extend your Shiny app
- you want to put more stuff into your shiny app
- a lot of additional stuff depends on the ui, not the serverside
- everything pipeable
- is coming to cran
- Bringing R Into Dev: Playing Nice With Others: use shiny, bash and feather. the latter plays nicely with python and julia
- FlashR: Parallelize and Scale R Machine
- redefines r matrix functions to work with bigger data (overwrites them)
- mainly switches functions that create r objects into functions that create flashR objects
- uses same api as base r
- brings parallelization out of the box
- executes (and stores) out of memory
- outperforms revolution r :D
- easy to use, fast and you can test it on the webpage flashx.io
talk 21 (68 min): Finding and Telling Stories with R – Andrew Flowers link
subtitle “6 types of data stories and how to find them” - good journalism is good storytelling
talk 22 (114 min): Advanced R Markdown Tutorial – Yihui Xie link
- rmarkdown = knitr + pandoc
- raw (outside of a chunk) latex or html will only work in the specific documenttype
- converts to markdown and markdown passes arguments to pandoc
- output formats can be customized via output_format’s base_format argument
- yaml is translated into rmarkdown::render arguments
- you can set some yml options to null; e.g. the default theme (bootstrap) can be switched off, which sometimes makes the output smaller
- can pass own css styles
- use developer tools (for example in chrome). you can customize the html tag p for paragraphs within the browser (as an experiment add for example color: red;), later copy the change into an external css file in rstudio and apply it in the yml via css: path/of/css/file. you can do the same with javascript. you can also pass several files as a vector: css: [path1, path2, path3]
- you can also customize via the template option. defaults for templates are on github.
- only output field of yml is for markdown. the rest is for pandoc and can be found in its documentation
- some nice deeper customization is available via includes: in_header, before_body, after_body
- you can write your own package to extend further options
- how markdown handles this internally is explained in the talk
- via pre- and postprocessors, you can change some things that can’t be done with pandoc.
- this is for example used to preserve html widgets content in rmarkdown
- most important for using/providing a template is the yml output (2 approaches are shown [jss article from rticles, tufte handout])
- xaringan ports remark.js, but some markdown shouldn't be touched by pandoc. therefore you can hide some part of it via the preprocessor step explained before. (the document/html parts can be broken up via some line with a split…)
- it is an awesome presentation framework (hacked by yihui within 3 days)
- bookdown (worked on it whole 2016). outputs: pdf, html, ebooks. writing on a specific postprocessor took most of the time, cause many features had to be synchronized between different output formats (the real challenge is to make sth work for multiple outputformats at the same time.). makes extensive use of regular expressions.
- some tips from his life as a software developer:
- you can't make everyone happy. focus first on making one person very, very happy.
- use humor and provide little easter eggs. think differently
- stand on the shoulders of giants. you don't have to know about c, c++, python, …, but you can reuse frameworks and build upon them or their ideas.
- keep calm, say no, say sorry, if the problem is too complex and has a very easy practical workaround for the user. it will be ok for them.
- ask the user again why they want sth. maybe the intention is unusual and afterwards the feature request is gone.
- be very open and engaging to users in open source and pull requests. maybe others can contribute and work further on your package
talk 23 (131 min): Happy Git and GitHub for the useR Tutorial – Jenny Bryan link
- uses alice bartlett's slides for her tutorial a lot
- r is mirrored on gh by winston chang, and cran also, because of gabor csardi
- install git, configure, make sure rstudio can find it
- there will be a git button in the gui
- in rstudio you can do 90% of your git work, like: make commits, look into your git history, look at gists, pull and push, access other branches than the master branch (but can’t create other branches)
- everything that was committed can be revisited
- some git clients: rstudio (you can use command line for missing features, also mixing both at the same time), git, sourcetree (preferred by jenny bryan), github desktop (is not recommended by jenny bryan), gitkraken
- rstudio creates gitignore by default
- devtools can take care of license stuff, so you don’t have to specify these things when creating a repo
- you can use ssh, or https (github suggests this) [depends if you cached your credentials or used ssh keys]
- always pull before you work further or push (after committing), since it’s nicer to prevent than to resolve merge conflicts.
- git documentation is really bad!!!
- starlogs.net shows the beginning of a git history as a star wars episode start, with music and so on …
- use github, gitlab or bitbucket, …
- when you have unresolvable merge conflicts with a remote repo, you can burn the local repo down and recreate it from remote.
- keep_md: yes or output: markdown in the yml (or html_document to github_document, which changes rmd to md in the output, so also creates md on github when you push), lets you prevent github from rendering to html or rmarkdown.
- rmarkdown rendered as html on github since around 1 year
- edit, creations and also deletions have to be staged/committed
- you can put yml on top of regular r files also, and create md files instead of r files on github.
- github can render a repository as a website, since december 16, you just need to have some things as markdown and data as csv or tsv.
- you can create indexes in every hierarchy of a file. these can link to other places.
- just activate github pages within repository settings
- github pages is a jekyll powered service
- under source choose master branch and save.
- github can show you differences in committed data analysis results, like diffs in data or images.
- github enterprise lets you run github on premise
- oh shit git webpage is a nice page for common github issues
- you always have to pull when there is something on gh that you don't have. otherwise you can't push
talk 24 (25 min): Opinionated Analysis Development Hilary Parker link
- try to blame the process and not the person -> when errors occur, optimise the process
talk 25 (26 min): What’s New with the IDE – Kevin Ushey link
- uses up to date version 1.0.136 in the talk
- lots of autocompletion via tab, in different lines, in and outside of functions, fuzzily, for paths, …
- works also nice when implementing shiny stuff
- is smart to know, when u want…for example an environment, options, …
- autocompletes the methods for object.method syntax within rcpp
- there is a new “Rstudio Home button” in the rstudio-file-pane
- diagnostics
- u get symbols noting when code is expected to fail, because of some syntax related errors,
- but there are some other cool things, like unknown function arguments, no definition in scope, defined but not used
- configurable inside options -> diagnostics
- some small things, like defining whitespaces around binary operators.
- command + enter executes a whole expression
- cmd + alt + shift + up/down to expand selections, for example if statements
- rename in (parent/bracket) scope (can be done via the gui, but there is also a shortcut)
- some nice stuff for roxygen
- some nice stuff for s4 classes
- highlight ugly code and press cmd + shift + a to format to nice code
- ctrl + alt gives you multiple cursors
- there is a document outline view in the gui for rmarkdown now
- ctrl + alt + i to insert code chunks
- inline latex $$
- can execute and render python (and many other) chunks (uses default engine, but you can set this path)
- alt + shift + k for shortcuts
talk 26 (22 min): Dashboards made easy – Sean Lopp link
- two patterns that many dashboards have in common: some info updates in real time, some only at specific timepoints
- interactivity
- use parameterized dashboards (access via rmarkdown render function)
- bring it into a flexdashboard
- also differences between different parameter results will be visible
- Dashboard + Shiny, enables you to include changes at runtime, and more user interactivity.
- difference for shiny: now we are creating an app instead of an html. We need to host it.
- cool thing about shiny and rmarkdown in flexdashboard: you don’t have a separate ui and server function (internally this is a trick via rmarkdown::shiny_ui / rmarkdown::shiny_server)
- therefore the flexdashboards are harder to debug and don’t scale that well.
- many of this will be fixed via runtime:shiny_prerendered, more upcoming in 2017
- the invalidateLater function lets the app react to user input and also poll an api every n seconds.
- demo: “weather updates every day, and everything is pulled in once a day”
- in future: use shiny_prerendered for calling apis on a regular basis for some dashboard…
talk 28 (33 min): Making Websites with R – Yihui Xie link
- (a sneak peek of blogdown)
- early beta, test it!
- install from github
- based on the static site generator hugo
- really easy to get started.
- call the new_site function in an empty directory or empty rstudio project: installs hugo automatically, downloads a hugo theme from github, loads some sample posts, starts a local web browser to preview.
- at hugowebsite is a quickstartguide with 12 steps
- for blogdown just ::new_site
- the output is built on top of bookdown
- so you can use many features like cross references
- you can create a project from rstudio: website using blogdown
- on hugo documentation you can find a lot of themes and they are easy to change
- its not only for blogs, but also for general purpose websites
- he thinks rmarkdown could be like the new php (but it might not be the best idea)
- he wanted to write everything in r, but actually it was possible via hugo to reuse a lot of functionality from there
- hugo is written in “go” (via this, you could make r the new php, but might not make that much sense…)
- function install_hugo in blogdown package
- it takes 1ms to render a page in hugo…but for blogdown, a bit of optimisation work has to be done.
- many more static website generators available: jekyll (is too slow)
- hugo has different themes and templates, and one can create them. But it lacks rmarkdown, so yihui added it.
- structure:
- content
- themes
- static (js, css files, …)
- public directory (ready to be published on servers, s3, github,… contains generated htmls,…)
- config.toml contains name, theme, …, google analytics setting…
- you can use markdown or rmarkdown. for the former, pandoc is not used, but a go package called blackfriday, which behaves a bit differently.
- helpers: new_site, install_hugo, install_theme (default: hugo-lithium-theme), serve_site (rebuilds the site and shows a preview; there is also a “Serve Site” addin in rstudio), new_post (the only function that you use more than once, whenever you make a new post…).
- output: blogdown::html_page , wrapped from bookdown (gives a lot of these functionality), tables, citations, titles, …
- two addins: view new site and new post
- more general than rmarkdown website mode
- you can enable cache
- supports rss feeds -> rbloggers :)
- hugo has everything we want, except rmarkdown…before blogdown…
- if a theme does not work with blogdown, it might require one or a few small tweaks to make it work.
talk 29 (120 min): Customizing and Extending R Markdown – Yihui Xie link
- most important field in yml is “output”
- before customizing, check out the available options in the documentation or from the gear button within rstudio
- toc options, css, …
- the output format determines what is possible regarding knitr options, pandoc options, pre- and post-processing of the html (see yihui’s other talk for the details), and other options
- a custom output format can be created via the rmarkdown::output_format() function.
- an important argument is base_format (for example base_format = html_document(), which lets you override some defaults of the html format), if you want to build on an available format. otherwise you can ignore this argument.
- an output format function returns a list (including all options of the format)
- the output looks different to some pandoc outputs, since a lot of default customizing via integration of other stuff was added by yihui
- use the developer tools of your browser to see how you can customize via css, for example, then save these rules in a css file next to your document and pass it to the css option in the yaml.
- you can use some pandoc template to build up another template, which makes you much more flexible. read the pandoc documentation at least once before you do, to know what pandoc options are available.
- some examples of extensions are: bookdown, xaringan (remark.js translation to r [for cooler slides]), flexdashboard, tufte, rticles, prettydoc, blogdown
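A custom format built on base_format, as described above, might look like the following sketch. The function name quiet_html and the chosen defaults are made up for illustration. Only rmarkdown::output_format(), rmarkdown::knitr_options() and rmarkdown::html_document() come from the rmarkdown package.

```r
# Hypothetical output format: html_document with a toc, an optional
# stylesheet, and code chunks hidden by default.
# Requires the rmarkdown package when the function is called.
quiet_html <- function(toc = TRUE, css = NULL, ...) {
  rmarkdown::output_format(
    # knitr options layered on top of those of the base format
    knitr = rmarkdown::knitr_options(opts_chunk = list(echo = FALSE)),
    # no extra pandoc options: those of the base format are kept
    pandoc = NULL,
    # the existing format we build on
    base_format = rmarkdown::html_document(toc = toc, css = css, ...)
  )
}
```

If such a function lived in a package (say, a hypothetical mypkg), a document could select it with output: mypkg::quiet_html in its yaml header.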
talk 31 (28 min): Extending R with C++: A Brief Introduction to Rcpp – Dirk Eddelbuettel link
- computer age statistical inference by hastie and efron (book recommendation)
- in this state of the art book, everything is done with r :)
- extending r by j. m. chambers, opening in chapter 1:
- everything in r is an object
- everything that happens in r is a function call
- interfaces to other software are a part of r
- r is a c program (plus r and fortran, …) and you can extend it; there is already an api for this via the .Call interface (.C is deprecated).
- you always get an SEXP back (mapping from r objects to c), we can have this for all r objects.
- don’t need to do memory allocation, just allocate a vector…
- library("Rcpp")
- evalCpp("2 + 2") # testcase
- cppFunction("some code", plugins = c("cpp11"))
- sourceCpp is the workhorse behind the evalCpp function
- rcpp function initialisation does compile, link, load
- use it in/with packages: choose “package w/ rcpp” when creating a package in rstudio
- you can extend your packages with templates for rcpp, eigen, armadillo, devtools
- 3 ways to extend r with rcpp
- just use the rcpp objects
- use LinkingTo for header-only packages like eigen, armadillo, BH
- doable: external libraries may require a little bit more work but entirely feasible
- many rcpp versions of ml algorithms available
- rcpp gallery with 100 examples, + book, + website
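The calls mentioned above can be tried out as in the following sketch. It assumes the Rcpp package and a working C++ toolchain are installed. The fib function is only an illustrative stand-in for “some code”, not an example from the talk.

```r
library(Rcpp)

# Evaluate a single C++ expression: compile, link, load, run
four <- evalCpp("2 + 2")
four

# Turn a piece of C++ source into an R function
# (fib is an illustrative example, not from the talk)
cppFunction("
int fib(int n) {
  if (n < 2) return n;
  return fib(n - 1) + fib(n - 2);
}", plugins = c("cpp11"))

fib(10)  # 55
```

The first call of each takes a moment because of the compile, link and load steps. Afterwards fib behaves like any other R function.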
talk 32 (21 min): R Notebook Workflows – Jonathan McPherson link
- Jonathan McPherson is the “leader of the IDE”
- nb is just rmarkdown, but you can execute the chunks individually (one at a time)
- outputs appear in the document and are saved with it.
- there is already an rstudio webinar with an introduction
- there is an outline view
- cmd + alt + shift + j: pick and jump to a section quickly
- 2 hidden cmds, you can bind keyboard sequences to run between two chunks
- you can click on the navigation bar and land in the chunk that is currently running
- there is a shortcut to collapse all code chunks at once
- notebook chunks may not be reproducible, because they may have been run in any order. workarounds:
- knit from a fresh rstudio session
- before you run a chunk, run all chunks before it
- when chunks are run, the output lands in the cache. When you save a notebook, the cache is combined with the document. you get 1. the cache, 2. the document, 3. the underlying code (all three in one notebook)
- it is an html file (selfcontained)
- you can render to a different output format
- publish to another format.
- nb can be notebooks and html
- nb can be notebooks and pdf
- nb can be notebooks and github…, …
- one of the formats can be run iteratively and the other format can be guaranteed to be reproducible
- publish and collaborate from rstudio connect
- when you open an old notebook within rstudio, you can get the saved output cache within your rstudio session.
- via download rmd you can get the code from the notebook.
- version control: 2 different options.
- put notebook into gitignore and only version control the rmarkdown. this ensures reproducibility.
- check the nb into version control. then input and output are versioned together. sometimes it’s nice, since not everything has to be executed again. If you do this, be aware that this can give conflicts in the output, which can’t be resolved. This gives priority to the collaborator’s changes over your original work.
- the r code in the notebook depends on the version of r and packages. how to resolve this is still not clear, but for now it is recommended to take a packrat snapshot before sharing. The collaborator then has to unbundle the (previously bundled) nb file
- if you don’t want a chunk to be run by a collaborator by default, you can just set eval = FALSE, but of course this chunk can still be run manually.
talk 33 (18 min): R’s Role in Data Science – Joseph Rickert link
Talk 34 (66 min): All Things R and RStudio, Q & A with J.J. Allaire, Hadley Wickham & Joe Cheng, Moderator: Joseph Rickert link
upcoming:
- interface to tensorflow deep learning and machine learning
- thoughts about default parallelization…problems with the api, because parallelisation approaches differ so much across operating systems.
Needed in base R:
- 64bit integers
- out of memory vectors (pointers)