rstudio_conf_notes

talk 1 (62 min): Data Science in the Tidyverse – Hadley Wickham
talk 2 (117 min): Building Dashboards with Shiny Tutorial – Joe Cheng & Winston Chang
talk 3 (130 min): Happy R Users Purrr Tutorial – Charlotte Wickham
talk 4 (24 min): What’s new with Shiny – Joe Cheng
talk 5 (22 min): Database Best Practices – Jim Hester
talk 6 (18 min): Push-Button Publish in RStudio Connect – Jeff Allen
talk 7 (23 min): R and Spark – Javier Luraschi
talk 8 (24 min): Dynamic Shiny Interfaces – Bárbara Borges Ribeiro
talk 9 (24 min): Using Web APIs from R – Amanda Gadrow
talk 10 (25 min): Bookmarking Shiny State (Finally!) – Winston Chang
talk 11 (17 min): Competitive Modeling of Outcomes for Prediction – Max Kuhn
talk 12 (22 min): Fun with htmlwidgets: 3D interactive network visualization with threejs and R – Bryan Lewis
talk 13 (26 min): Writing Readable Code with Pipes – Bob Rudis
talk 14 (20 min): Mapping in R with Leaflet – Bhaskar Karambelkar
talk 15 (19 min): Putting square pegs in round holes: Using list-cols in your dataframe – Jenny Bryan
talk 16 (10 min): Linking HTML Widgets with Crosstalk – Joe Cheng
talk 17 (23 min): Text Mining the tidy Way – Julia Silge
talk 18 (25 min): TrelliscopeJS – Ryan Hafen
talk 19 (19 min): Teaching Introductory Statistics Using the tidyverse via bookdown – Chester Ismay
talk 20 (48 min): Lightning Talks – User Submitted Talks
talk 21 (68 min): Finding and Telling Stories with R – Andrew Flowers
talk 22 (114 min): Advanced R Markdown Tutorial – Yihui Xie
talk 23 (131 min): Happy Git and GitHub for the useR Tutorial – Jenny Bryan
talk 24 (25 min): Opinionated Analysis Development – Hilary Parker
talk 25 (26 min): What’s New with the IDE – Kevin Ushey
talk 26 (22 min): Dashboards made easy – Sean Lopp
talk 27 (21 min): RStudio Server Pro Power Tools – Jonathan McPherson
talk 28 (33 min): Making Websites with R – Yihui Xie
talk 29 (120 min): Customizing and Extending R Markdown – Yihui Xie
talk 30 (24 min): Understand Code Performance with the profiler – Winston Chang
talk 31 (28 min): Extending R with C++: A Brief Introduction to Rcpp – Dirk Eddelbuettel
talk 32 (21 min): R Notebook Workflows – Jonathan McPherson
talk 33 (18 min): R’s Role in Data Science – Joseph Rickert
talk 34 (66 min): All Things R and RStudio, Q & A with J.J. Allaire, Hadley Wickham & Joe Cheng, Moderator: Joseph Rickert

talk 1 (62 min): Data Science in the Tidyverse – Hadley Wickham

goal: solve complex problems by combining simple, uniform pieces
consistent functions
- a command function performs an action (print, plot, write_csv, <-)
- a query function computes a value (summarise, mutate, geom_line)
pipe code
- first argument is the data
- the data is the same type across a family of functions
Tidy data is a consistent way of storing data
Each dataset goes in a data frame
Each variable goes in a column
for list columns use tidy tibbles instead of tidy data frames
- biggest difference: tibbles are data frames that are lazy and surly
  - no character-to-factor conversion,
  - no partial matching,
  - better support for list columns (they can be defined directly in the constructor, for example)
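A minimal sketch of the differences listed above, assuming the tibble package is installed. The column names and values are made up for illustration, and stringsAsFactors = TRUE is set explicitly because newer R versions no longer convert strings to factors by default.

```r
library(tibble)

# classic data frame; stringsAsFactors = TRUE reproduces the old default
df <- data.frame(name = c("ann", "bob"), stringsAsFactors = TRUE)

# tibble with a list column defined directly in the constructor
tb <- tibble(name = c("ann", "bob"), scores = list(1:3, 4:6))

is.factor(df$name)     # the data frame converted the strings to a factor
is.character(tb$name)  # the tibble keeps them as character

# partial matching: df$na silently returns df$name,
# while the tibble returns NULL (and warns about the unknown column)
partial_df <- df$na
partial_tb <- suppressWarnings(tb$na)
```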
applications: tidytext, sf (successor of sp, uses list columns), cross validation (with list columns), tidyquant (tidy financial time series), maybe ml with caret (or mlr), pipelearner, …
4 principles:

each function encapsulates one task
and is either a query or a command
Functions are composed with %>%
and use tidy tibbles as primary data structure

workflow usually involves naming intermediate results, nesting and the pipe
when working with many data frames this might not always be the best idea
a DBI interface for odbc is being worked on

talk 2 (117 min): Building Dashboards with Shiny Tutorial – Joe Cheng & Winston Chang

first part: Joe Cheng, server side
what sets a dashboard apart from other apps
- automatic updating
- potentially many viewers looking at the same data
- may or may not be interactive
- “Ten foot” user interface (designed to be seen from a distance or from mobile, …)
techniques:
- 1. reactive file reader/polling functions
- 2. optimizing performance

user vs data driven events
- invalidateLater() is ok but has overhead
- reactiveFileReader just checks the timestamp of a file at a fixed interval (e.g. every few milliseconds), and the read function is only called when the timestamp changes.
- the first argument of the read function must be the path to the data. it works just for data in files on disk, not for databases or apis
- reactivePoll takes 2 functions: the first is checkFunc (e.g. a httr::HEAD(api) request for apis to see if anything changed), the 2nd is valueFunc. Both are necessary for api or db data. it does not return the data itself, but a reactive expression that returns a data frame.
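The two polling helpers above could be wired up roughly like this. This is a sketch only: the file path "data.csv" and the helpers latest_change() and fetch_rows() are hypothetical stand-ins for a real file, database or api.

```r
library(shiny)

# hypothetical stand-ins for a real database or api
latest_change <- function() 1L                    # cheap check, e.g. max(updated_at) or a HEAD request
fetch_rows    <- function() data.frame(x = 1:3)   # expensive read, only run when the check value changes

server <- function(input, output, session) {
  # checks the file's modification time every second, re-reads only on change
  file_data <- reactiveFileReader(
    intervalMillis = 1000, session = session,
    filePath = "data.csv", readFunc = read.csv
  )

  # runs checkFunc every 5 seconds, calls valueFunc only when its result changes
  polled_data <- reactivePoll(
    intervalMillis = 5000, session = session,
    checkFunc = latest_change, valueFunc = fetch_rows
  )

  # both are reactive expressions, so they are called like functions
  output$from_file <- renderTable(head(file_data()))
  output$from_poll <- renderTable(polled_data())
}
```

Filtering or arranging then simply happens inside further reactive expressions that call file_data() or polled_data(), so they rerun whenever the underlying data changes.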

how do we filter, arrange etc. while the data changes underneath? … just to understand: it is done more or less automatically

performance
- Cache Results
- speed up with better logic, parallelization or Rcpp
- scale hardware up/out
- remove functionality

maybe rxtools package is coming to handle special functionality for reactivity, but maybe under a different name
maybe some background job functionality will be included in shiny
logging was not well introduced via shiny
user interface (Winston Chang)

dynamic (client-side, with htmlwidgets) vs static dashboards
flexdashboard (publish on a server, RPubs, RStudio Connect; works with htmlwidgets, crosstalk and shiny, in the last case also with reactive inputs, I believe…)
shinydashboard
(of course one can build a shiny dashboard without these packages)

deploy to a server, shinyapps.io, RStudio Connect

talk 3 (130 min): Happy R Users Purrr Tutorial – Charlotte Wickham

lapply(people, function(x) length(x$starships)) is equivalent to:
map(people, ~ length(.x$starships)) # map(.x, .f)
other types of output
map always returns a list
some helpers for atomic returns: map_lgl, map_int, map_dbl, map_chr (the return value has the same length as .x). use them to get type-stable output.
you can use readr::parse_number when converting to numeric; it also lets you say which values count as NA
when you want nothing back at all and call a function only for its side effects, use walk()
set names with set_names() instead of names() or setNames()
other ways of specifying .f
- .f can be an integer or a string, e.g. .f = "some_name" is shorthand for extracting .x[["some_name"]]
go in little steps: map(people, "starships") %>% map_int(length)
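The map variants above, tried on a tiny made-up people list (the real tutorial data is larger; names and starships here are only illustrative), assuming purrr is installed:

```r
library(purrr)

# hypothetical miniature version of the tutorial's `people` list
people <- list(
  list(name = "Luke", starships = c("X-wing", "Imperial shuttle")),
  list(name = "Leia", starships = character(0))
)

# map() always returns a list
n_list <- map(people, ~ length(.x$starships))

# map_int() returns a type-stable integer vector
n_ships <- map_int(people, ~ length(.x$starships))

# little steps: extract by name (string shortcut), then count
n_steps <- people %>% map("starships") %>% map_int(length)

# string shortcut combined with a typed variant
who <- map_chr(people, "name")
```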
other iteration functions …
when you have more than one piece of information for a cell, try to use a list column.
- try to use it inside a tibble instead of a data.frame.
- when building the tibble from scratch with map, there is no nice functionality to turn the list into a tibble; you really have to build every column from scratch independently… you can use transpose to try it, but at the moment there is no “safe” short way to do this
convenient handling of missing values: map_chr(.x, "Species", .null = NA_character_) (the argument is called .default in newer purrr versions)
use map inside mutate to manipulate list columns inside a tibble/data.frame
…sometimes it’s nice to use lookup tables inside map…
collapse a character list column, so that each element becomes one pasted string with an appropriate separator (use collapse) -> for a single element sth like
paste(x, collapse = ", ")
so inside mutate the following should work:
- mutate(new_col = map_chr(old_col, ~ paste(.x, collapse = ", ")))
- before pasting one can of course sort and so on.
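A small sketch of collapsing a character list column inside mutate, assuming tibble, dplyr and purrr are installed; the films table and its column names are made up.

```r
library(tibble)
library(dplyr)
library(purrr)

# hypothetical tibble with a character list column
films <- tibble(
  title = c("A", "B"),
  cast  = list(c("Zed", "Amy"), "Bob")
)

# one string per row: sort each element, then paste it together
films_flat <- films %>%
  mutate(cast_chr = map_chr(cast, ~ paste(sort(.x), collapse = ", ")))

films_flat$cast_chr
```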
walk is like map, but you get nothing visible back (the input is returned invisibly, so you can do some side effect inside a pipe and continue the pipeline in the same go).
side effects = “printing to screen, plotting to graphics device, file manipulation (save, write, move, etc), system calls”
map2 iterates over two lists -> map2(.x, .y, .f)
there are also
- walk2, map2_lgl, map2_int, map2_dbl, map2_chr
use map2 for writing, downloading etc. different objects to different files…
use pmap, it is like map and map2, just for 3 or more arguments
use invoke_map to apply many different functions to one argument
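A sketch of the two-argument and n-argument variants, assuming purrr is installed. The file names are arbitrary and everything is written to a temporary directory.

```r
library(purrr)

# different objects go to different files: iterate over both in parallel
dfs   <- list(cars = head(cars), iris = head(iris))
paths <- file.path(tempdir(), paste0(names(dfs), ".csv"))

# walk2() is used only for the side effect (writing files)
walk2(dfs, paths, ~ write.csv(.x, .y, row.names = FALSE))

# pmap() and its typed variants take a list of 3 or more argument vectors
sums <- pmap_dbl(
  list(c(1, 2), c(3, 4), c(5, 6)),
  function(x, y, z) x + y + z
)
```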
many helpers for lists and functions..
safely takes a function and returns a function (one that captures errors instead of stopping)
transpose takes a list and returns a list (turned inside out)…
the combination of the last two is very good for dealing with error catching…
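The safely + transpose combination could look like this (a sketch assuming purrr is installed; the inputs are made up so that one call fails):

```r
library(purrr)

# safely() wraps log() so that errors are captured instead of thrown
safe_log <- safely(log)

# the second input is a string, so log() fails for it
res <- map(list(10, "a", 100), safe_log)

# each element is list(result = ..., error = ...);
# transpose() turns this into one list of results and one list of errors
by_field <- transpose(res)

# TRUE where the call succeeded
ok <- map_lgl(by_field$error, is.null)

# keep only the successful results
good <- by_field$result[ok]
```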

talk 4 (24 min): What’s new with Shiny – Joe Cheng

near future:
- automated testing (shiny tests)
  - write tests
  - run tests
  - update in a sandbox and run tests
  - compare tests
  - decide if you can take the update or have to debug
  - test event recorder to record named tests for the user interface -> json of the shiny states during the recording/snapshot, and also png to get the overall picture of the bug
- api endpoints

after that:
- asynchronous tasks

limitations:
- not suitable for all cases, for example not for random actions (maybe set.seed helps)
- the api might change (it's not on cran yet)
- some dependencies (cran again)

possibility to fetch, via the api, values that were created by interactive use of shiny
usage: /api/… you get the data as csv, json or whatever you define
readable from python, r, c#, whatever
possibility for the api to activate the interactive widgets
at the moment only data retrieval and no authentication

talk 5 (22 min): Database Best Practices – Jim Hester

shows latest work on DBI, odbc, pool
DBI is a unified interface for databases
(was also available for s+)
rstudio took over maintenance
will be in next dplyr release
database backends just have to work with the DBI interface,
no need anymore to customize dplyr backends specifically
DBItest for tests…
odbc will be on cran
is dbi compatible
is a rewrite of the rodbc package
odbc standard is separate from this “odbc” package
every odbc connectable database backend might be used with this package
native support for timestamps and raw binary formats (no coercing date-> string necessary …)
supports batch queries, which makes it a bit (~2 times) faster than rodbc
supports parameterized queries
includes some wrappers for quoting strings, to avoid sql injection (e.g. someone dropping tables in the db backend)
insert into xyz values (?, ?) plus dbBind() (so you can pass r objects as parameters; the “?” are placeholders)
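A sketch of parameterized queries through DBI. The talk was about the odbc package; an in-memory SQLite database (via RSQLite) is swapped in here only so the example runs without a configured odbc driver, and the table xyz with its columns is made up.

```r
library(DBI)

# assumption: RSQLite stands in for an odbc connection
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbExecute(con, "CREATE TABLE xyz (id INTEGER, label TEXT)")

# "?" are placeholders; dbBind() fills them with r objects (one row per element)
ins <- dbSendStatement(con, "INSERT INTO xyz VALUES (?, ?)")
dbBind(ins, list(1:2, c("a", "b")))
dbClearResult(ins)

# the same mechanism for a query: the value is never pasted into the sql string
res <- dbGetQuery(con, "SELECT * FROM xyz WHERE id = ?", params = list(2L))

dbDisconnect(con)
```

Because the value travels separately from the sql text, a malicious string cannot change the statement itself.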
u can set a knitr option (connection = con)
u can use sql code chunks in knitr, it will know the connection if set in options
u can write queries as functions and give arguments to interact with the result… (maybe it would be good to catch the result, will have to look at the memory issues)
u can build a shiny app to interact with the database
pool makes it possible to let many people interact with a database (e.g. through a shiny app interacting with the db)
issues before: r is single threaded, so everybody has to wait. other solution: a new connection for each person, but there is a limit on the maximum number of open connections.
pool opens some connections and hands out connections that are already open.
pool is faster, because …
tries to reestablish connections (and you don't have to care about too many details when failures occur)
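Roughly how a pool is used in place of a single connection. This is a sketch assuming the pool and RSQLite packages are installed; SQLite is again only a stand-in for a real database server.

```r
library(pool)

# the pool opens and manages connections; assumption: in-memory SQLite backend
pool <- dbPool(drv = RSQLite::SQLite(), dbname = ":memory:")

# the pool object is used like a DBI connection;
# a connection is checked out for the query and returned afterwards
answer <- DBI::dbGetQuery(pool, "SELECT 1 AS x")

# close all pooled connections when the app shuts down
poolClose(pool)
```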

future:
- generic connection tab in rstudio (like the one for sparklyr)
- reestablish previous connections
- viewer for available drivers
- view sources
- view tables
- view schemas
-> within rstudio v 1.1 <3

rstudio server pro:
- easy setup for a wide variety of db (50+, including hive, impala, postgresql, mysql)
- improved performance
- improved error messaging
- kerberos support

talk 6 (18 min): Push-Button Publish in RStudio Connect – Jeff Allen

how to save achievements?

email, sharepoint, … -> rstudio connect: one button from rstudio to rstudio connect, which manages the publishing stuff
self managed content like markdown, shiny, plots
scheduled rendering
enterprise security
on premise, commercial support
creates the same environment on the server as the one that was used locally (all packages, …)
publishing options private, public, …
returns a url, which will be refreshed when reports are updated
schedule rmarkdown reports
good idea to use parameterized rmarkdown: parameters can be defined in the yaml block of rmd files -> knit with parameters… these will also work when published. Then the reports can be filtered by users and regenerated at the push of a button by the user. You get the full functionality without thinking about shiny.
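A sketch of what such a parameterized report looks like, assuming the rmarkdown package is installed. The report title and the region parameter are made up; the file is written to a temporary directory and only its yaml header is read back, since actually rendering it would need pandoc.

```r
# a minimal parameterized rmd file: `params` in the yaml block defines defaults
rmd <- c(
  "---",
  "title: \"Sales report\"",
  "output: html_document",
  "params:",
  "  region: \"EU\"",
  "---",
  "",
  "Report for region `r params$region`."
)
path <- file.path(tempdir(), "report.Rmd")
writeLines(rmd, path)

# read the yaml header back to see the declared parameters
front <- rmarkdown::yaml_front_matter(path)
front$params$region

# rendering with a different value (not run here, needs pandoc):
# rmarkdown::render(path, params = list(region = "US"))
```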
you can get emailed copies of the reports
also able to host shiny, with multiple processes and loadbalancers
you can get the logs (on r studio pro)
to rebuild the environment, packrat is used.
it-friendly (enterprise), auth, passwords, monitoring,…
beta for a year now

next:
- self content organization
- choosing the hierarchy
- usage of tags
- kerberos support
- bundle management
- rollback

feature to upload source code
free 45-day trial
also hosted by rstudio
available on linux/virtual machine

talk 7 (23 min): R and Spark – Javier Luraschi

test locally: spark_install(), spark_connect(master = "local")
copy data to spark, use dplyr, …
for modelling use the spark functions
you don't need dplyr, you can also use the DBI package and sql statements
supports extensions: run scala from sparklyr via invoke(context, version), runs on the cluster
use scalafiles, get results to r interface (advanced)
more functions
more r behaviour, df, and NAs
experimental livy support (to connect remote from rstudio)
rstudio and shiny server pro certified with cloudera
config[[spark-shell-memory]] or so to give more memory than the 500mb default or so

talk 8 (24 min): Dynamic Shiny Interfaces – Bárbara Borges Ribeiro

dynamic ui means that you return a plot, a summary or a table depending on the choice of a user
there are not too many use cases, since normally one works only on a plot or only on a table…
should one use renderUI or insertUI?
renderUI generates a slot for the output and, however we choose the input, we just get the specific output
basically you just use regular shiny objects
there are limitations for renderUI: if you have 30 inputs and maybe want to see some of them at the same time, it's possible, but then you have to provide output slots for all of them, which you would rather do independently.
example: independently add different functions on different datasets
insertUI: works like renderUI. Needs an action button. you have to define where new output is added (usually starting with some placeholder). it doesn't look like usual shiny: no ui-server pairs. Wrap things into tagList and div.
summary:
renderUI, feels more like “shiny” and is a bit safer, brings less trouble, doesn’t need events (has reactivity by default), easily bookmarkable
insertUI is a lot trickier, you have to know a bit what you are doing. Is a lot more flexible. You can do everything like with renderUI, but also much more, longer code, harder to debug and to bookmark, it is nice to compare different plots or other things side by side.
there are also other ways for dynamic shiny, like conditional panels or any javascript stuff.
you should use renderUI, when reactive ui is enough.
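Both approaches side by side in one small app object, as a sketch assuming shiny is installed. All input/output ids and the placeholder div are made up, and the app is only constructed, not launched.

```r
library(shiny)

ui <- fluidPage(
  selectInput("kind", "Show", choices = c("plot", "summary")),
  uiOutput("dynamic"),                  # slot filled by renderUI
  actionButton("add", "Add a block"),   # insertUI needs an event
  div(id = "placeholder")               # where inserted ui is attached
)

server <- function(input, output, session) {
  # renderUI: reactive by default, swaps the content of one slot
  output$dynamic <- renderUI({
    if (input$kind == "plot") plotOutput("p") else verbatimTextOutput("s")
  })
  output$p <- renderPlot(plot(cars))
  output$s <- renderPrint(summary(cars))

  # insertUI: adds a new element on every click, old ones stay
  observeEvent(input$add, {
    insertUI(
      selector = "#placeholder", where = "beforeEnd",
      ui = tagList(div(paste("block number", input$add)))
    )
  })
}

app <- shinyApp(ui, server)
```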

talk 9 (24 min): Using Web APIs from R – Amanda Gadrow

some nice api packages: aws.s3, RGoogleAnalytics, acs, etc.
httr for requests (lots of functionality and very consistent)
xml2, jsonlite for parsing the response
goal
- verb (get for example)
- endpoint (url)
- parameters (keyword, tag,… -> query)
usage:
- GET(), content(), fromJSON(), stop_for_status() (in pipes, to inspect the response status/header),
- writing helpers for parsing sometimes; GET(page$next) to go through pages, possible to write this as a loop via while (!is.null(page$next))
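The paging loop could be sketched like this. The response layout (fields items and next) and the helper names are assumptions for illustration; the page getter is passed in as a function so the loop can be tried against a fake api without any network access. Note that next is a reserved word in R, so the field is accessed with [["next"]] here.

```r
# follow "next" links until there are none left; get_page(url) must return
# a list with `items` and `next` (hypothetical response layout)
fetch_all_pages <- function(first_url, get_page) {
  items <- list()
  url <- first_url
  while (!is.null(url)) {
    page <- get_page(url)
    items[[length(items) + 1]] <- page$items
    url <- page[["next"]]   # `next` is a reserved word, so no page$next
  }
  items
}

# a real page getter with httr + jsonlite (not called here, needs a live api)
get_page_httr <- function(url) {
  resp <- httr::GET(url)
  httr::stop_for_status(resp)   # fail early on http errors
  jsonlite::fromJSON(httr::content(resp, as = "text", encoding = "UTF-8"))
}

# fake two-page api for trying out the loop offline
fake_api <- list(
  p1 = list(items = c("a", "b"), `next` = "p2"),
  p2 = list(items = "c", `next` = NULL)
)
all_items <- unlist(fetch_all_pages("p1", function(url) fake_api[[url]]))
```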
you can schedule api calls via rstudio connect (or other ways),
you can also run them regularly from s3

talk 10 (25 min): Bookmarking Shiny State (Finally!) – Winston Chang

it is about sharing results of a shiny app. you could already share the app, but the results of an app couldn’t be shared before bookmarkable state. You always had to describe the way to the results.
requires shiny 0.14 (September 2016). the current version (1/17) is 1.0.
ui has to be a function with the argument request, and you need to include bookmarkButton(). Within shinyApp you can set enableBookmarking = "url" (you can also set this to sth else).
with separate ui.R/server.R files, you should add a global.R to enable bookmarking
there is also the possibility to store the bookmarked state on the server via enableBookmarking = "server"
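The three changes mentioned above in a minimal app object, as a sketch assuming shiny >= 0.14. The input id and label are made up, and the app is only constructed, not launched.

```r
library(shiny)

# 1. ui is a function of `request`
ui <- function(request) {
  fluidPage(
    textInput("txt", "Some text"),
    # 2. a button that produces the bookmark url
    bookmarkButton()
  )
}

server <- function(input, output, session) {}

# 3. switch bookmarking on; "server" would store the state on the server instead
app <- shinyApp(ui, server, enableBookmarking = "url")
```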
short url on server, url might be very long if not on server (this can lead to problems in (maybe older versions) of internet explorer).
cant save data on url encoded app, but you can if you host it on a server (which needs shiny server).
the url encoded version does not need shiny server.
bookmarking via the 3 lines mentioned above doesn't work for all shiny apps. sometimes you have to do more work.
specifically, it’s easy if the output just depends on the current settings of the app and not on the past settings during the session (“partly input dependent applications”).
also dependence on external data sources might need a little bit more work to bookmark.
to solve these issues, there are callback functions: onBookmark() and onRestore(). details are explained in the talk…
documented on the shiny website -> bookmarkable state articles…
there might be ways to go back from the bookmarked app to the start of the app, even though this is not intended at the moment…
for the “saving to a file on the shiny server” way it would be nice to give the file with the bookmark to the user, but that is not possible, at least at the moment.

talk 11 (17 min): Competitive Modeling of Outcomes for Prediction – Max Kuhn

gain & lift…

talk 12 (22 min): Fun with htmlwidgets: 3D interactive network visualization with threejs and R – Bryan Lewis

introduced 2 years ago
now first major update
less to learn, less overhead …
good combination with other packages, especially igraph
can do graphs on globes
scatterplot3js is the internal workhorse
new: network visualisation with the graphjs function, similar to igraph (extends it)
easy animation
3d layouts
easy to play with really big graphs

talk 13 (26 min): Writing Readable Code with Pipes – Bob Rudis

81 packages on CRAN export the pipe operator
use ndjson for faster json parsing
use anytime for faster date time stuff
use stop_for_status() to guard the risky data retrieval steps within the pipeline
use list.files() %>% map_df() to read many files into one data frame
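The list.files %>% map_df pattern, sketched with two small csv files written to a temporary folder first (assuming purrr and dplyr are installed; folder and file names are arbitrary):

```r
library(purrr)

# set up a folder with two csv files to read back
csv_dir <- file.path(tempdir(), "csv_demo")
dir.create(csv_dir, showWarnings = FALSE)
write.csv(head(cars, 3), file.path(csv_dir, "a.csv"), row.names = FALSE)
write.csv(tail(cars, 2), file.path(csv_dir, "b.csv"), row.names = FALSE)

# read every file and row-bind the results into one data frame
all_rows <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE) %>%
  map_df(read.csv)
```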

talk 14 (20 min): Mapping in R with Leaflet – Bhaskar Karambelkar

based on version 0.7 of the leaflet javascript library. The most recent one is version 1.0
packages used in the presentation: sp, rgdal, rgeos, raster, rmapshaper, tigris, acs, sf, mapview, geojson, geojsonio
the whole graphic is rendered as an svg
many controls, buttons and plugins (the latter are outstanding and special within leaflet).

Adv. Concepts
- various panes with different visibility. Order of mapping is not important, because this is handled internally.
- you can catch events for html stuff or shiny
- you can use groups and layers
- huge mass of plugins, a whole ecosystem (like d3 also has).

leafletProxy is used within shiny to add sth to already existing maps
leaflet()/addXxx() functions have options arguments, which can be quite nice
you can use tiles from cartodb
addLayersControl(baseGroups = c("Dark", "Light"))
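A sketch of cartodb tiles plus a layers control, assuming the leaflet package is installed. The group names and the use of the built-in quakes data are just for illustration; building the widget does not need a network connection, only displaying the tiles does.

```r
library(leaflet)

m <- leaflet(quakes[1:20, ]) %>%
  # two cartodb base maps, each assigned to a named group
  addProviderTiles("CartoDB.DarkMatter", group = "Dark") %>%
  addProviderTiles("CartoDB.Positron", group = "Light") %>%
  # data layer in its own group (coordinates must be lat/lng)
  addCircleMarkers(lng = ~long, lat = ~lat, group = "Quakes") %>%
  # radio buttons for the base maps, checkbox for the overlay
  addLayersControl(baseGroups = c("Dark", "Light"), overlayGroups = "Quakes")
```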
new markers: addAwesomeMarkers, addLabelOnlyMarkers
more hover options, also for polygons…
better and more projections like mercator…
use shiny to capture events,..
leaflet.extras, for added plugins…
at the moment on github. on cran soon…
leaflet will be stable
leaflet.extras will be dynamic
TopoJSON for dynamic choropleth maps is awesome…
for now everything thrown into leaflet has to be in lat/lng. however this is discussed and might be more flexible someday

talk 15 (19 min): Putting square pegs in round holes: Using list-cols in your dataframe – Jenny Bryan

feels awkward, but you feel like a programming god in the end.
you need purrr (by Hadley Wickham and Lionel Henry)
works great with dplyr, broom, tibble,…
vectors don’t have to be atomic, so you can also put lists into a data.frame
why?
- regex with sometimes more than one match
- some json or xml from an api which can be a nested list
- split apply combine problems (lapply,…)
in a data frame its cooler, since you think about filtering, arranging,…
you can use the existing toolkit (dplyr, …)
keep multiple vectors intact and in sync
purrr replaces plyr in some way
how to index, inspect, compute, simplify these awkward structures
links for 2-3 examples are in the slides, but were cut from the talk because of time problems
-> api example
repurrrsive package on github, which has some nice list examples to practice
usual to call list columns “stuff”
it is nicer to have a tibble instead of a data.frame, cause these can be easier created with lists, worked with and printed.
don't forget [ and [[ to figure out list elements; the listviewer package also has a nice widget to explore lists.
nice way: use map variants in mutate to get some useful information out of list columns
really nice example with a template for strings (in the talk)
use unnest() to go from list columns to a data.frame without list columns
use group_by %>% nest to get a data.frame with a list column within a pipeline
of course you can get list columns from list columns, depending on the operations you do…
if you have many models in your columns, just use the broom package (tidy, augment, …)
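The nest / map-inside-mutate / unnest workflow, sketched on the built-in mtcars data (assuming dplyr, tidyr and purrr are installed; the model formula is only an example):

```r
library(dplyr)
library(tidyr)
library(purrr)

# one row per group, the group's rows live in the list column `data`
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  nest() %>%
  ungroup()

# list columns from list columns: fit one model per group,
# then pull a single number out of each model
models <- by_cyl %>%
  mutate(
    fit   = map(data, ~ lm(mpg ~ wt, data = .x)),
    slope = map_dbl(fit, ~ coef(.x)[["wt"]])
  )

# and back to an ordinary data frame without list columns
flat <- by_cyl %>% unnest(cols = data)
```

With broom, the `slope` step could instead be a map(fit, broom::tidy) followed by unnest().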
inspections: View (sometimes), listviewer, str with max.level = 1 or list.len = 10 (for example), and maybe a new tool in RStudio (not yet announced) can help to view list columns
work with listcolumns: use map in dplyr functions
when nested data.frames, when listed columns? -> nested data.frames are special cases of list columns.
- they can replace ddply and are cool for split apply combine
when list columns, when group_by %>% do()?
- jenny bryan thinks do() is somehow complicated and she just likes the other workflow more
might this approach be a possibility to reduce the complexity in many bioconductor packages?
- there are many ideas, but at the moment they don't seem to converge directly in this direction.

talk 16 (10 min): Linking HTML Widgets with Crosstalk – Joe Cheng

crosstalk is somehow the webbased extension of cranvas and/or ggobi (coordinated multiple views)
htmlwidgets is unopinionated, you have all freedom to access any js library, …
three approaches to this: crosstalk, shiny, js (robservable)
filtering and linked brushing for data based on data frames (crosstalk)
syntax differences:
- without crosstalk: data <- quakes; leaflet(data) %>% addMarkers(); datatable(data)
- with crosstalk: data <- SharedData$new(data); leaflet(data) %>% addMarkers(); datatable(data)

talk 17 (23 min): Text Mining the tidy Way – Julia Silge

get a text, save the line number, use unnest_tokens(word, text) and get the format of one word per row.
stop words: tidy_books %>% dplyr::anti_join(stop_words)
counts: tidy_books %>% dplyr::count(word, sort = TRUE)
sentiment analysis: tidy_books %>% inner_join(get_sentiments("bing"))
what is a document about: inverse document frequency (better than removing stop words): calculate for each book how often each word occurs (in percent). tf_idf: …
if you have tagged articles it is nice to sort the included words by the height of their tf_idf
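The steps above on two made-up one-line "books", as a sketch assuming dplyr and tidytext are installed (the texts are invented; the bing lexicon ships with tidytext):

```r
library(dplyr)
library(tidytext)

# hypothetical mini corpus: one row per line of text
books <- tibble(
  book = c("one", "two"),
  text = c("The quick brown fox is happy", "The lazy dog is sad")
)

# one word per row (lowercased, punctuation stripped)
tidy_books <- books %>% unnest_tokens(word, text)

# drop stop words, then count
word_counts <- tidy_books %>%
  anti_join(stop_words, by = "word") %>%
  count(word, sort = TRUE)

# attach sentiment labels to the words found in the lexicon
sentiments_found <- tidy_books %>%
  inner_join(get_sentiments("bing"), by = "word")

# tf-idf per book: words that occur in every book get a weight of 0
tf_idf <- tidy_books %>%
  count(book, word) %>%
  bind_tf_idf(word, book, n)
```

The word "the" appears in both texts, so its idf is log(2 / 2) = 0 and it drops out without any stop word list.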
ngrams, networks, negation (skipped)
convert between tidy and non-tidy formats: this allows you to use the classical implemented methods on classical text mining data structures. then you can switch back to the tidy format.
book: tidy text mining (includes case studies)
if you have different speakers, extract them into a column before unnesting into the tidy format

talk 18 (25 min): TrelliscopeJS – Ryan Hafen

interactive displays of small multiples
based on the javascript package from the same author
facet_trelliscope fits in the ggplot workflow, but gives you pages of facets, which can lead to a better overview.
it also gives you filters and sorts
facets can include plotly graphics (so widgets inside a widget)
plots list columns of plots (interactive)
fits well with tidyverse workflow
works well with sparklyr
“kind of a database of images that you can query”
a lot more to come, especially filters for more datatypes
should work with base, lattice, ggplot2 and any html widget, bokeh, plotly, etc
uses crossfilter behind the scenes (should work for 1 million rows (or more?))
works in browser, so there are limitations…
fit in one markdown vis
has bookmarkable/sharable state

talk 19 (19 min): Teaching Introductory Statistics Using the tidyverse via bookdown – Chester Ismay

bringing R to different subjects in school
writing a book to lower the barrier to return to the language
writing a book to use the interactive and build up a formula paradigm

talk 20 (48 min): Lightning Talks – User Submitted Talks

easyMake (package)

tries to build dependency graphs of analysis
tries to build make files upon
also has an RStudio plugin
gives a working makefile, not a perfect one

ROpenSci packages every muggle should have heard about (Karthik Ram)

magick package: helps you transform, read and write images, some magic stuff, …
- perfect with gganimate
- pipe-friendly
hunspell
- spell checking in R for text and text analysis
- lots of advanced functionality
Tesseract
Travis + tic
- add tic to your travis yaml

Scalable Data Science with R and Spark – Best Practices and Lessons Learned

Rdd
DataFrames
Transformers - Actions

ggedit: interactive ggplot aesthetic and theme editor
Exploring correlations in a tidy R framework with corrr

correlations in data frames instead of a matrix
use stretch() for long format
everything pipeable
fcts for printing, plotting, clustering
recommends widyr for structures that are not perfect for the tidy format

Exploration of Literature Databases with Shiny

shiny in production

Business intelligence with R

bi platform ending in R backend + data (googlesheets, mysql, python, crm, …) -> googlebigquery -> flexdashboard (incl. shiny, used a child rmd file -> every page has its own file) + highcharter

bsplus: Using Bootstrap to extend your Shiny app

you want to put more stuff into your shiny app
a lot of additional stuff depends on the ui, not the serverside
everything pipeable
is coming to cran

Bringing R Into Dev: Playing Nice With Others

use shiny, bash and feather. the latter plays nicely with python and julia

FlashR: Parallelize and Scale R Machine

redefines r matrix functions to work with bigger data (overwrites them)
mainly switches functions that create r objects into functions that create flashR objects
uses same api as base r
brings parallelization out of the box
executes (and stores) out of memory
outperforms revolution r :D
easy to use, fast and you can test it on the webpage flashx.io

talk 21 (68 min): Finding and Telling Stories with R – Andrew Flowers

subtitle “6 types of data stories and how to find them” - good journalism is good storytelling

talk 22 (114 min): Advanced R Markdown Tutorial – Yihui Xie

rmarkdown = knitr + pandoc
raw latex or html (outside of a chunk) will only work in the matching document type
knitr converts to markdown, and rmarkdown passes arguments to pandoc
output formats can be customized via output_format’s base_format argument
yaml is translated into rmarkdown::render arguments
you can set some yaml options to null, e.g. the theme (default: bootstrap) can be changed, and that sometimes makes the output smaller
can pass own css styles
use developer tools (for example in chrome). you can customize the html tag p for paragraphs within the browser (as an experiment add for example color: red;) and later carry this over in rstudio via copy and paste, saving the css as an external css file and applying it in the yaml via css: path/of/css/file. you can do the same with javascript. you can also pass a vector: css: [path1, path2, path3]
you can also customize via the template option. defaults for templates are on github.
only the output field of the yaml is for rmarkdown. the rest is for pandoc and can be found in its documentation
some nice deeper customization is available via in_header, before_body, after_body
you can write your own package to extend further options
how rmarkdown handles this internally is explained in the talk
via pre- and postprocessors you can change some things that can’t be done with pandoc.
this is for example used to preserve html widgets content in rmarkdown
most important for using/providing a template is the yaml output (2 approaches are shown [jss article from rticles, tufte handout])
xaringan ports remark.js, but some markdown shouldn't be touched by pandoc. therefore you can hide some parts of it via the preprocessor step explained before. (the document/html parts can be split via some marker line…)
it is an awesome presentation framework (hacked together by yihui within 3 days)
bookdown (worked on it whole 2016). outputs: pdf, html, ebooks. writing on a specific postprocessor took most of the time, cause many features had to be synchronized between different output formats (the real challenge is to make sth work for multiple outputformats at the same time.). makes extensive use of regular expressions.
three tips from his life as a software developer:

you can't make everyone happy. focus first on making one person very, very happy.
use humor and provide little easter eggs. think differently
stand on the shoulders of giants. you don't have to know about c, c++, python, …, but you can reuse frameworks and build on them or their ideas.
keep calm, say no, say sorry, if it is too complex a problem and has a practical workaround that is easy for the user. it will be ok for them.
ask the user again why they want sth. maybe the intention is strange and afterwards the feature request goes away.
be very open and engaging towards users in open source and with pull requests. maybe others can contribute and work further on your package

talk 23 (131 min): Happy Git and GitHub for the useR Tutorial – Jenny Bryan

uses alice bartlett’s slides a lot for her tutorial
r is mirrored on github by winston chang, and so is cran, thanks to gabor csardi
install git, configure, make sure rstudio can find it
there will be a git button in the gui
in rstudio you can do 90% of your git work, like: make commits, look into your git history, look at gists, pull and push, access other branches than the master branch (but can’t create other branches)
everything that was committed can be revisited
some git clients: rstudio (you can use the command line for missing features, also mixing both at the same time), git, sourcetree (preferred by jenny bryan), github desktop (not recommended by jenny bryan), gitkraken
rstudio creates a .gitignore by default
devtools can take care of license stuff, so you don’t have to specify these things when creating a repo
you can use ssh or https (github suggests the latter) [depends on whether you cached your credentials or use ssh keys]
always pull before you work further or push (after committing), since it’s nicer to prevent than to resolve merge conflicts.
git documentation is really bad!!!
starlogs.net shows the beginning of a git history as a star wars opening crawl, with music and so on …
use github, gitlab or bitbucket, …
when you have unresolvable merge conflicts with a remote repo, you can burn the local repo down and recreate it from the remote.
keep_md: yes or a markdown output format in the yaml (e.g. switching html_document to github_document, which renders the rmd to md, so the md also ends up on github when you push) gives github a markdown file it can display directly, instead of html or raw rmarkdown.
rmarkdown rendered as html on github since around 1 year
edits, creations and also deletions have to be staged/committed
you can put yaml on top of regular r files also, and create md files instead of r files on github.
github can render a repository as a website since december 16; you just need to have some things as markdown and data as csv or tsv.
- you can create index files at every level of the folder hierarchy. these can link to other places.
- just activate github pages within the repository settings
- github pages is a jekyll powered service
- under source choose master branch and save.
- github can show you differences in committed data analysis results, like diffs in data or images.
github enterprise lets you run github on premise
the oh shit git webpage is a nice page for common git issues
you always have to pull when there is something on github that you don't have. otherwise you can't push

talk 24 (25 min): Opinionated Analysis Development Hilary Parker link

~ try to blame the process and not the person -> when errors occur, optimise the process

talk 25 (26 min): What’s New with the IDE – Kevin Ushey link

uses up to date version 1.0.136 in the talk
- lots of autocompletion via tab, in different lines, in and outside of functions, fuzzily, for paths, …
- also works nicely when implementing shiny stuff
- is smart enough to know when you want, for example, an environment, options, …
- autocompletes the methods for object.method syntax within rcpp
there is a new “Rstudio Home button” in the rstudio-file-pane
diagnostics
- you get symbols marking where code is expected to fail because of syntax-related errors,
- but there are some other cool things, like unknown function arguments, no definition in scope, defined but not used
- configurable inside options -> diagnostics
- some small things, like checking whitespace around binary operators.
command + enter executes a whole expression
cmd + alt + shift + up/down to expand selections, for example if statements
rename in (parent/bracket) scope (can be done via the gui, but there is also a shortcut)
some nice stuff for roxygen
some nice stuff for s4 classes
highlight ugly code and press cmd + shift + a to format to nice code
ctrl + alt gives you multiple cursors
there is a document outline view in the gui for rmarkdown now
ctrl + alt + i to insert code chunks
inline latex $$
can execute and render python (and many other) chunks (uses default engine, but you can set this path)
alt + shift + k for shortcuts

talk 26 (22 min): Dashboards made easy – Sean Lopp link

two patterns that many dashboards have in common: some info updates in real time, some only at specific points in time
interactivity
1. use parameterized dashboards (rendered via the rmarkdown render function)
2. bring it into a flexdashboard
   differences between the results for different parameters will also be visible
3. dashboard + shiny: enables you to include changes at runtime and more user interactivity.
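a minimal sketch of the first option (parameterized rendering), not taken from the talk: the file name dashboard.Rmd, the parameter region and the helper render_dashboard are hypothetical. it assumes the rmd declares region under params: in its yaml header.

```r
# Hypothetical helper: render one static HTML dashboard per parameter value.
# "dashboard.Rmd" and the `region` parameter are assumptions for illustration.
render_dashboard <- function(region, input = "dashboard.Rmd") {
  rmarkdown::render(
    input,
    params = list(region = region),                       # overrides the YAML default
    output_file = paste0("dashboard-", region, ".html")   # one file per region
  )
}

# Usage (needs the rmarkdown package, pandoc and the Rmd file):
# lapply(c("north", "south"), render_dashboard)
```

each call produces a separate html file, which makes it easy to compare the results for different parameter values side by side.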
difference with shiny: now we are creating an app instead of an html file, so we need to host it.
cool thing about shiny and rmarkdown in flexdashboard: you don’t have separate ui and server functions (internally this is a trick via rmarkdown::shiny_ui / rmarkdown::shiny_server)
therefore the flexdashboards are harder to debug and don’t scale that well.
much of this will be fixed via runtime: shiny_prerendered, more upcoming in 2017
the invalidateLater function lets a reactive react to user input and also re-query an api every n seconds.
demo: “weather updates every day, and everything is pulled in once a day”
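a hedged sketch of the invalidateLater pattern, not the code from the demo: fetch_weather is a hypothetical stand-in for a real api call, and input$city is an assumed input. in a flexdashboard with runtime: shiny the reactive would sit directly in a chunk instead of an explicit server function.

```r
# Hypothetical placeholder for a real API request.
fetch_weather <- function(city) {
  data.frame(city = city, checked_at = format(Sys.time()))
}

server <- function(input, output, session) {
  weather <- shiny::reactive({
    # Re-run once a day (argument is in milliseconds),
    # in addition to re-running whenever input$city changes.
    shiny::invalidateLater(24 * 60 * 60 * 1000, session)
    fetch_weather(input$city)
  })
  output$weather <- shiny::renderTable(weather())
}
```

the timer and the user input both invalidate the same reactive, so the dashboard stays fresh without a manual reload.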
in future: use shiny_prerendered for calling apis on a regular basis for some dashboard…

talk 27 (21 min): RStudio Server Pro Power Tools – Jonathan McPherson link

big, complex product, many people use it because of one or two features and have no idea that you can do a bunch of cool stuff with it.
customizable login page (branding for company for example or message of the day,…)
high availability: load balancing where all nodes are peers (runs in the background)
log stuff that goes to the r console, by r or by the user. configurable from the configuration file (r console = input). Data comes out in csv.
multiple versions of R.
choose project, when signing in
you can get notifications, if the server goes down. (some more health checking report functionality available).
customize session launch (environment stuff with clusters and so on [include it in a file…])
disabling publishing and downloading
give power user profiles and give them more power (cpu)
use specific versions of r for a specific version of r
store new projects in a specific directory (for users). don’t know if this isn’t also available in the desktop version.
stable mapping of users to load balancer nodes (same user to same computer [user-hash-based load balancing algorithm])
impersonate a user (take over their action) kind of superuser admin privileges, can do specific things, to help, troubleshoot,…
set defaults for new users (be careful with some options!)
log r session activity for analysis and security purposes…
immortal sessions (default is two hours per session). if you don’t want a session to sleep, you can control this individually per user (set the value in their profile).
more than one session at the same time running (like multiple RStudio Desktops open).
see another session’s output (initially built into r via the sink() function)
graphite monitoring (really easy to configure) gives you lots of monitoring data and options
paired projects and code reviewing. shared sessions may not be optimal, but in RStudio Server Pro you have an option “follow their cursor”.

talk 28 (33 min): Making Websites with R – Yihui Xie link

(a sneak peek of blogdown)
- early beta, test it!
- install from github
- based on the static site generator hugo
- really easy to get started.
- call the new_site function in an empty directory or empty rstudio project: it installs hugo automatically, downloads a hugo theme from github, loads some sample posts and opens a local preview in the web browser.
- the hugo website has a quick start guide with 12 steps
- for blogdown just blogdown::new_site()
- the output is built on top of bookdown
- so you can use many features like cross-references
- you can create a project from rstudio: website using blogdown
- on hugo documentation you can find a lot of themes and they are easy to change
- its not only for blogs, but also for general purpose websites
he thinks rmarkdown could be like the new php (but it might not be the best idea)
he wanted to write everything in r, but via hugo it was possible to reuse a lot of existing functionality
hugo is written in “go” (via this, you could make r the new php, but might not make that much sense…)
function install_hugo in the blogdown package
it takes 1ms to render a page in hugo…but for blogdown, a bit of optimisation work has to be done.
many more static website generators are available, e.g. jekyll (which is very slow)
hugo has themes and templates, and you can create your own. but it lacked rmarkdown support, so yihui added it.
structure:
- content
- themes
- static (js, css files, …)
- public directory (ready to be published on servers, s3, github,… contains generated htmls,…)
- config.toml contains name, theme, …, google analytics settings…
you can use either markdown or rmarkdown. for the former, pandoc is not used; instead a go package called blackfriday renders it, which behaves a bit differently.
helpers: new_site, install_hugo, install_theme (default: hugo-lithium-theme), serve_site (rebuilds the site and shows a preview; there is also a “serve site” addin in rstudio), new_post (the only function that you use more than once, whenever you make a new post…).
output: blogdown::html_page, wrapped from bookdown (which gives a lot of this functionality): tables, citations, titles, …
two addins: view new site and new post
more general than rmarkdown website mode
you can enable cache
supports rss feeds -> rbloggers :)
hugo has everything we want, except rmarkdown…before blogdown…
if a theme does not work with blogdown, it might need one or two small tweaks to make it work.

talk 29 (120 min): Customizing and Extending R Markdown – Yihui Xie link

most important field in yml is “output”
before customizing, check out the available options in the documentation or from the gear button within rstudio
toc options, css, …
output somehow influences the possibilities regarding knitr options, pandoc options, pre- and postprocessing of html stuff (look at yihui’s other talk to get the details of this), other options
custom output formats can be created via the rmarkdown::output_format() function.
an important argument is base_format (for example html_document, which lets you override some defaults of the html format) if you want to build on an available format. otherwise you can ignore this argument.
an output format function returns a list (including all options of the format)
the output looks different from plain pandoc output, since yihui added a lot of default customization by integrating other stuff
use the developer tools of your browser to see how you can customize via css, for example, then save these rules in a css file next to your document and pass it to the css option in the yaml.
you can use a pandoc template as the basis for another template, which makes you much more flexible. read the pandoc documentation at least once before you do, to know what pandoc options are available.
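a rough sketch of a custom format built on html_document, under my own assumptions: the function name team_report and the choice of overridden defaults (toc on, echo off) are hypothetical, not from the talk. as far as i understand, the knitr options given here are merged over those of the base format.

```r
# Hypothetical custom output format: html_document with a few changed defaults.
team_report <- function(toc = TRUE, css = NULL, ...) {
  rmarkdown::output_format(
    # hide code by default in every chunk
    knitr = rmarkdown::knitr_options(opts_chunk = list(echo = FALSE)),
    # no extra pandoc options; inherit them from the base format
    pandoc = NULL,
    # build on the existing html format
    base_format = rmarkdown::html_document(toc = toc, css = css, ...)
  )
}
```

such a function would typically live in a package and be referenced in the yaml as output: mypkg::team_report (mypkg being a placeholder name).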
some examples of extensions are: bookdown, xaringan (remark.js translation to r [for cooler slides]), flexdashboard, tufte, rticles, prettydoc, blogdown

talk 30 (24 min): Understand Code Performance with the profiler – Winston Chang link

not about how to make r faster
but about why my code is slow
example of normalization via apply and subtraction of means
- benchmark with system.time
- another way of benchmarking is profiling: Rprof(), some code, Rprof(NULL),
  - every 20 milliseconds you get a snapshot of the call tree
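a base-r sketch of the two approaches above, using made-up data rather than the example from the talk: the matrix size, the repetition count and the function names center_apply / center_sweep are my assumptions.

```r
# Assumed toy data: 2000 rows x 200 numeric columns (sizes are arbitrary).
set.seed(1)
m <- matrix(rnorm(2000 * 200), nrow = 2000)
reps <- 100

# Variant 1: column means via apply(), then subtract them column by column.
center_apply <- function(x) {
  means <- apply(x, 2, mean)
  for (j in seq_len(ncol(x))) x[, j] <- x[, j] - means[j]
  x
}

# Variant 2: vectorised colMeans() plus sweep() (default FUN is "-").
center_sweep <- function(x) sweep(x, 2, colMeans(x))

# Wall-clock benchmark of each variant.
system.time(for (i in seq_len(reps)) center_apply(m))
system.time(for (i in seq_len(reps)) center_sweep(m))

# Sampling profiler: write call-stack snapshots to a file, then summarise.
prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.02)   # 0.02 s, i.e. a snapshot every 20 ms
for (i in seq_len(reps)) center_apply(m)
Rprof(NULL)                         # stop profiling
head(summaryRprof(prof_file)$by.self)
```

system.time only tells you how long the whole thing took, while the Rprof summary shows which functions the time was spent in.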
to get a better resolution use profvis (on cran): call profvis({some code}), or in the rstudio ide select some lines and choose profile -> profile selected lines from the menu -> you get different interactive visualisations/apps/widgets which visualise the timings, the call stack and even the garbage collections during the execution
you can use this like microbenchmark:
- profvis({alternative 1 alternative 2 alternative 3})
this shows some interesting implementation details in the example -> detail: data.frame creation is relatively slow. he stored the vectors in a list instead of creating a data.frame, which gave a good speedup
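an illustrative sketch of the data.frame-versus-list point, with my own toy functions (rows_as_df and cols_as_list are hypothetical, not the code shown in the talk). the profvis call only runs if the package is installed.

```r
n <- 2000

# Alternative 1: build a one-row data.frame per iteration, bind at the end.
rows_as_df <- function(n) {
  parts <- vector("list", n)
  for (i in seq_len(n)) parts[[i]] <- data.frame(id = i, sq = i^2)
  do.call(rbind, parts)
}

# Alternative 2: fill plain vectors held in a list, convert once at the end.
cols_as_list <- function(n) {
  cols <- list(id = integer(n), sq = numeric(n))
  for (i in seq_len(n)) {
    cols$id[i] <- i
    cols$sq[i] <- i^2
  }
  as.data.frame(cols)
}

# Time both alternatives and keep the results for comparison.
system.time(a <- rows_as_df(n))
system.time(b <- cols_as_list(n))
stopifnot(isTRUE(all.equal(a$sq, b$sq)))   # same content either way

# Optional: profile both alternatives in one profvis() call.
if (requireNamespace("profvis", quietly = TRUE)) {
  p <- profvis::profvis({
    rows_as_df(n)
    cols_as_list(n)
  })
  # print(p) opens the interactive view in RStudio or a browser
}
```

putting both alternatives into one profvis call shows them next to each other in the same flame graph, similar to a microbenchmark comparison.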
you can publish your profvis outputs to rstudioconnect or rpubs
or you can also save your output (it is an html file with the extension rprofvis [for rstudio]), and you can rename it to .html to open it in a browser
you can profile a shiny app via profile, saving the output to disk and looking at it afterwards.
you can source within profvis, to profile stuff from separate files easily

talk 31 (28 min): Extending R with C++: A Brief Introduction to Rcpp – Dirk Eddelbuettel link

computer age statistical inference by efron and hastie (book recommendation)
in this state of the art book, everything is done with r :)
extending r by j. m. chambers, opening in chapter 1:
- everything in r is an object
- everything that happens in r is a function call
- interfaces to other software are a part of r
r is a c program (plus r and fortran, …) and you can extend it: there is already an api via the .Call interface; .C is deprecated.
you always get an SEXP back (mapping from r objects to c), we can have this for all r objects.
dont need to do memory allocation, just allocate a vector…
library("Rcpp")
evalCpp("2 + 2") # test case
cppFunction("some code", plugins = c("cpp11"))
sourceCpp is the workhorse behind the evalCpp function
rcpp function initialisation does compile, link, load
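a small worked version of the calls above, assuming Rcpp and a working c++ toolchain are installed. the c++ function sum_sq is a made-up example, not one from the talk.

```r
if (requireNamespace("Rcpp", quietly = TRUE)) {
  library(Rcpp)

  # Quick check that compiling works at all.
  evalCpp("2 + 2")   # 4

  # Compile, link and load a C++ function, then call it like any R function.
  cppFunction("
    double sum_sq(NumericVector x) {
      double total = 0;
      for (int i = 0; i < x.size(); ++i) total += x[i] * x[i];
      return total;
    }
  ")

  sum_sq(c(1, 2, 3))   # 14
}
```

the r vector is passed in as a NumericVector without any manual memory allocation or SEXP handling on the user side.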
use it in/with packages. do “package with rcpp” when creating a package
you can extend your packages with templates for rcpp, eigen, armadillo, devtools
3 ways to extend r with rcpp
- just use the rcpp objects
- use LinkingTo for other header-only packages like eigen, armadillo, BH
- doable: external libraries may require a little bit more work but entirely feasible
many rcpp versions of ml algorithms available
gallery rcpp with 100 examples, + book, + website

talk 32 (21 min): R Notebook Workflows – Jonathan McPherson link

Jonathan McPherson is the “leader of the IDE”
nb is just rmarkdown, but you can execute the chunks individually (one at a time)
outputs appear in the document and are saved with it.
there is already an rstudio webinar with an introduction
there is an outline view
cmd + alt + shift + j pick and select a section quickly
2 hidden cmds, you can bind keyboard sequences to run between two chunks
you can click on the navigation bar and land in the chunk that is currently running
there is a shortcut to collapse all codechunks at once
the notebook chunks may not be reproducible, because they may have been run in any order. workarounds:
- knit from a fresh rstudio session
- before you run a chunk, run all chunks before it
when chunks are run, the output lands in the cache. when you save a notebook, the cache is combined with the document. you get 1. the cache, 2. the document, 3. the underlying code (all three in one notebook)
it is an html file (selfcontained)
you can render to a different outputformat
publish to another format.
- nb can be notebooks and html
- nb can be notebooks and pdf
- nb can be notebooks and github…, …
one of the formats can be run iteratively and the other format can be guaranteed to be reproducible
publish and collaborate from rstudio connect
when you open an old notebook within rstudio, you can get the saved output cache within your rstudio session.
via download rmd you can get the code from the notebook.
version control: 2 different options.
1. put notebook into gitignore and only version control the rmarkdown. this ensures reproducibility.
2. check the nb into version control. then input and output are versioned together. sometimes it’s nice, since not everything has to be executed again. if you do this, be aware that this can give conflicts in the output, which can’t be resolved. this gives priority to the collaborator’s changes over your original work.
the r code in the notebook depends on the versions of r and packages. how to resolve this is still not clear, but for now it is recommended to take a packrat snapshot before sharing. the collaborator has to unbundle the (previously bundled) nb file
if you don’t want a chunk to be run by a collaborator by default, you can just set eval = FALSE, but of course this chunk can still be run manually.

talk 33 (18 min): R’s Role in Data Science – Joseph Rickert link

distinction of data engineering vs data science: “most of the time engineers know what they are doing”
ds: bring the scientific method into engineering …

Talk 34 (66 min): All Things R and RStudio, Q & A with J.J. Allaire, Hadley Wickham & Joe Cheng, Moderator: Joseph Rickert link

upcoming:

interface to tensorflow deep learning and machine learning
thoughts about default parallelization… problems with the api, because the ways of parallelisation differ so much between operating systems.

Needed in base R:

64bit integers
out of memory vectors (pointers)