My Website
  • Home
  • About
  • rstudioconf
  • user2017
  • H2OWorld2017
  • lightning talks 3.01 - bigstatsr package (out of memory stats on big matrix objects) - lightning talks 3.02
  • lightning talks 2.02 - graphiT (nette Idee eine Shiny GGplot GUI zu entwickeln) https://analytics.huma-num.fr/Lise.Vaudor/graphiT/ - heatmaply: Tal mentions viridis colourscales in his talk :) - R Tags, widget to load spreadsheet data (looks nice) - Seans packrat talk. Passive approach überlegen und generell packrat::set_opts(use.cache = TRUE) evtl. in Zukunft nutzen.
  • lightning talks 4.02 - erster talk western union -> elena zeigen - interessanter workflow für models in production (vestas) - idee data labeln als spiel, um einfacher labels zu bekommen - merkwürdiger aber interessanter talk über rkdb, das anscheinend schneller als data. table ist!?
  • lightning talks 2.01
  • lightning talks 3.02
  • lightning talks 4.01
  • R and Haskell (Combining best of both worlds)
  • The R6 class system
  • efficient r
    • matrix objekt statt data.frame beim subsetten viel schneller sein.
    • evtl. einen Teil in cmpfun schubsen :)
    • aufpassen bei bind rows, wahrscheinlich nicht wichtig
  • sponsor talk mango solutions
  • Sponsor talk Rconsortium
  • Sponsor talk RStudio
  • Sponsor talk Data Camp
  • Morphological Analysis with R
    • “an ordered way of looking at things”
    • ein htmlwidget um ergebnisse von verschiedenen kombinationen von features direkt zu highlighten
  • Closing
  • Community-based learning and knowledge sharing
  • Navigating the R package universe
  • Exploring and presenting maps with tmap
    • “ggplot for maps”
    • easy to switch from static to interactive
    • interactive mode based on leaflet and mapview
  • We R What We Ask: The Landscape of R Users on Stack Overflow
  • Maps are data, so why plot data on a map? [NICE-TOP]
    • packages: osmdata/osmplotr
    • get data from openstreetmap
    • extract/plot/highlight data from open street map
    • interactive (zooming + more?)
  • Sponsor Talk - Microsoft
  • Text mining, the tidy way
    • tidytext package
    • 1 token (word), no punctuation, lowercase
  • ReinforcementLearning: A package for replicating human behavior in R
  • brms: Bayesian Multilevel Models using Stan [NICE-TOP]
    • ohne multilevel (mixed) models bekommen wir zu enge konfidenzbänder (hier) um die koeffizienten
    • why bayesian? one reason is: man bekommt die gesammte posterior distribution der koeffizienten, nicht nur die punktschätzer (+ standardabweichung).
      • bei standardnormalverteilung der parameter ist das kein großer gewinn, aber bei schiefer verteilung ist das schon ein großer unterschied.
      • nachteil: mehr rechenaufwand. (wegen kompilierung meist mindestens 30 Sek)
  • papr: Tinder for pre-prints, a Shiny Application for collecting gut-reactions to…
    • basierend auf ganz interessanten shiny geschichten
  • Teaching psychometrics and analysing educational tests with ShinyItemAnalysis
  • Rcpp: From Simple Examples to Machine Learning
  • The growing popularity of R in data journalism
  • countreg: Tools for count data regression
    • usually poisson or negative binomial (when 60-80% zeros, then often zero inflation or hurdal models or other options mentioned)
  • Modules in R
    • managing dependencies between different r scripts
    • options for organising functions in larger codebases
    • module examples: scripts, modules, packages
    • basic idea is that you can put functions into modules that are organised as a lower hierarchy than packages.
    • another cool idea is the import package where you can just load a function from a package instead of the whole pkg via library
  • KEYNOTE: R tools for the analysis of complex heterogeneous data
  • The analysis of R learning styles with R PLENARY [NICE]
  • ompr: an alternative way to model mixed-integer linear programs [TOP]
    • easy interface for optimisation
  • Interfacing Google’s spherical geometry library (S2) for spatial data in R
  • Modelling the environment in R: from small-scale to global applications
    • solving ODE’s and PDE’s in R
  • Show Me Your Model: tools for visualisation of statistical models [NICE]
    • basically a quick tour through model vis packages.
    • statement: create plots, when you intend to answer a specific question about your model.
  • Sports Betting and R: How R is changing the sports betting world [NICE]
    • really cool, if one wants to understand the way betting companies work
    • also interesting from an infrastructure point of view.
  • Neural Embeddings and NLP with R and Spark
    • nice idea: neural embeddings: representation of word as vector: “chocolate” = (0.1, 0.7, …). So words can be compared.
    • interesting. should watch again, when I work on NLP-problems.
  • jamovi: a spreadsheet for R
    • nettes tool für analysen in spreadsheets [NICE]. Entwicklung sollte man definitiv in nochmal checken in einiger Zeit
  • mapedit - interactive manipulation of spatial objects
    • addition on leaflet and/or mapview!?
    • enables you editing/drawing, updating, reshaping, selecting, deleting,… interactively and saving back to the r session
    • implemented as shiny module
    • currently works well with sf-pkg
  • Package ggiraph: a ggplot2 Extension for Interactive Graphics [NICE]
    • give a tooltip, id and click events to a ggplot object.
    • also good to use with shiny
    • supercool bonus example. sound as clickevent on a graphic
  • Scraping data with rvest and purrr
    • standard selector gadget showcase on a fake money movie rating homepage
    • based on secure information in this example you know how the moving average of an artist value changes and can buy his fantasy shares…
    • also shows scraping and gaming strategy for fantasy football with real money prizes
  • bradio: Add data music widgets to your business intelligence dashboards
    • visualisations like numbers, tables, piecharts … are not the only way to consume numbers and data. we could also use other senses, like … ears, to listen to data music traditional pkg: tuneR. Shows a little better (?) approach. Also some other tools exist within R. In genreal: 3 approaches, but all have difficulties. Slow, hard to configure, no concurrency. solution: bradio, a shiny widget to insert music widgets into bi-dashboards.
    • idea: think about using datamusic to consume data and get insights.
  • moodler: A new R package to easily fetch data from Moodle
    • moodle is a learning platform
    • it is modular and you can build courses there
    • community driven, lots of functionality via plugins
    • database + filestorage… onlinux: mysql, php
    • logs, discussion forums, quizzes information with timestamps with the database
    • after db connection via dbi etc. you can explore moodle’s database via moodler pkg
  • Transformation Forests (theoretical, but NICE in the end)
    • method where within the splits not only the mean, but also higher moments, like variance etc are taken into account and distributions for each leaf of a try are the result
    • nice properties: will end in prediction intervals and almost all stuff that you get by classical likelihood based models.
    • with several trees (forest) it’s easier to view the variable importance.
    • nice visualisation: partial deciles
    • currently not implemented in R.
  • Daff: diff, patch and merge for data.frames
    • can show any difference in data frames
    • diffobj-pkd for other r objects than data frames
  • Automatically archiving reproducible studies with Docker
    • nice intro into docker
    • dockerfile -> use command line to create an image -> build dockerfile: create a container -> u can do whatever you want with your container
    • usecase: one project/customer/whatever per container
    • easy to integrate into production
    • rocker images for different r versions or rstudio, easy to setup
    • also bioconductor provides a lot of images.
    • own pkg: containerit: creates a container for the current project, from within R.
    • can also just create a container for a specific script
    • related to packrat, checkpoint, switchr/Granbase, harbor etc.
    • also related to other container systems apart from docker
  • Updates to the Documentation System for R [TOP]
    • complete update. everything that already exists, but also more.
    • as r objects. readable and so on
    • big design principles
    • more ways to document and more ways to read documentation
    • parsing, translating between languages etc.
    • you can create functions, data frames etc.
    • in progress
    • pkgnames: parsetools, documentation
  • Computer Vision and Image Recognition algorithms for [TOP]
    • lots of image pkgs already. some also very domain specific. some interfacing.
    • also deep learning.
    • imagecornerdetectionf9: gets corners of greyscale images
    • image.cannyedges: removes noise, rms gradients, keeps edges
    • image.lineSegmentDetector: finds lines of i.e. buildings within pictures
    • image.ContourDetector
    • use magick pkg to greyscale a pgm picture
    • image.dlib
    • image.darknet interface to deep learning. detects positions of certain objects in pictures
  • R4ML: A Scalable R for Machine Learning
    • r package for data mining on spark
  • Implementing Predictive Analytics projects in corporate environments
  • Show me the errors you didn’t look for
    • idea: check every variable and make tables, plots, notes to handout and recommunicate with the client, to ask them. Does this makes sense? How is this ment? And so on.
    • pkg contains functions/framework to contro specific data types for specific common problems
  • implyr: A dplyr Backend for a Apache Impala
    • sparklyr partnership with rstudio
    • cloudera data science workbench
    • implyr: layer between dplyr and impala
    • 10 rules for creating a dplyr SQL backend
      1. determine if it is necessary. Maybe you just can make a small contribution to dbplyr
      2. persist. You will have to ask embarassing questions.
      3. know the database inside and out.
      4. know dbplyr inside and out. Will help you to troubleshoot problems. Fork the repository from github. Set breakpoints to understand everything.
      5. stay idiomatic / stay tidy. Try to use the conventions, which are already there
      6. Create automated tests. DB tests need to be skippe, when send to CRAN, since they won’t have access to the database.
      7. run a dedicated database instance. Just for tests. Try to use travis CI. But they shouldn’t need to always install a db. Instead give them access to your db hosted in the cloud.
      8. Allow different ways of connecting. JDBC vs ODBC
      9. Evolve with dbplyr (there might be breaking changes)
      10. Evolve with the database (also changes in the database might happen)
  • Rc2: an Environment for Running R and Spark in Docker Containers
    • idea: orchestra all components, which could be in dockercontainers, via cubernetes
    • RC^2 helps to spin up these containers in the right order and so on
    • nice to manage containers and logs for each one during a session. also have an own gui.
  • The renjin package: Painless Just-in-time Compilation for High Performance R [NICE]
    • an alternative interpreter built on java vm. (optimized and easy to integr. in other systems like DBs and java base systems like flink, …). AST-interpreter (substitute(x)), Vector pipeliner (sum(vector)), JIT loop compiler (compiles big for loobs jit so it’s faster). Vector pipeliner sees independent calculations and can automatically parallelize
    • one goal: being compatible to gnuR. So make it possible to exchange interpreters on the fly.
    • idea: maybe for now just switch to renjin where you have bottlenecks in your code.
    • install: source(“http://renjin.org/install.R”). Then library(renjin) -> renjin({}).
    • optimizes loops only, when there is not used substitute, deparse, eval, source. otherwise you’ll get a warning.
    • lastly you can use it instead of writing c++ code.
  • R-based computing with big data on disk [NICE]
    • matter pkg
    • introduce s4 classes matter and atom.
    • objects are just parts of datastructures on disk.
    • on bigger data matter becomes faster and less memory hungry than ff- and bigmemory-pkgs
  • Bayesian social network analysis with Bergm