- lightning talks 3.01 - bigstatsr package (out of memory stats on big matrix objects) - lightning talks 3.02
- lightning talks 2.02 - graphiT (nette Idee eine Shiny GGplot GUI zu entwickeln) https://analytics.huma-num.fr/Lise.Vaudor/graphiT/ - heatmaply: Tal mentions viridis colourscales in his talk :) - R Tags, widget to load spreadsheet data (looks nice) - Seans packrat talk. Passive approach überlegen und generell packrat::set_opts(use.cache = TRUE) evtl. in Zukunft nutzen.
- lightning talks 4.02 - erster talk western union -> elena zeigen - interessanter workflow für models in production (vestas) - idee data labeln als spiel, um einfacher labels zu bekommen - merkwürdiger aber interessanter talk über rkdb, das anscheinend schneller als data. table ist!?
- lightning talks 2.01
- lightning talks 3.02
- lightning talks 4.01
- R and Haskell (Combining best of both worlds)
- The R6 class system
- efficient r
- matrix objekt statt data.frame beim subsetten viel schneller sein.
- evtl. einen Teil in cmpfun schubsen :)
- aufpassen bei bind rows, wahrscheinlich nicht wichtig
- sponsor talk mango solutions
- Sponsor talk Rconsortium
- Sponsor talk RStudio
- Sponsor talk Data Camp
- Morphological Analysis with R
- “an ordered way of looking at things”
- ein htmlwidget um ergebnisse von verschiedenen kombinationen von features direkt zu highlighten
- Closing
- Community-based learning and knowledge sharing
- Navigating the R package universe
- Exploring and presenting maps with tmap
- “ggplot for maps”
- easy to switch from static to interactive
- interactive mode based on leaflet and mapview
- We R What We Ask: The Landscape of R Users on Stack Overflow
- Maps are data, so why plot data on a map? [NICE-TOP]
- packages: osmdata/osmplotr
- get data from openstreetmap
- extract/plot/highlight data from open street map
- interactive (zooming + more?)
- Sponsor Talk - Microsoft
- Text mining, the tidy way
- tidytext package
- 1 token (word), no punctuation, lowercase
- ReinforcementLearning: A package for replicating human behavior in R
- brms: Bayesian Multilevel Models using Stan [NICE-TOP]
- ohne multilevel (mixed) models bekommen wir zu enge konfidenzbänder (hier) um die koeffizienten
- why bayesian? one reason is: man bekommt die gesammte posterior distribution der koeffizienten, nicht nur die punktschätzer (+ standardabweichung).
- bei standardnormalverteilung der parameter ist das kein großer gewinn, aber bei schiefer verteilung ist das schon ein großer unterschied.
- nachteil: mehr rechenaufwand. (wegen kompilierung meist mindestens 30 Sek)
- papr: Tinder for pre-prints, a Shiny Application for collecting gut-reactions to…
- basierend auf ganz interessanten shiny geschichten
- Teaching psychometrics and analysing educational tests with ShinyItemAnalysis
- Rcpp: From Simple Examples to Machine Learning
- The growing popularity of R in data journalism
- countreg: Tools for count data regression
- usually poisson or negative binomial (when 60-80% zeros, then often zero inflation or hurdal models or other options mentioned)
- Modules in R
- managing dependencies between different r scripts
- options for organising functions in larger codebases
- module examples: scripts, modules, packages
- basic idea is that you can put functions into modules that are organised as a lower hierarchy than packages.
- another cool idea is the import package where you can just load a function from a package instead of the whole pkg via library
- KEYNOTE: R tools for the analysis of complex heterogeneous data
- The analysis of R learning styles with R PLENARY [NICE]
- ompr: an alternative way to model mixed-integer linear programs [TOP]
- easy interface for optimisation
- Interfacing Google’s spherical geometry library (S2) for spatial data in R
- Modelling the environment in R: from small-scale to global applications
- solving ODE’s and PDE’s in R
- Show Me Your Model: tools for visualisation of statistical models [NICE]
- basically a quick tour through model vis packages.
- statement: create plots, when you intend to answer a specific question about your model.
- Sports Betting and R: How R is changing the sports betting world [NICE]
- really cool, if one wants to understand the way betting companies work
- also interesting from an infrastructure point of view.
- Neural Embeddings and NLP with R and Spark
- nice idea: neural embeddings: representation of word as vector: “chocolate” = (0.1, 0.7, …). So words can be compared.
- interesting. should watch again, when I work on NLP-problems.
- jamovi: a spreadsheet for R
- nettes tool für analysen in spreadsheets [NICE]. Entwicklung sollte man definitiv in nochmal checken in einiger Zeit
- mapedit - interactive manipulation of spatial objects
- addition on leaflet and/or mapview!?
- enables you editing/drawing, updating, reshaping, selecting, deleting,… interactively and saving back to the r session
- implemented as shiny module
- currently works well with sf-pkg
- Package ggiraph: a ggplot2 Extension for Interactive Graphics [NICE]
- give a tooltip, id and click events to a ggplot object.
- also good to use with shiny
- supercool bonus example. sound as clickevent on a graphic
- Scraping data with rvest and purrr
- standard selector gadget showcase on a fake money movie rating homepage
- based on secure information in this example you know how the moving average of an artist value changes and can buy his fantasy shares…
- also shows scraping and gaming strategy for fantasy football with real money prizes
- bradio: Add data music widgets to your business intelligence dashboards
- visualisations like numbers, tables, piecharts … are not the only way to consume numbers and data. we could also use other senses, like … ears, to listen to data music traditional pkg: tuneR. Shows a little better (?) approach. Also some other tools exist within R. In genreal: 3 approaches, but all have difficulties. Slow, hard to configure, no concurrency. solution: bradio, a shiny widget to insert music widgets into bi-dashboards.
- idea: think about using datamusic to consume data and get insights.
- moodler: A new R package to easily fetch data from Moodle
- moodle is a learning platform
- it is modular and you can build courses there
- community driven, lots of functionality via plugins
- database + filestorage… onlinux: mysql, php
- logs, discussion forums, quizzes information with timestamps with the database
- after db connection via dbi etc. you can explore moodle’s database via moodler pkg
- Transformation Forests (theoretical, but NICE in the end)
- method where within the splits not only the mean, but also higher moments, like variance etc are taken into account and distributions for each leaf of a try are the result
- nice properties: will end in prediction intervals and almost all stuff that you get by classical likelihood based models.
- with several trees (forest) it’s easier to view the variable importance.
- nice visualisation: partial deciles
- currently not implemented in R.
- Daff: diff, patch and merge for data.frames
- can show any difference in data frames
- diffobj-pkd for other r objects than data frames
- Automatically archiving reproducible studies with Docker
- nice intro into docker
- dockerfile -> use command line to create an image -> build dockerfile: create a container -> u can do whatever you want with your container
- usecase: one project/customer/whatever per container
- easy to integrate into production
- rocker images for different r versions or rstudio, easy to setup
- also bioconductor provides a lot of images.
- own pkg: containerit: creates a container for the current project, from within R.
- can also just create a container for a specific script
- related to packrat, checkpoint, switchr/Granbase, harbor etc.
- also related to other container systems apart from docker
- Updates to the Documentation System for R [TOP]
- complete update. everything that already exists, but also more.
- as r objects. readable and so on
- big design principles
- more ways to document and more ways to read documentation
- parsing, translating between languages etc.
- you can create functions, data frames etc.
- in progress
- pkgnames: parsetools, documentation
- Computer Vision and Image Recognition algorithms for [TOP]
- lots of image pkgs already. some also very domain specific. some interfacing.
- also deep learning.
- imagecornerdetectionf9: gets corners of greyscale images
- image.cannyedges: removes noise, rms gradients, keeps edges
- image.lineSegmentDetector: finds lines of i.e. buildings within pictures
- image.ContourDetector
- use magick pkg to greyscale a pgm picture
- image.dlib
- image.darknet interface to deep learning. detects positions of certain objects in pictures
- R4ML: A Scalable R for Machine Learning
- r package for data mining on spark
- Implementing Predictive Analytics projects in corporate environments
- Show me the errors you didn’t look for
- idea: check every variable and make tables, plots, notes to handout and recommunicate with the client, to ask them. Does this makes sense? How is this ment? And so on.
- pkg contains functions/framework to contro specific data types for specific common problems
- implyr: A dplyr Backend for a Apache Impala
- sparklyr partnership with rstudio
- cloudera data science workbench
- implyr: layer between dplyr and impala
- 10 rules for creating a dplyr SQL backend
- determine if it is necessary. Maybe you just can make a small contribution to dbplyr
- persist. You will have to ask embarassing questions.
- know the database inside and out.
- know dbplyr inside and out. Will help you to troubleshoot problems. Fork the repository from github. Set breakpoints to understand everything.
- stay idiomatic / stay tidy. Try to use the conventions, which are already there
- Create automated tests. DB tests need to be skippe, when send to CRAN, since they won’t have access to the database.
- run a dedicated database instance. Just for tests. Try to use travis CI. But they shouldn’t need to always install a db. Instead give them access to your db hosted in the cloud.
- Allow different ways of connecting. JDBC vs ODBC
- Evolve with dbplyr (there might be breaking changes)
- Evolve with the database (also changes in the database might happen)
- Rc2: an Environment for Running R and Spark in Docker Containers
- idea: orchestra all components, which could be in dockercontainers, via cubernetes
- RC^2 helps to spin up these containers in the right order and so on
- nice to manage containers and logs for each one during a session. also have an own gui.
- The renjin package: Painless Just-in-time Compilation for High Performance R [NICE]
- an alternative interpreter built on java vm. (optimized and easy to integr. in other systems like DBs and java base systems like flink, …). AST-interpreter (substitute(x)), Vector pipeliner (sum(vector)), JIT loop compiler (compiles big for loobs jit so it’s faster). Vector pipeliner sees independent calculations and can automatically parallelize
- one goal: being compatible to gnuR. So make it possible to exchange interpreters on the fly.
- idea: maybe for now just switch to renjin where you have bottlenecks in your code.
- install: source(“http://renjin.org/install.R”). Then library(renjin) -> renjin({}).
- optimizes loops only, when there is not used substitute, deparse, eval, source. otherwise you’ll get a warning.
- lastly you can use it instead of writing c++ code.
- R-based computing with big data on disk [NICE]
- matter pkg
- introduce s4 classes matter and atom.
- objects are just parts of datastructures on disk.
- on bigger data matter becomes faster and less memory hungry than ff- and bigmemory-pkgs
- Bayesian social network analysis with Bergm