Getting Started with baseballr
Saiem Gilani
2022-04-30
Source: vignettes/baseballr.Rmd
Welcome folks,
I’m Saiem Gilani, one of the authors of baseballr, and I hope to give the community a high-quality resource for accessing men’s baseball data for statistical analysis, baseball research, and more. I am excited to show you some of what you can do with this edition of the package.
Installing R and RStudio
- Head to https://cran.r-project.org
- Select the appropriate link for your operating system (Windows, Mac OS X, or Linux)
  - Windows - Select base and download the most recent version
  - Mac OS X - Select Latest Release, but check to make sure your OS is the correct version. Look through Binaries for Legacy OS X Systems if you are on an older release
  - Linux - Select the appropriate distro and follow the installation instructions
- Head to RStudio.com
- Follow the associated download and installation instructions for RStudio.
- Start looking over the RStudio IDE Cheatsheet. An IDE is an integrated development environment.
- For Windows users: I recommend you install Rtools. This is not an R package! It is “a collection of resources for building packages for R under Microsoft Windows, or for building R itself”. Go to https://cran.r-project.org/bin/windows/Rtools/ and follow the directions for installation.
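Once R is installed, a quick sanity check from the R console can confirm the setup. The sketch below is only illustrative: check_r_setup() is a hypothetical helper (not part of R or baseballr), and looking for a make binary on the PATH is an assumed, rough proxy for a working Rtools installation rather than an official test.

```r
# check_r_setup() is a hypothetical helper written for this guide.
# It reports the running R version and whether a `make` binary is visible
# on the PATH. On Windows, a visible `make` is only a rough hint (an
# assumption) that Rtools is installed and reachable.
check_r_setup <- function() {
  list(
    r_version  = R.version.string,
    on_windows = .Platform$OS.type == "windows",
    make_found = unname(nzchar(Sys.which("make")))
  )
}

str(check_r_setup())
```

If make_found is FALSE on Windows, revisiting the Rtools installation directions is a reasonable next step.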
Install baseballr
# You can install baseballr with the pacman package using the following code:
if (!requireNamespace('pacman', quietly = TRUE)) {
  install.packages('pacman')
}
pacman::p_load_current_gh("billpetti/baseballr")
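After running the install code, it can help to confirm that the package is actually available. The following is a minimal sketch using only base R; installed_version() is a hypothetical helper introduced here for illustration and is not part of baseballr or pacman.

```r
# installed_version() is a hypothetical helper, not part of baseballr or pacman.
# It returns the installed version of a package as a string, or NA if the
# package cannot be found in the current library paths.
installed_version <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

# NA here would suggest the installation step did not succeed.
installed_version("baseballr")
```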
The Data
Generally speaking, there are eight men’s baseball data sources accessible from this package:
- baseballr-data repo
- MLB Stats API
- Baseball Savant’s Statcast
- Chadwick Bureau’s Public Register of Baseball Players
- Baseball Reference
- FanGraphs
- Retrosheet
- NCAA
Function names indicate the data source
As of baseballr v1.0.0, a function naming convention was implemented to have the data source indicator appear at the start of the function name:
- Functions that use the baseballr-data repository will contain load_ or update_ in the function name and would be considered loading functions for the play-by-play data, team box scores, and player box scores.
- Functions that use the MLB Stats API start with mlb_ by convention and should be assumed to be get functions. As of baseballr version 1.2.0, the package exports ~88 functions covering the MLB Stats API.
- Functions that use one of Baseball Savant’s Statcast APIs start with statcast_ by convention and should be assumed to be get functions. These functions allow for live access to Statcast data for MLB games in progress. As of baseballr version 1.2.0, the package exports ~5 Statcast-related functions.
- Functions that use Chadwick Bureau’s Public Register of Baseball Players start with chadwick_, playerid_, or playername_ by convention and should be assumed to be get functions. These functions allow for access to the Bureau’s public register of baseball players. As of baseballr version 1.2.0, the package exports 3 functions sourced using the Chadwick Bureau’s public register of baseball players.
- Functions that use Baseball Reference’s website start with bref_ by convention and should be assumed to be get functions. As of baseballr version 1.2.0, the package exports ~4 functions covering Baseball Reference.
- Functions that use FanGraphs’s baseball website start with fg_ by convention and should be assumed to be get functions. As of baseballr version 1.2.0, the package exports ~11 functions covering FanGraphs.com.
- Functions that use Retrosheet’s baseball data start with retrosheet_ by convention and should be assumed to be get functions. As of baseballr version 1.2.0, the package exports 1 function for Retrosheet data.
- Functions that use the NCAA website start with ncaa_ by convention and should be assumed to be get functions. As of baseballr version 1.2.0, the package exports ~8 functions covering the NCAA Stats portal.
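The naming convention can be put to work when browsing the package. The sketch below maps a function name to its data source by prefix, using only base R. Note the assumptions: classify_source() and prefix_sources are hypothetical names introduced here (they are not part of baseballr), the sample function names are illustrative strings, and for simplicity the load_ and update_ markers are matched only at the start of a name.

```r
# Prefix -> data source lookup, transcribed from the naming convention.
# prefix_sources and classify_source() are hypothetical, not part of baseballr.
prefix_sources <- c(
  load_       = "baseballr-data repo",
  update_     = "baseballr-data repo",
  mlb_        = "MLB Stats API",
  statcast_   = "Baseball Savant Statcast",
  chadwick_   = "Chadwick Bureau register",
  playerid_   = "Chadwick Bureau register",
  playername_ = "Chadwick Bureau register",
  bref_       = "Baseball Reference",
  fg_         = "FanGraphs",
  retrosheet_ = "Retrosheet",
  ncaa_       = "NCAA"
)

# Return the data source for each function name, or NA when no prefix matches.
classify_source <- function(fn_names) {
  vapply(fn_names, function(fn) {
    hit <- names(prefix_sources)[startsWith(fn, names(prefix_sources))]
    if (length(hit) == 0) NA_character_ else unname(prefix_sources[hit[1]])
  }, character(1), USE.NAMES = FALSE)
}

# Illustrative names; any character vector works.
classify_source(c("mlb_pbp", "fg_batter_leaders", "something_else"))

# If baseballr is installed, tally its exported functions by data source.
if (requireNamespace("baseballr", quietly = TRUE)) {
  exports <- getNamespaceExports("baseballr")
  print(table(classify_source(exports), useNA = "ifany"))
}
```

The tallies will depend on the installed version, so they need not match the approximate counts quoted for version 1.2.0.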