ggplot2/Plotnine
SYLLABUS: The Grammar of Graphics, Creating Plots, Changing Geoms, Stats, Faceting,
Coordinates, Annotations, Scaling, Themes, Legends, and Palettes, Visualization Examples.
Grammar of Graphics
The Grammar of Graphics is a systematic approach to data visualization that allows users to
construct graphs by combining multiple independent components. Instead of relying on
predefined chart types, this framework focuses on how data variables are mapped to visual
properties and how different layers interact to produce a complete visualization. In Python,
the Plotnine library implements this approach, with an API modeled on R's ggplot2.
In this approach, a plot begins with a dataset and a mapping of variables to aesthetics such as
position (x and y axes), color, size, and shape. These mappings define how the data will be
visually represented. The next step involves adding geometric objects, known as “geoms,”
which determine the type of visualization such as points, lines, or bars. Statistical
transformations may also be applied to summarize or modify the data before plotting.
Additional components such as scales, coordinate systems, facets, and themes further refine
the visualization.
The strength of the Grammar of Graphics lies in its flexibility and modularity. Each
component can be independently modified, making it easy to experiment with different visual
representations while maintaining consistency in structure.
Example 1
!pip install plotnine
from plotnine import *
from plotnine.data import mtcars
ggplot(mtcars, aes(x='wt', y='mpg')) + geom_point()
Example 2
ggplot(mtcars, aes(x='factor(cyl)', y='mpg')) + geom_boxplot()
Creating Plots
Creating plots in Plotnine involves initializing a plotting object using the ggplot() function
and then progressively adding layers to it. The function takes a dataset and an aesthetic
mapping as its primary arguments. The aesthetic mapping defines how variables in the
dataset correspond to visual elements such as axes, colors, or shapes.
Once the base plot is created, geometric layers are added using the + operator. Each layer
contributes to the final visualization by adding specific graphical elements. This layered
approach allows users to build complex visualizations step by step while maintaining clarity
in code structure.
Because each layer appears as a separate term in the expression, the code shows directly how
each component contributes to the overall plot. A partially built plot can also be stored in a
variable and extended with different layers, which makes visualization code easier to reuse.
Example 1
ggplot(mtcars, aes(x='hp', y='mpg')) + geom_point()
Example 2
ggplot(mtcars, aes(x='wt', y='mpg')) + geom_line()