Introduction
Data visualization, a key component of data science, simplifies complex information, enabling quick comprehension and pattern recognition. Charts and graphs make data analysis easier, supporting strategic decisions and driving business growth. They are versatile, too, assisting in sales forecasting, stock analysis, project management, and more. Visuals have become an essential part of modern life and shape how we understand the world. Our eyes are naturally adept at extracting insights from visuals, which communicate efficiently and capture attention. While spreadsheets store data, visual representations turn it into accessible and meaningful narratives. In this article, we will explore creating interactive plots using Plotly in Python.
Learning Objectives
- Grasp the importance of data visualization and its diverse applications.
- Discover how interactive visualization enhances data engagement and exploration.
- Acquire hands-on skills in data visualization in Python using libraries such as pandas, Plotly, and others.
Table of contents
- Introduction
- What Is Data Visualization?
- Some Practical Uses of Data Visualization
- Increasing Importance of Data Analytics
- How Can Data Visualization Help?
- Sample Problem Statement
- History of Data Visualization
- Use of Python in Data Visualization
- Libraries in Python for Data Analysis
- Getting Started With Plotting in Plotly
- Scatter Plots Using Plotly
- Line Plots Using Plotly
- Bar Plots Using Plotly
- Pie Chart Using Plotly
- Bubble Charts Using Plotly
- Dot Plots Using Plotly
- Horizontal Bar Chart Using Plotly
- Gantt Chart
- Box Plots Using Plotly
- Histograms
- Conclusion
What Is Data Visualization?
Data visualization involves representing data through familiar graphics, such as charts, plots, infographics, and even animations. For data to be impactful, findings and visuals must be well-presented. Effective data visuals capture the audience's attention and communicate the intended message clearly. Across all departments and sectors, from finance to marketing, sales, technology, engineering, research, and human resources, high-quality data analysis and visualization will continue to be in demand. These visuals are essential for conveying specific messages and clarifying complex information, making data interpretation straightforward and accessible.
Just as machine learning supports predictive analytics and forecasting, strong visualizations facilitate data exploration and understanding.
Some Practical Uses of Data Visualization
There are numerous types of data visualizations, each designed to serve specific purposes and meet distinct needs.
Tracking Changes in Data Over Time
Data with an associated timestamp is known as time-series data. Examples include stock prices, sales over time, rainfall and temperature records, and road traffic at specific locations. This type of data is valuable for tracking trends and observing changes over time: by analyzing it, we can spot fluctuations in stock prices or identify peak traffic days, for example. Observing trends over time supports informed analysis and decision-making.
Understanding Correlations
Data is often interrelated. For example, a supermarket's sales may be closely tied to the vehicular traffic on the road outside, and students’ test scores may increase with more hours of study. Effective data visualizations should reveal these relationships, enabling analysts and users to understand and explore the connections within the data.
Determining Frequency
Tracking the count and frequency of items is essential to understand how often certain events or patterns occur. Many types of data visualizations are specifically designed to display frequency, helping us monitor and analyze recurring trends effectively.
Analyzing Importance, Risk, and Value
Many data visualizations are designed to examine the distribution of specific variables, revealing potential insights, risks, or values within the data. By applying various metrics and plotting techniques, data can be assessed to understand key parameters and patterns. Effective data visualization is a vital step in the analytics process, facilitating deeper insights and informed decision-making.
Increasing Importance of Data Analytics
The amount of data available on the internet is expanding rapidly, with nearly every online action—whether logging into a website, making a purchase, booking a ride, or ordering food—being stored as data. We are indeed entering the age of Big Data, where vast data volumes demand effective processing and analysis. To harness its full potential, data must be made more understandable, readable, and interpretable. Real-life applications of effective data visualization are extensive; a data-driven organization that leverages insights for decision-making is poised to outperform those that don’t.
With advances in large-scale data storage, data is now accessible for diverse uses. Major organizations like Google, Facebook, and Amazon utilize their data for a range of purposes, where data-driven decisions directly contribute to better business outcomes.
How Can Data Visualization Help?
- Capture Attention: Good visuals immediately grab the audience's attention and help them understand complex information more effectively.
- Simplify Data Exploration: Visualizations make it easier and more accessible to explore large datasets, allowing users to quickly identify trends, patterns, and outliers.
- Easy Sharing of Insights: Data insights and visuals can be shared effortlessly, promoting collaboration and ensuring that key information is communicated clearly.
- Improve Understanding: Visual representations make it easier for users to comprehend data, even when dealing with complex or abstract concepts.
- Enhance Insights: Well-designed visuals provide deeper insights into the data, helping users uncover important findings and relationships.
- Enable Swift Decision-Making: With clear and impactful visualizations, organizations can make quick, data-driven decisions, improving efficiency and outcomes.
Sample Problem Statement
Let us consider a hypothetical scenario. A teacher has the marks of all students in her class, along with their past marks and other grades, all stored in spreadsheets. She wants to find out which students are performing best, whose exam performance has improved, whose has declined, and so on. This is possible with spreadsheets alone, but it takes considerable effort.
Thankfully, Excel has built-in data visualization tools that make this analysis simple: the teacher can quickly review all the data, find out who had the highest score, and so on. This is exactly what data analysis tools are for. Today, we can use Excel, Power BI, and Tableau as no-code solutions, or Python and R when we need custom solutions and data pipelines. These tools process the data, produce the visuals we want, and help us automate the visualization process.
History of Data Visualization
Python is an exceptional tool for data visualization and has a wide range of applications across various domains, including statistical analysis, machine learning, deep learning, web development, and more. Its versatility, ease of use, and extensive library support make Python an excellent choice for performing complex numeric and scientific calculations. Its popularity continues to rise, particularly in the fields of data analysis and data science.
Being open-source and free, Python provides access to numerous libraries, resources, and support, making it a cost-effective solution for data analysis. The vast Python community and the abundance of online resources make it easy to learn and implement. Python is flexible, scalable, and regularly updated, which reduces the cost and complexity of data analysis. Compared with proprietary tools such as Power BI and Tableau, which can be expensive, Python's continuously evolving libraries make data analysis faster and simpler at no licensing cost.
While tools like Power BI, Tableau, and R are valuable for data analysis, Python should also be a core part of any data analyst’s toolkit. Its hyper-flexibility, wide adoption in the industry, and large selection of integrated development environments (IDEs) like Google Colab, Kaggle Kernel, and Jupyter Notebooks make Python ideal for data visualization. Python's graphical options are intuitive, and its constant evolution ensures it remains a powerful, feature-rich, and highly functional tool for data analysis.
Use of Python in Data Visualization
Python is an excellent tool for data visualization, offering a wide range of applications across various domains, such as statistical analysis, machine learning, deep learning, and web development. Its simplicity, combined with a rich ecosystem of libraries, makes it ideal for performing complex numeric and scientific calculations. Python's versatility and increasing popularity make it a go-to tool for many data professionals.
As an open-source and free language, Python provides access to numerous libraries and resources, making it a cost-effective choice for data analysis. It runs on multiple platforms and has a strong support network, with forums and help readily available online. The vibrant community and the abundance of learning resources make Python a great investment for anyone looking to delve into data analysis. With its flexibility, scalability, and continuous updates, Python can help reduce data analysis costs, especially when compared to expensive software like Power BI and Tableau. Its evolving libraries streamline the process and simplify tasks, making data analysis more efficient.
While tools like Power BI, Tableau, Excel, and R are also commonly used in data analysis, Python should be an essential part of every data analyst’s toolkit. Its flexibility, coupled with its popularity among data analysts and data scientists, makes it highly valuable. Python works in environments such as Google Colab, Kaggle Kernel, and Jupyter Notebooks, which make it easy to visualize data. Its wide choice of plotting libraries further enhances its suitability for data analysis, and its constant evolution ensures that it remains a powerful, multi-featured, and highly functional tool for data visualization.
Libraries in Python for Data Analysis
Python originally started as a general-purpose programming language, but its improved readability and simple syntax quickly made it a powerful tool for data analysis. The ease with which data can be manipulated and visualized in Python, coupled with its extensive libraries and frameworks, has made it one of the most popular languages for data science and analytics. The clear and concise nature of Python code allows data analysts and scientists to focus on solving problems, making it an ideal choice for exploring and analyzing data.
Matplotlib
One of the best-known tools for data visualization in Python is **Matplotlib**. Introduced in 2002 by John Hunter, Matplotlib is a powerful library primarily used for 2D plotting, charting, and data representation. Its simplicity and flexibility have made it a foundational tool in Python for research, data analysis, and engineering.
Matplotlib allows users to create a wide variety of visualizations, such as bar graphs, line charts, scatter plots, and more. The library's ability to produce clear and interpretable visuals has made it popular among data analysts and scientists. Its ease of use and extensive customization options make it a go-to solution for visualizing data in a straightforward and effective manner.
Seaborn
Seaborn is a powerful visualization library in Python, specifically designed for statistical data visualization. It excels at plotting complex relationships and statistical models, making it easy to create advanced visualizations like Heatmaps, Relational Plots, Categorical Plots, and Regression Plots. Seaborn simplifies the process of visualizing complex datasets, offering high-level functions that make it easier to execute sophisticated data analysis and visualization.
However, both Seaborn and Matplotlib have limitations, particularly in terms of interactivity. These libraries produce static plots, meaning the visualizations are rendered as images. This prevents users from interacting with the plots—such as hovering over data points to view exact values. Additionally, they are not suitable for creating interactive plots for use on websites.
Plotly
Plotly is a Montreal-based AI and analytics company specializing in the development of advanced analytics tools, such as **Dash** and **Chart Studio**. In addition to these, they have released the free and open-source plotting library **Plotly** for Python, R, MATLAB, and Julia, which has become highly popular in the data science community.
Plotly stands out by enabling the creation of **interactive graphs** that can be embedded in websites. Its robust and high-quality plots offer a wide variety of complex visualization options, which makes it accessible for a diverse audience, from data analysts to web developers. These visualizations are not only visually appealing but also easy to read and interpret, providing valuable insights into the data.
Plotly supports a wide range of charts, including basic charts like bar graphs and line charts, more advanced statistical visualizations, maps, 3D charts, subplots, and much more. Its flexibility and interactivity make it a great choice for anyone looking to create dynamic, user-friendly visualizations that can be shared and explored online.
Getting Started With Plotting in Plotly
I prepared the code for this article in a Kaggle Notebook. To start, we import the necessary libraries.
```python
import numpy as np
import pandas as pd
import plotly.express as px
```
Now, we read some data we will be using.
The two datasets used here are:
- Melbourne Housing Snapshot
- Superstore Sales Dataset
Both datasets are excellent for beginners, offering a wealth of information and a variety of data fields. The Melbourne Housing dataset includes various real estate data points, focusing on the housing and commercial property sector. It provides insights into property prices, locations, and other key real estate metrics.
The Superstore dataset, on the other hand, pertains to sales and the retail sector. It encompasses various aspects of sales, such as product categories, order details, and customer information, giving us a comprehensive view of the retail business.
Now, let's proceed with reading the data.
```python
melb = pd.read_csv("/kaggle/input/melbourne-housing-snapshot/melb_data.csv")
sales = pd.read_csv("/kaggle/input/sales-forecasting/train.csv")
```
The Melbourne dataset is fairly large, so for simplicity we keep only the first 1,000 rows.

```python
melb = melb[0:1000]
```
Scatter Plots Using Plotly
Scatter plots are an excellent tool for analyzing data distribution and exploring the relationships between variables. By plotting one variable along the x-axis and another along the y-axis, you can easily identify patterns and correlations within the data. Creating scatter plots with Plotly is straightforward.
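```python
x = [0, 1, 2, 3, 4, 5, 6]
y = [0, 2, 4, 5, 5.5, 7, 9]

# Pass the lists as keywords: the first positional argument of px.scatter is a DataFrame
fig = px.scatter(x=x, y=y)
fig.show()
```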
One of Plotly's great features is its interactivity: you can hover over a plot to view precise data values and additional details.
Next, let's work with the Iris dataset and create a scatter plot to visualize the data distribution.
```python
# Import the library
import plotly.express as px

# Load the iris dataset bundled with Plotly
df = px.data.iris()

# Take a quick look at the data
print(df.head())
```
Next, we color the points by petal width.
```python
fig = px.scatter(df, x="sepal_width", y="sepal_length", color='petal_width')
fig.show()
```
Coloring by species instead gives each class its own discrete color.
```python
fig = px.scatter(df, x="sepal_width", y="sepal_length", color='species')
fig.show()
```
Next, we also vary the marker symbol by species and enlarge the markers.

```python
fig = px.scatter(df, y="petal_length", x="petal_width", color="species", symbol="species")
fig.update_traces(marker_size=10)
fig.show()
```
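Now, let's plot the Melbourne data, adding marginal plots along each axis: a histogram for latitude and a rug plot for longitude. (The columns Lattitude and Longtitude are spelled this way in the dataset.)

```python
fig = px.scatter(melb, x="Lattitude", y="Longtitude",
                 marginal_x="histogram", marginal_y="rug", color="Type")
fig.show()
```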
Next, we split the plot into one column per number of rooms using `facet_col`.

```python
fig = px.scatter(melb, x="Price", y="YearBuilt", color="Type", facet_col="Rooms")
fig.show()
```
Now, we swap the roles: color encodes the number of rooms, and each column shows one property type.

```python
fig = px.scatter(melb, x="Price", y="YearBuilt", color="Rooms", facet_col="Type")
fig.show()
```
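The same layout works for other columns, such as building area against distance, colored by rooms or by the number of car spaces.

```python
fig = px.scatter(melb, x="BuildingArea", y="Distance", color="Rooms", facet_col="Type")
fig.show()

fig = px.scatter(melb, x="BuildingArea", y="Distance", color="Car", facet_col="Type")
fig.show()
```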
As we can see, all the plots created with Plotly are visually appealing and well-designed. The color schemes are vibrant and make it easy to interpret the data.
In addition to basic scatter plots, Plotly can overlay a linear regression trendline. For instance, we can use the tips dataset to visualize the linear relationship between the total bill and the tip.

```python
# Linear regression trendline (trendline="ols" requires the statsmodels package)
df = px.data.tips()
fig = px.scatter(df, x="total_bill", y="tip", trendline="ols")
fig.show()
```
The fitted regression line is drawn over the scatter plot, and, like every plot so far, the chart is interactive.
Line Plots Using Plotly
Line plots are excellent for visualizing continuous data, such as mathematical functions and time-series data like stock prices or sales figures over time. They reveal important trends, such as maxima, minima, and overall patterns, and give a clear picture of the relationship between two variables in 2D space.
Let us use a line plot to plot a mathematical function.
```python
x = np.linspace(0, 10, 1000)
y = 3*x**2 - 2*x**2 + 4*x - 5
fig = px.line(x=x, y=y, labels={'x': 'x', 'y': 'y'})
fig.show()
```
The plot is interactive, allowing us to hover over it and see the exact values, which makes the data exploration much more intuitive.
Now, let's plot the sine function to see how it behaves over a range of values.
```python
x = np.linspace(0, 10, 1000)
y = np.sin(x)
fig = px.line(x=x, y=y, labels={'x': 'x', 'y': 'sin(x)'})
fig.show()
```
Next, let's plot some time-series data, beginning with stock data.
Plotly's bundled stocks dataset has one column per ticker symbol; Microsoft's is MSFT.
```python
df = px.data.stocks()
fig = px.line(df, x='date', y="MSFT")
fig.show()
```
Now, I will add more stocks to the plot.
GOOG represents Google, FB represents Facebook, and AMZN represents Amazon.
```python
df = px.data.stocks()
fig = px.line(df, x='date', y=["MSFT", "GOOG", 'FB', "AMZN"])
fig.show()
```
Each stock gets its own contrasting color, which makes the series easy to compare.
Next, we use the Gapminder sample data that ships with Plotly, filtered to Oceania.

```python
df = px.data.gapminder().query("continent == 'Oceania'")
```
Let us check what the data looks like.
```python
df.head()
```
We plot the data on a line plot now.
```python
fig = px.line(df, x='year', y='pop', color='country')
fig.show()
```
Hovering over a line shows a tooltip with the country, year, and population. Now, let's add markers so that the individual data points are easy to see.
```python
fig = px.line(df, x='year', y='pop', color='country', markers=True)
fig.show()
```
The markers make each yearly data point easy to spot.
Next, we combine different trace styles in a single figure using `plotly.graph_objects`.
```python
import plotly.graph_objects as go

# Combined plot: three random series drawn with different modes
N = 100
random_x = np.linspace(0, 5, N)
random_y0 = np.random.randn(N) + 5
random_y1 = np.random.randn(N)
random_y2 = np.random.randn(N) - 5

fig = go.Figure()
# Add traces
fig.add_trace(go.Scatter(x=random_x, y=random_y0, mode='lines+markers', name='lines+markers'))
fig.add_trace(go.Scatter(x=random_x, y=random_y1, mode='markers', name='markers'))
fig.add_trace(go.Scatter(x=random_x, y=random_y2, mode='lines', name='lines'))
fig.show()
```
This type of plot is known as a combined plot.
Combined plots are an excellent way to analyze data from multiple perspectives.
Bar Plots Using Plotly
Bar plots offer a clear comparison of data. They display categorical data using rectangular bars, where the height of each bar corresponds to the value it represents. Plotting bar charts with Plotly is simple. Let's begin by plotting the population of Australia over time.
```python
df = px.data.gapminder().query("country == 'Australia'")
fig = px.bar(df, x='year', y='pop')
fig.show()
```
Let's return to the sales data loaded earlier. For simplicity, we keep only the first 100 rows.

```python
sales = sales[0:100]
```
Let's plot the sales in each US state.

```python
fig = px.bar(sales, x="State", y="Sales")
fig.show()
```
Each bar is built from one segment per order row, so hovering shows the sales figure of each individual sale.
To see how product categories contribute, we color the bars by Category.

```python
fig = px.bar(sales, x="State", y="Sales", color='Category')
fig.show()
```
Now, we plot the sales of each category, colored by customer segment.

```python
fig = px.bar(sales, x="Category", y="Sales", color='Segment')
fig.show()
```
Next, we add a fill pattern per segment, so the segments can be told apart even without color.

```python
fig = px.bar(sales, x="Category", y="Sales", color="Segment",
             pattern_shape="Segment", pattern_shape_sequence=[".", "x", "+"])
fig.show()
```
Now, let's color the bars by a continuous variable and add extra columns to the hover label, which makes the plot more informative.

```python
data = px.data.gapminder()
data_canada = data[data.country == 'Canada']
fig = px.bar(data_canada, x='year', y='pop',
             hover_data=['lifeExp', 'gdpPercap'], color='lifeExp',
             labels={'pop': 'population of Canada'}, height=400)
fig.show()
```
We can clearly see that both the population and the life expectancy of Canada have increased over time, which likely reflects better healthcare, improved medicines, and a generally higher quality of life.
As the life expectancy rises, the hue becomes brighter, as indicated by the color bar on the right.
Now, let's take a look at the GDP per capita.
```python
fig = px.bar(data_canada, x='year', y='pop',
             hover_data=['lifeExp', 'gdpPercap'], color='gdpPercap',
             labels={'pop': 'population of Canada'}, height=400)
fig.show()
```
The GDP per capita has improved over time, which can be seen as an indication that the overall quality of life has also improved.
Now, let's create some stacked bar charts. An important consideration when plotting and representing data is knowing when to use which type of chart and understanding the significance of the data. Choosing the right chart is crucial to avoid any visualization mistakes.
Let's move on to some new data for this next analysis.
```python
df = px.data.gapminder().query("continent == 'Oceania'")
df.head()
```
Here is how to plot a stacked bar chart.

```python
fig = px.bar(df, x='year', y='pop', barmode='stack', color='country')
fig.show()
```
Stacked bar charts show each entry's value as well as the total of each bar, so they are a good way to see how individual parts contribute to the whole.
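Let's look at the life expectancy data. Adding life expectancies together is not meaningful, so here we place the bars side by side with `barmode='group'` instead of stacking them.

```python
fig = px.bar(df, x='year', y='lifeExp', barmode='group', color='country')
fig.show()
```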
Next, we build a bar chart with `plotly.graph_objects` and custom hover text.

```python
import plotly.graph_objects as go

x = ['Suzuki', 'Honda', 'Tata']
y = [100, 40, 60]

# Use the hovertext keyword argument for custom hover text
fig = go.Figure(data=[go.Bar(x=x, y=y, hovertext=['50 % Share', '20 % Share', '30 % Share'])])
fig.update_layout(title_text='Sales Data')
fig.show()
```
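Let's plot the populations of Asian countries with more than 8 million people in 2007.

```python
df = px.data.gapminder().query("continent == 'Asia' and year == 2007 and pop > 8000000")
fig = px.bar(df, y='pop', x='country', text='pop')
# Show each label in SI notation (e.g. 1.3G) above its bar
fig.update_traces(texttemplate='%{text:.2s}', textposition='outside')
# Uniform text size: hide labels that would have to shrink below 8 px
fig.update_layout(uniformtext_minsize=8, uniformtext_mode='hide')
fig.show()
```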
So, we plotted a wide variety of bar plots and analyzed data. Let us try a different type of plot now.
Pie Chart Using Plotly
Pie charts are useful for understanding the composition of data and analyzing part-to-whole relationships. They plot the percentage of each category relative to the whole, allowing us to see how different parts contribute to the overall total.
Let's revisit the sales dataset and create a pie chart that displays the sales from each state. The chart will represent the percentage contribution of each state, providing valuable insights into the distribution of sales.
```python
fig = px.pie(sales, values='Sales', names='State', title='Sales Per State in US')
fig.show()
```

The largest share of sales in this sample comes from California.
Now, we plot the sales segments and their contribution.
Now, we see the sales per category.
```python
fig = px.pie(sales, values='Sales', names='Category', title='Sales Per Category in US')
fig.show()
```

Furniture accounts for the largest share of sales in this sample.
Now, we will make some more advanced plots, and we shall be using the tips dataset.
```python
# Setting colors
df = px.data.tips()
fig = px.pie(df, values='tip', names='day', color_discrete_sequence=px.colors.sequential.RdBu)
fig.show()
```

The colors, like almost every other element of the plot, can be customized.
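We can also control the text in each slice; here, each slice shows its label and percentage, oriented radially.

```python
labels = ['Apple', 'Microsoft', 'Amazon', 'Alphabet']
values = [2252, 1966, 1711, 1538]
fig = go.Figure(data=[go.Pie(labels=labels, values=values,
                             textinfo='label+percent',
                             insidetextorientation='radial')])
fig.show()
```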
Let us make a doughnut chart now.
```python
# Doughnut chart: hole sets the fraction of the radius cut out of the center
labels = ['CAR', 'BIKE', 'BUS', 'TRAIN']
values = [1500, 2500, 6800, 9000]
fig = go.Figure(data=[go.Pie(labels=labels, values=values, hole=.3)])
fig.show()
```
The only difference between a doughnut chart and a pie chart is the hole in the center, set with the `hole` parameter. Both show the same part-to-whole data, so the choice is mostly a matter of style.
Let's customize the chart further by pulling slices out from the center to emphasize them.

```python
# Pie chart with slices pulled out (pull is a fraction of the radius)
labels = ['CAR', 'BIKE', 'BUS', 'TRAIN']
values = [1500, 2500, 6800, 9000]
fig = go.Figure(data=[go.Pie(labels=labels, values=values, pull=[0.1, 0.1, 0.2, 0.1])])
fig.show()
```
So, we can see that Plotly offers a high level of customization and visually appealing plots.
Bubble Charts Using Plotly
Bubble charts show magnitude through the size of each circle, and they are easy to make in Python.
```python
fig = go.Figure(data=[go.Scatter(
    x=[1, 2, 3, 4], y=[10, 12, 15, 16],
    mode='markers',
    marker_size=[20, 40, 50, 60])
])
fig.show()
```
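Next, a bubble chart of the 2007 Gapminder data, where bubble size encodes population, color encodes continent, and GDP per capita is shown on a log scale.

```python
df = px.data.gapminder()
fig = px.scatter(df.query("year == 2007"), x="gdpPercap", y="lifeExp",
                 size="pop", color="continent", hover_name="country",
                 log_x=True, size_max=60)
fig.show()
```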
Let's use the tips data again.

```python
tips = px.data.tips()
fig = px.scatter(tips, x="total_bill", y="size", size="tip", color="tip", size_max=20)
fig.show()
```
Bubble charts are a great way to visualize data and uncover insights.
Dot Plots Using Plotly
Dot plots are a variant of the scatter plot in which one axis is categorical, showing how values are distributed within each category.
For this, we load a new dataset on students' exam performance.

```python
stud = pd.read_csv("/kaggle/input/students-performance-in-exams/StudentsPerformance.csv")
```
```python
fig = px.scatter(stud, x="math score", y="parental level of education",
                 color="gender", title="Student Performance in Exams")
fig.show()
```
Let us try another plot.
```python
fig = px.scatter(stud, x="writing score", y="parental level of education",
                 color="lunch", title="Student Performance in Exams")
fig.show()
```
Horizontal Bar Chart Using Plotly
A horizontal bar chart is the traditional bar chart rotated by 90 degrees, which is handy when category labels are long.
```python
fig = px.bar(stud, x="reading score", y="parental level of education",
             color='gender', orientation='h')
fig.show()
```
Gantt Chart
A Gantt Chart is a specialized bar chart used to display the progress of a project or task. It visualizes different phases or sections of a project, illustrating their timelines and progress. Gantt charts are particularly helpful for project management as they provide an overview of the project's schedule and the status of individual tasks.
Let’s proceed by plotting some sample Gantt charts to better understand how they represent project timelines.
```python
df = pd.DataFrame([
    dict(Task="Development", Start='2012-01-20', Finish='2012-02-20'),
    dict(Task="Website Design", Start='2012-01-10', Finish='2012-01-30'),
    dict(Task="Deployment", Start='2012-02-20', Finish='2012-03-30'),
    dict(Task="Marketing", Start='2012-02-25', Finish='2012-04-15'),
])
fig = px.timeline(df, x_start="Start", x_end="Finish", y="Task")
# List the tasks top to bottom in the order given
fig.update_yaxes(autorange="reversed")
fig.show()
```
Let us add a few more features.
```python
df = pd.DataFrame([
    dict(Task="Development", Start='2012-01-20', Finish='2012-02-20', Team="Team A"),
    dict(Task="Website Design", Start='2012-01-10', Finish='2012-01-30', Team="Team B"),
    dict(Task="Deployment", Start='2012-02-20', Finish='2012-03-30', Team="Team A"),
    dict(Task="Marketing", Start='2012-02-25', Finish='2012-04-15', Team="Team C"),
])
fig = px.timeline(df, x_start="Start", x_end="Finish", y="Task", color="Team")
fig.update_yaxes(autorange="reversed")
fig.show()
```
Now, let us add hues based on team size.
```python
df = pd.DataFrame([
    dict(Task="Development", Start='2012-01-20', Finish='2012-02-20', Team="Team A", Team_Size=20),
    dict(Task="Website Design", Start='2012-01-10', Finish='2012-01-30', Team="Team B", Team_Size=15),
    dict(Task="Deployment", Start='2012-02-20', Finish='2012-03-30', Team="Team A", Team_Size=20),
    dict(Task="Marketing", Start='2012-02-25', Finish='2012-04-15', Team="Team C", Team_Size=32),
])
fig = px.timeline(df, x_start="Start", x_end="Finish", y="Task", color="Team_Size")
fig.update_yaxes(autorange="reversed")
fig.show()
```
Box Plots Using Plotly
Box Plots are a great way to understand data distribution. They depict numerical data using quartiles.
```python
fig = px.box(stud, y="math score")
fig.show()
```
A box plot is a great way to visualize the distribution of a dataset, showing the minimum, maximum, median, and quartiles, as well as identifying outliers. Here's a breakdown of what each component represents:
- Minimum: The lowest data point in the dataset, excluding outliers.
- Maximum: The highest data point in the dataset, excluding outliers.
- Median: The middle value that separates the higher half and the lower half of the data.
- Lower Quartile (25th percentile): The value below which 25% of the data falls.
- Upper Quartile (75th percentile): The value below which 75% of the data falls.
Let's now proceed to create some customized box plots to visualize the distribution and insights from our dataset.
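Splitting the distribution by gender:

```python
fig = px.box(stud, x='gender', y="math score")
fig.show()
```

Showing every data point next to its box:

```python
fig = px.box(stud, x='gender', y="math score", points="all")
fig.show()
```

Coloring by test preparation course gives side-by-side boxes within each gender:

```python
fig = px.box(stud, x='gender', y="math score", color="test preparation course")
fig.show()
```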
Now, let us add a notch.
```python
# Notches show a confidence interval around each median
fig = px.box(stud, x='gender', y="math score", color="test preparation course", notched=True)
fig.show()
```
Histograms
Histograms are an excellent way to understand the frequency distribution of numerical data.
```python
fig = px.histogram(stud, x="math score", nbins=20, color="gender")
fig.show()
```
Let us customize it.
```python
fig = px.histogram(stud, x="math score", nbins=20, color="gender", marginal="rug")
fig.show()
```
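Next, we add a box plot as the marginal plot to show the spread of the data. Because a y column is also given, `px.histogram` sums the math scores within each reading-score bin instead of counting rows.

```python
fig = px.histogram(stud, x="reading score", y="math score", color="gender",
                   marginal="box", hover_data=stud.columns)
fig.show()
```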
Such visuals are really great for understanding how the data is spread, and we can interact with the plots.
```python
fig = px.histogram(stud, x="reading score", y="writing score",
                   color="parental level of education",
                   marginal="box", hover_data=stud.columns)
fig.show()
```
We have now covered the major visualization types in Plotly.
Conclusion
The Plotly Python library is an open-source tool used for data visualization, offering support for a wide range of graph types including line charts, scatter plots, bar charts, histograms, and area plots. It produces interactive visualizations that can be easily embedded into websites and provides a variety of complex plotting options. These interactive features offer several advantages over static visualizations, such as those created with Matplotlib. Specifically, Plotly enhances the initial exploration of datasets by allowing users to interact with the graphs, saving time and providing deeper insights.