There are many graphs and overviews about the Corona data available.
In one discussion we read about an interesting approach: to view the data with starting point when a country first exceeds the 100 confirmed cases. And then focus on the growth, by using a logarithmic scale.
We wrote a Python script reading the data from JohnHopkins, with data-cleaning and some calculations. The script is executed directly in PowerBI to get the latest data and make the Graphs and insights, without manual actions. For people who are interested in de code: these are published at the end of this blog.
Click on picture to get a full page view. unfortunately we cannot publish the PowerBI on this public webpage.
The Map is showing the countries with reported cases. Africa and South-America have less reported cases in the dataset. We see that the growth of number of cases and deaths is still high. However, the growth rate is stabilising between 70 and 80K per day. The actual number of cases cannot be determined from this dataset because in the dataset we think that the number of recovered people is not accurate. In many countries the call to be less strict with regulations is growing. Politicians are responding to this and let us know that the virus has stabilized. In some countries the numbers support that. China, New Zealand, Iceland and some other have less than a 25 increase on confirmed cases. In the graph below we see how small that number of countries is at the moment.
The growth patterns from “the first day the number of confirmed cases was over 100 per country” can be seen in the graphs below. It shows how the pattern of the virus is increasing per country.
In the graph above you can see that Japan has a different growth characteristic from most countries. In some comments you read that they started later en slower with testing of cases. US has the most cases and growth. We have left China out of this graph, because the x-as will be expanded to 100 days. It started much sooner than other countries and is stable (in confirmed numbers) for the last few months.
Sweden has another approach about handling the Covid-19 Pandemic than most countries. At first it was following the same pattern as their neighbours Denmark and Norway. But the last days they are accelerating more than their neighbouring countries. But still lower than e.g. Netherlands and Belgium.
Nearly all countries show less growth, but still growth.
In the graph above we used the same visualisation, but per million inhabitants of a country. The majority of the countries in this selection have between 1400 and 4800 cases per million. The paterns are even more close than the absolute numbers.
The graph of reported and confirmed deaths shows the same pattern for many countries . We also see a different pattern for e.g. Japan.
This is the graph of the States in the US. We can clearly see the growth for New York that is hit hard with COVID-19. Nearly in all States the growth of number of cases is visible, and still going faster than in most countries around the world.
At the start of the COVID-19 crisis it was mentioned that older people are more at risk. So we looked for data with some indication about age per country. We found that in Worldometers. Also the population per country is in that dataset. In absolute numbers the US leads in number of cases. But when we look at country size, and measure number of cases per 1 million inhabitants then the view changes and we see other countries are on top. On the right bottom side we see the plot between med.age and number per million inhabitants. It looks like the med.age is indeed influencing the number of cases, but be aware that the numbers can also be more reliable in those countries.
import pandas as pd
import datetime as dt
import numpy as np
import sys
def daterange(start_date, end_date):
for n in range(int ((end_date - start_date).days)):
yield start_date + dt.timedelta(n)
def cleandata(df_raw):
# unpivot the raw data
df_cleaned=df_raw.melt(id_vars=['Province/State','Country/Region','Lat','Long'],value_name='Cases',var_name='Date')
try:
df_cleaned['Date'] = df_cleaned['Date'].astype('datetime64[ns]')
except:
print("Error")
return df_cleaned
## Read Data for Cases, Deaths and Recoveries
# print("Start reading files")
start_date = dt.date(2020, 1, 22)
columnnames = ['Province/State', 'Country/Region', 'Last Update' , 'Confirmed' , 'Deaths', 'Recovered']all_cases_raw = pd.DataFrame ( columns = columnnames)
for single_date in daterange(start_date, dt.date.today()):
try:
# single_date = dt.date(2020, 3, 29)
date_cases_raw=pd.read_csv('https://github.com/CSSEGISandData/COVID-19/raw/master/csse_covid_19_data/csse_covid_19_daily_reports/'+ single_date.strftime("%m-%d-%Y") + '.csv', sep=','
, dtype={'Province/State':str,'Country/Region':str, 'FIPS':str, 'Lat':str,'Long':str, "latitude":str, "Longitude":str,'Long':str,
'Confirmed':str, "Deaths":str, "Recovered":str})
date_cases_raw["Date"] = single_date
last_date = single_date
all_cases_raw = all_cases_raw.append(date_cases_raw)
except:
print("Error reading file " + single_date.strftime("%m-%d-%Y") + " --> ", sys.exc_info()[0])
# print("Finish reading files")
del date_cases_raw
all_cases_raw["LastDay"] = np.where(all_cases_raw["Date"] == last_date, "Yes","No")
# Read province state FIPS lookup
FIPS_df = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/UID_ISO_FIPS_LookUp_Table.csv' , sep=','
, dtype={'UID':str,'iso2':str,'iso3':str,'code3':str, "FIPS":str, "Admin2":str,'Province_State':str,
'Country_Region':str, "Lat":str, "Long_":str, "Combined_Key":str})
# They changes names over time
print("Cleaning reading files")
all_cases_raw["Country/Region"] = np.where(all_cases_raw["Country/Region"].isnull() == False,all_cases_raw["Country/Region"],all_cases_raw["Country_Region"] )
all_cases_raw["Province/State"] = np.where(all_cases_raw["Province/State"].isnull() == False,all_cases_raw["Province/State"],all_cases_raw["Province_State"] )
all_cases_raw["Latitude"] = np.where(all_cases_raw["Latitude"].isnull() == False,all_cases_raw["Latitude"],all_cases_raw["Lat"] )
all_cases_raw["Longitude"] = np.where(all_cases_raw["Longitude"].isnull() == False,all_cases_raw["Longitude"],all_cases_raw["Long_"] )
all_cases_raw["Last Update"] = np.where(all_cases_raw["Last Update"].isnull() == False,all_cases_raw["Last Update"],all_cases_raw["Last_Update"] )
all_cases_raw = all_cases_raw.drop({"Country_Region","Province_State","Lat","Long_","Last_Update"}, axis=1)
all_cases_raw["Last Update"] = all_cases_raw["Last Update"].astype('datetime64')
all_cases_raw["Latitude"] = all_cases_raw["Latitude"].astype(str)
all_cases_raw["Longitude"] = all_cases_raw["Longitude"].astype(str)
all_cases_raw["Confirmed"] = all_cases_raw["Confirmed"].fillna(0).astype(int)
all_cases_raw["Deaths"] = all_cases_raw["Deaths"].fillna(0).astype(int)
all_cases_raw["Recovered"] = all_cases_raw["Recovered"].fillna(0).astype(int)
all_cases_raw["Date"] = all_cases_raw["Date"].astype('datetime64')
all_cases_raw = all_cases_raw.merge(FIPS_df[["Combined_Key", "Province_State"]], left_on="Combined_Key" , right_on = "Combined_Key" , how='left')
# they changed name of UK to United Kingdom, so we have to clean this. Also for other country changes
all_cases_raw.loc[:, "Country/Region"].replace('UK','United Kingdom', inplace=True)
all_cases_raw.loc[:, "Country/Region"].replace('Bahamas, The','Bahamas', inplace=True)
all_cases_raw.loc[:, "Country/Region"].replace(' Azerbaijan','Azerbaijan', inplace=True)
all_cases_raw.loc[:, "Country/Region"].replace('Hong Kong SAR','Hong Kong', inplace=True)
all_cases_raw.loc[:, "Country/Region"].replace('Iran (Islamic Republic of)', "Iran", inplace=True)
all_cases_raw.loc[:, "Country/Region"].replace('Macao SAR','Macao', inplace=True)
all_cases_raw.loc[:, "Country/Region"].replace('Republic of Ireland','Ireland', inplace=True)
all_cases_raw.loc[:, "Country/Region"].replace('Russian Federation','Russia', inplace=True)
all_cases_raw.loc[:, "Country/Region"].replace(r'Taiwan*','Taiwan', inplace=True)
all_cases_raw.loc[:, "Country/Region"].replace('Taiwan*', "Taiwan", inplace=True)
all_cases_raw.loc[:, "Country/Region"].replace('Viet Nam','Vietnam', inplace=True)
all_cases_raw.loc[:, "Country/Region"].replace('Mainland China','China', inplace=True)
all_cases_raw.loc[:, "Country/Region"].replace('Korea, South','South Korea', inplace=True)
all_cases_raw.loc[:, "Country/Region"].replace('Republic of Korea','South Korea', inplace=True)
all_cases_raw.loc[:, "Country/Region"].replace('Gambia, The','Gambia', inplace=True)
all_cases_raw.loc[:, "Country/Region"].replace('The Gambia','Gambia', inplace=True)
# in statements from here we calculate also the difference between the day before. Beware of the NULL, so fillna before and after the diff (no difference I like to have as 0)
# first time should be the number of the day
country_cases = all_cases_raw.groupby(["Country/Region", "Date"], as_index=False).agg({"Confirmed":sum,"Deaths":sum, "Recovered":sum, "LastDay":"first"})
country_cases["Confirmed_diff"] = country_cases.groupby(['Country/Region'])['Confirmed'].fillna(0).diff().fillna(0).astype(int)
country_cases["Deaths_diff"] = country_cases.groupby(['Country/Region'])['Deaths'].fillna(0).diff().fillna(0).astype(int)
country_cases["Recovered_diff"] = country_cases.groupby(['Country/Region'])['Recovered'].fillna(0).diff().fillna(0).astype(int)
country_cases["Country_key"] = country_cases["Country/Region"] + "-" + country_cases["Date"].astype(str)
# with the method of groupby diff, the first values on new instance for group values is not valid (is difference with previous group)
# I tried a loop per country, but then the first row is NaN, which is also not correct
temp_first_date = country_cases.groupby(["Country/Region"]).agg({"Date":"min"})
temp_first_date = temp_first_date.merge(country_cases[["Country/Region","Date","Confirmed","Deaths", "Recovered"]], left_on=["Country/Region", "Date"], right_on=["Country/Region", "Date"], how="left")
temp_first_date["Country_key"] = temp_first_date["Country/Region"] + "-" + temp_first_date["Date"].astype(str)
temp_first_date.rename(columns={"Confirmed":"Conf_first", "Deaths":"Deaths_first", "Recovered":"Recovered_first"}, inplace=True)
temp_first_date.drop(columns={"Country/Region","Date"}, inplace=True)
# in temp first date the correct values are there. They must update the rows in country_cases
country_cases = country_cases.merge(temp_first_date, left_on=["Country_key"], right_on=["Country_key"], how="left")
# Overwrite the columns for first date and clean up
country_cases["Confirmed_diff"] = np.where(country_cases["Conf_first"].isnull() == False, country_cases["Conf_first"].fillna(0).astype(int),country_cases["Confirmed_diff"].fillna(0).astype(int) )
country_cases["Deaths_diff"] = np.where(country_cases["Deaths_first"].isnull() == False, country_cases["Deaths_first"].fillna(0).astype(int),country_cases["Deaths_diff"].fillna(0).astype(int) )
country_cases["Recovered_diff"] = np.where(country_cases["Recovered_first"].isnull() == False, country_cases["Recovered_first"].fillna(0).astype(int),country_cases["Recovered_diff"].fillna(0).astype(int) )
country_cases.drop(columns={"Conf_first", "Deaths_first", "Recovered_first"}, inplace=True)
country_cases["Date 100"] = np.where(country_cases["Confirmed"]>100,"1","0")
# for each country we like to have the first day that 100 confirmed cases where reported
temp_country = country_cases.loc[(country_cases["Date 100"] == "1")].groupby(["Country/Region"], as_index = False).agg({"Date":"min"})
temp_country.rename(columns={"Date":"Day of 100"}, index=str, inplace=True)
country_cases = country_cases.merge(temp_country, left_on = ["Country/Region"], right_on = ["Country/Region"], how= "left")
country_cases.drop(columns={"Date 100"}, axis=1, inplace=True)
country_cases["Days after 100"] = (country_cases["Date"] - country_cases["Day of 100"]).dt.days.fillna(0).astype(int)
# Province table
all_cases_temp = all_cases_raw
# made choice to replace null provinces by "NaN"
all_cases_temp["Province/State"] = np.where(all_cases_temp["Province/State"].isnull() == True,"NaN", all_cases_temp["Province/State"] )
province_cases = all_cases_temp.groupby(["Country/Region", "Province/State", "Date"], as_index=False).agg({"Confirmed":sum,"Deaths":sum, "Recovered":sum, "LastDay":"first"})
province_cases["Confirmed_diff"] = province_cases.groupby(['Country/Region', "Province/State"])['Confirmed'].fillna(0).diff().fillna(0).astype(int)
province_cases["Deaths_diff"] = province_cases.groupby(['Country/Region', "Province/State"])['Deaths'].fillna(0).diff().fillna(0).astype(int)
province_cases["Recovered_diff"] = province_cases.groupby(['Country/Region', "Province/State"])['Recovered'].fillna(0).diff().fillna(0).astype(int)
province_cases["Province_key"] = province_cases["Country/Region"] + "-" + province_cases["Province/State"] + "-" + province_cases["Date"].astype(str)
# with the method of groupby diff, the first values on new instance for group values is not valid (is difference with previous group)
# I tried a loop per country, but then the first row is NaN, which is also not correct
temp_first_date = province_cases.groupby(["Country/Region", "Province/State"]).agg({"Date":"min"})
temp_first_date = temp_first_date.merge(province_cases[["Country/Region","Province/State", "Date","Confirmed","Deaths", "Recovered"]], left_on=["Country/Region", "Province/State", "Date"], right_on=["Country/Region", "Province/State", "Date"], how="left")
temp_first_date["Province_key"] = temp_first_date["Country/Region"] + "-" + temp_first_date["Province/State"] + "-" + temp_first_date["Date"].astype(str)
temp_first_date.rename(columns={"Confirmed":"Conf_first", "Deaths":"Deaths_first", "Recovered":"Recovered_first"}, inplace=True)
temp_first_date.drop(columns={"Country/Region", "Province/State", "Date"}, inplace=True)
# in temp first date the correct values are there. They must update the rows in country_cases
province_cases = province_cases.merge(temp_first_date, left_on=["Province_key"], right_on=["Province_key"], how="left")
# Overwrite the columns and clean up
province_cases["Confirmed_diff"] = np.where(province_cases["Conf_first"].isnull() == False, province_cases["Conf_first"].fillna(0).astype(int),province_cases["Confirmed_diff"].fillna(0).astype(int) )
province_cases["Deaths_diff"] = np.where(province_cases["Deaths_first"].isnull() == False, province_cases["Deaths_first"].fillna(0).astype(int),province_cases["Deaths_diff"].fillna(0).astype(int) )
province_cases["Recovered_diff"] = np.where(province_cases["Recovered_first"].isnull() == False, province_cases["Recovered_first"].fillna(0).astype(int),province_cases["Recovered_diff"].fillna(0).astype(int) )
province_cases.drop(columns={"Conf_first", "Deaths_first", "Recovered_first"}, inplace=True)
province_cases["Date 100"] = np.where(province_cases["Confirmed"]>100,"1","0")
province_cases["Country_key"] = province_cases["Country/Region"] + "-" + province_cases["Date"].astype(str)
# for each country we like to have the first day that 100 confirmed cases where reported
temp_province = province_cases.loc[(province_cases["Date 100"] == "1")].groupby(["Country/Region", "Province/State"], as_index = False).agg({"Date":"min"})
temp_province.rename(columns={"Date":"Day of 100"}, index=str, inplace=True)
province_cases = province_cases.merge(temp_province, left_on = ["Country/Region", "Province/State"], right_on = ["Country/Region", "Province/State"], how= "left")
province_cases.drop(columns={"Date 100"}, axis=1, inplace=True)
province_cases["Days after 100"] = (province_cases["Date"] - province_cases["Day of 100"]).dt.days.fillna(0).astype(int)
# Set for City
all_cases_temp["Admin2"] = np.where(all_cases_temp["Admin2"].isnull() == True,"NaN", all_cases_temp["Admin2"] )
city_cases = all_cases_temp.groupby(["Country/Region", "Province/State", "Admin2", "Date"], as_index=False).agg({"Confirmed":sum,"Deaths":sum, "Recovered":sum, "LastDay":"first"})
city_cases["Confirmed_diff"] = city_cases.groupby(['Country/Region', "Province/State", "Admin2"])['Confirmed'].fillna(0).diff().fillna(0).astype(int)
city_cases["Deaths_diff"] = city_cases.groupby(['Country/Region', "Province/State", "Admin2"])['Deaths'].fillna(0).diff().fillna(0).astype(int)
city_cases["Recovered_diff"] = city_cases.groupby(['Country/Region', "Province/State", "Admin2"])['Recovered'].fillna(0).diff().fillna(0).astype(int)
city_cases["City_key"] = city_cases["Country/Region"] + "-" + city_cases["Province/State"] + "-" + city_cases["Admin2"] + "-" + city_cases["Date"].astype(str)
# with the method of groupby diff, the first values on new instance for group values is not valid (is difference with previous group)
# I tried a loop per country, but then the first row is NaN, which is also not correct
temp_first_date = city_cases.groupby(["Country/Region", "Province/State", "Admin2"]).agg({"Date":"min"})
temp_first_date = temp_first_date.merge(city_cases[["Country/Region","Province/State", "Admin2", "Date","Confirmed","Deaths", "Recovered"]], left_on=["Country/Region", "Province/State", "Admin2" ,"Date"], right_on=["Country/Region", "Province/State", "Admin2", "Date"], how="left")
temp_first_date["City_key"] = temp_first_date["Country/Region"] + "-" + temp_first_date["Province/State"] + "-" + temp_first_date["Date"].astype(str)
temp_first_date.rename(columns={"Confirmed":"Conf_first", "Deaths":"Deaths_first", "Recovered":"Recovered_first"}, inplace=True)
temp_first_date.drop(columns={"Country/Region", "Province/State", "Admin2", "Date"}, axis=1,inplace=True)
# in temp first date the correct values are there. They must update the rows in country_cases
city_cases = city_cases.merge(temp_first_date, left_on=["City_key"], right_on=["City_key"], how="left")
# Overwrite the columns and clean up
city_cases["Confirmed_diff"] = np.where(city_cases["Conf_first"].isnull() == False, city_cases["Conf_first"].fillna(0).astype(int),city_cases["Confirmed_diff"].fillna(0).astype(int) )
city_cases["Deaths_diff"] = np.where(city_cases["Deaths_first"].isnull() == False, city_cases["Deaths_first"].fillna(0).astype(int),city_cases["Deaths_diff"].fillna(0).astype(int) )
city_cases["Recovered_diff"] = np.where(city_cases["Recovered_first"].isnull() == False, city_cases["Recovered_first"].fillna(0).astype(int),city_cases["Recovered_diff"].fillna(0).astype(int) )
city_cases.drop(columns={"Conf_first", "Deaths_first", "Recovered_first"}, axis=1, inplace=True)
city_cases["Date 100"] = np.where(city_cases["Confirmed"]>100,"1","0")
city_cases["Province_key"] = city_cases["Country/Region"] + "-" + city_cases["Province/State"] + "-" + city_cases["Date"].astype(str)
# for each country we like to have the first day that 100 confirmed cases where reported
temp_city = city_cases.loc[(city_cases["Date 100"] == "1")].groupby(["Country/Region", "Province/State", "Admin2"], as_index = False).agg({"Date":"min"})
temp_city.rename(columns={"Date":"Day of 100"}, index=str, inplace=True)
city_cases = city_cases.merge(temp_city, left_on = ["Country/Region", "Province/State", "Admin2"], right_on = ["Country/Region", "Province/State", "Admin2"], how= "left")
city_cases.drop(columns={"Date 100", "City_key"}, axis=1, inplace=True)
del all_cases_temp, temp_country, temp_province, temp_city
The Log graphs are also written in Python. This did give me more controle over layout (name directly after the line in same color etc.)
import matplotlib.pyplot as plt
import matplotlib as mpl
import numpy as np
plt.style.use('Solarize_Light2')
plt.rcParams["figure.figsize"] = (24,12)
plt.margins( x=None, y=None, tight=True)
plt.yscale('log')
plt.xlim(0, dataset["Days after 100"].max() + 5)
# plt.figure(figsize=(8,6))
linecolor = 0
for country in dataset["Country/Region"].unique():
linecolor = linecolor + 1
xas = dataset["Days after 100"].loc[(dataset["Country/Region"] == country)].max()
yas = dataset["Confirmed"].loc[(dataset["Country/Region"] == country)].max()
plot = dataset.loc[(dataset["Country/Region"] == country)].groupby(["Days after 100"]).sum().unstack().plot()
plt.text(x=xas + 0.2, y=yas-20, s=country, alpha=0.7, c = plot.get_lines()[linecolor -1].get_color())
plot.set(xlabel="Days after first 100 confirmed cases in specific country", ylabel="Confirmed cases (log)")
plot.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
plot.xaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
plt.plot([0, 50], [100, 10000000], 'k-', linestyle='dashed')
plt.show()
0 Comments