Pulling Data from a Wikipedia Table: A Sample

I was researching some data sets and needed to pull population data from Wikipedia. I looked at several articles on doing this, and they all seemed either very basic or out of date. I had already written an article on the Wikipedia Python package, so I decided to write one on pulling table data out of a Wikipedia page. The packages I used were urllib and Beautiful Soup. I could have done this with other packages (requests comes to mind), but this approach is very simple and it works.

Many Wikipedia pages use tables to display data, and in most cases the table you want is either the only one on the page or has a unique ID. Unfortunately, the "List_of_states_and_territories_of_the_United_States_by_population" page has one table for state rankings and another for rankings by region. So I needed to pull data only from the specific table I wanted, which was not too hard.

So to start, how do you pull data from a Wikipedia page? urllib works for me so that's what I used:
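A minimal sketch of that fetch step, assuming Python 3 with the bs4 package installed. The function name fetch_soup, the full URL form, and the User-Agent string are my own choices for illustration, not taken from the original code; the header is there because some servers reject urllib's default one.

```python
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

# The page named in the article; the full URL form is an assumption.
URL = (
    "https://en.wikipedia.org/wiki/"
    "List_of_states_and_territories_of_the_United_States_by_population"
)


def fetch_soup(url):
    """Download a page and return it parsed as a BeautifulSoup object."""
    # Hypothetical User-Agent value; any descriptive string should do.
    request = Request(url, headers={"User-Agent": "wiki-table-sample/0.1"})
    with urlopen(request) as response:
        content = response.read()
    # "html.parser" ships with Python, so no extra parser install is needed.
    return BeautifulSoup(content, "html.parser")


# Usage (performs a network request, so it is left commented out here):
# soup = fetch_soup(URL)
# print(soup.title.string)
```

The URL is kept in a constant here, in the same spirit as the hardcoding mentioned in this article.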

So we import urlopen from urllib.request and BeautifulSoup from bs4, set up the URL, and then open it and read in the content. The next step is to load that content into BeautifulSoup. Quick note: I have hardcoded some things and pulled the code out of the class it lived in, for simplicity's sake. The next code snippet reads through the Beautiful Soup (BS) object and parses the data for me.
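Here is a hedged sketch of that parsing step. It runs against a small inline sample rather than the live page, and the sample values, the column order (name, estimate, census), and the function name parse_population_rows are all assumptions for illustration. The combined selector "table.wikitable.sortable tbody tr" would match rows from every such table on the page, so this sketch first picks the first matching table and then selects its rows, assuming the state table comes first.

```python
from bs4 import BeautifulSoup

# Made-up sample with two tables, mimicking a page that has more than one.
SAMPLE_HTML = """
<table class="wikitable sortable"><tbody>
<tr><th>State</th><th>Estimate</th><th>Census</th></tr>
<tr><td>Example A</td><td>1,000</td><td>900</td></tr>
<tr><td>Example B</td><td>2,500</td><td>2,400</td></tr>
</tbody></table>
<table class="wikitable sortable"><tbody>
<tr><th>Region</th><th>Estimate</th><th>Census</th></tr>
<tr><td>Example Region</td><td>3,500</td><td>3,300</td></tr>
</tbody></table>
"""


def parse_population_rows(soup):
    """Return (name, estimate, census) text tuples from the first matching table."""
    # Assumption: the wanted table is the first "wikitable sortable" on the page.
    table = soup.select_one("table.wikitable.sortable")
    if table is None:
        return []
    rows = []
    for tr in table.select("tbody tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) < 3:
            continue  # header rows contain only <th> cells
        # Values are kept as text; real cells may carry footnote markers.
        rows.append((cells[0], cells[1], cells[2]))
    return rows


sample_rows = parse_population_rows(BeautifulSoup(SAMPLE_HTML, "html.parser"))
for name, estimate, census in sample_rows:
    print(name, estimate, census)
print("Total States and Territories found:", len(sample_rows))
```

On the real page the column positions may differ and can change over time, so the cell indexes would need checking against the current table layout.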

That first line is where most of the magic happens. Most tables on Wikipedia use the table classes wikitable and sortable. BS has a great interface called select that takes a CSS selector: you start with the HTML tag (table in this case) and then narrow it down to the exact table you want using CSS class selectors, which explains the .wikitable.sortable part. Then we tell BS to filter further with the tbody and tr tags found within the table we want. The rest is just a simple for loop that goes through the table rows and pulls out the state name and population statistics I needed. The result is the data shown below.

Here are the results for U.S. population by state:

State | 2019 Population Estimate | 2010 Census Population | Percent of US Population
39,237,836 37,253,956 5.3% 11.80%
29,527,941 25,145,561 17.4% 8.70%
21,781,128 18,801,310 15.85% 6.43%
19,835,913 19,378,102 2.36% 6.03%
12,964,056 12,702,379 2.06% 3.88%
12,671,469 12,830,632 –0.1% 3.82%
11,780,017 11,536,504 2.3% 3.52%
10,799,566 9,687,653 10.6% 3.20%
10,551,162 9,535,483 9.5% 3.12%
10,050,811 9,883,640 2.0% 3.01%
9,267,130 8,791,894 5.7% 2.77%
8,642,274 8,001,024 7.9% 2.58%
7,738,692 6,724,540 14.6% 2.30%
7,276,316 6,392,017 11.9% 2.13%
6,984,723 6,547,629 7.4% 2.10%
6,975,218 6,346,105 8.9% 2.06%
6,805,985 6,483,802 4.7% 2.02%
6,165,129 5,773,552 7.0% 1.84%
6,168,187 5,988,927 2.8% 1.84%
5,895,908 5,686,986 3.6% 1.76%
5,812,069 5,029,196 14.8% 1.72%
5,707,390 5,303,925 7.6% 1.70%
5,190,705 4,625,364 10.7% 1.53%
5,039,877 4,779,736 5.1% 1.50%
4,624,047 4,533,372 2.0% 1.39%
4,505,836 4,339,367 3.8% 1.35%
4,237,256 3,831,074 10.6% 1.26%
3,959,353 3,751,351 5.5% 1.18%
3,605,944 3,574,097 0.9% 1.08%
3,285,874 3,725,789 –11.8% 0.98%
3,271,616 2,763,885 18.4% 0.98%
3,190,369 3,046,355 4.7% 0.95%
3,104,614 2,700,551 15.0% 0.93%
3,025,891 2,915,918 3.8% 0.90%
2,961,279 2,967,297 –0.2% 0.88%
2,937,880 2,853,118 3.0% 0.88%
2,117,522 2,059,179 2.8% 0.63%
1,961,504 1,826,341 7.4% 0.59%
1,839,106 1,567,582 17.3% 0.55%
1,793,716 1,852,994 –3.2% 0.54%
1,441,553 1,360,301 6.0% 0.43%
1,388,992 1,316,470 5.5% 0.41%
1,362,359 1,328,361 2.6% 0.41%
1,097,379 1,052,567 4.3% 0.33%
1,084,225 989,415 9.6% 0.32%
989,948 897,934 10.2% 0.30%
886,667 814,180 8.9% 0.26%
779,094 672,591 15.8% 0.23%
733,391 710,231 3.3% 0.22%
689,545 601,723 14.6% 0.21%
643,077 625,741 2.8% 0.19%
576,851 563,626 2.3% 0.17%
153,836 159,358 –3.5% 0.05%
87,146 106,405 –18.1% 0.03%
49,710 55,519 –10.5% 0.01%
47,329 53,883 –12.2% 0.01%
Total States and Territories found: 56