Web Scraping with Python (Practice Case Study)
2024 / Python
Using Python and the BeautifulSoup package, I scraped the UGM SDGS Center website.
The front page showed 45 article excerpts (at the time of scraping), divided into 5 pages. The following steps were taken:
- Scraped the URLs of the 5 pages (a).
- Scraped the URLs of the 45 articles using the URLs from step (a) (b).
- Scraped the title and the first 3 paragraphs of each of the 45 articles using the URLs from step (b).
- Exported the scraped data to a JSON file.
This website and the contents scraped are publicly accessible.
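Step (a) can be illustrated without any network access. The sketch below is only an approximation of that step: it assumes the last pagination link ends in `/page/<n>/`, and the helper names `last_page_number` and `build_page_urls` are hypothetical, not part of the original script.

```python
from urllib.parse import urljoin


def last_page_number(last_page_href: str) -> int:
    """Return the trailing page number from a URL such as '.../page/5/'."""
    # Assumption: the href ends with '/page/<n>/' (trailing slash optional).
    return int(last_page_href.rstrip("/").split("/")[-1])


def build_page_urls(category_url: str, last_page_href: str) -> list[str]:
    """Build one listing URL per page, from page 1 up to the last page."""
    total = last_page_number(last_page_href)
    return [urljoin(category_url, f"page/{i}/") for i in range(1, total + 1)]


category_url = "https://sustainabledevelopment.ugm.ac.id/category/reduced-inequalities/"
# Hypothetical href of the last pagination link, used here instead of a live request.
page_urls = build_page_urls(category_url, category_url + "page/5/")
print(page_urls)
```

Splitting on `/` rather than reading a single character keeps the helper working if the category ever grows past 9 pages.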


Source code:
#code starts
from urllib.parse import urljoin
import json
import requests
from bs4 import BeautifulSoup
# Get the first page.
url = 'https://sustainabledevelopment.ugm.ac.id/category/reduced-inequalities/'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'lxml')
page_link_el = soup.select('.elementor-pagination a')
index_page = page_link_el[0].get('href')
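last_page = page_link_el[-1].get('href')
# The last link is expected to end in ".../page/<n>/"; take <n> so that
# page counts with more than one digit are also read correctly.
last_page_number = str(last_page).rstrip('/').split('/')[-1]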
pages_add = []
# Build the URL of each listing page and keep the final URL after any redirect.
for i in range(1, int(last_page_number) + 1):
    link = urljoin(url, "page/" + str(i) + "/")
    response = requests.get(link)
    soup = BeautifulSoup(response.text, 'lxml')
    url_get = response.url
    pages_add.append(url_get)
# Collect the article links from every listing page.
post_links = []
for page_added in pages_add:
    response = requests.get(page_added)
    soup = BeautifulSoup(response.text, 'lxml')
    post_titles = soup.select('div:nth-child(2) > article > div > h3')
    for title in post_titles:
        post_links.append(title.find('a').get('href'))
# Open each article and collect its title and first three paragraphs.
post_no = 1
scraped_data = []
for post_link_item in post_links:
    response = requests.get(post_link_item)
    soup = BeautifulSoup(response.text, 'lxml')
    post_titles_opened = soup.select('div.elementor-element-25e359e')
    for post_titles_opened_item in post_titles_opened:
        # Reset for every article so that a missing paragraph is stored as an
        # empty string instead of reusing text from the previous article.
        temp_item1 = temp_item2 = temp_item3 = ""
        for p1_item in post_titles_opened_item.select('p:nth-child(1)'):
            temp_item1 = p1_item.text
        for p2_item in post_titles_opened_item.select('p:nth-child(2)'):
            temp_item2 = p2_item.text
        for p3_item in post_titles_opened_item.select('p:nth-child(3)'):
            temp_item3 = p3_item.text
        data = {
            "No": str(post_no),
            "Title": post_titles_opened_item.find('h1').getText(),
            "URL": post_link_item,
            "Par 1": temp_item1,
            "Par 2": temp_item2,
            "Par 3": temp_item3,
        }
        scraped_data.append(data)
        post_no += 1
print(scraped_data)
with open("test_scraper.json", "w", encoding="utf-8") as jsonfile:
    json.dump(scraped_data, jsonfile, indent=4)
#code ends
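One way to guarantee that every record carries all three paragraph fields, even for a short article, is to build the record from a list of paragraph texts and pad it with empty strings. The following is a minimal sketch of that idea; `build_record` is a hypothetical helper and the sample values are made up for illustration.

```python
def build_record(number: int, title: str, url: str, paragraphs: list[str]) -> dict:
    """Build one output row, padding missing paragraphs with empty strings."""
    # Append three empty strings, then keep only the first three entries.
    first_three = (paragraphs + ["", "", ""])[:3]
    return {
        "No": str(number),
        "Title": title,
        "URL": url,
        "Par 1": first_three[0],
        "Par 2": first_three[1],
        "Par 3": first_three[2],
    }


# Illustrative input: an article that has only one paragraph.
record = build_record(1, "Example title", "https://example.com/post/", ["Only one paragraph."])
print(record)
```

The keys match the ones used in the script, so such a record could be appended to `scraped_data` unchanged.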
The code above produced the following data (partially shown):
[
    {
        "No": "1",
        "Title": "UGM Initiates Sign Language Training to Provide More Inclusive Services",
        "URL": "https://sustainabledevelopment.ugm.ac.id/2023/06/27/ugm-initiates-sign-language-training-to-provide-more-inclusive-services/",
        "Par 1": "The UGM Public Relations and Protocol organized sign language training for non-teaching staff from all faculties and units within the university on Monday (26/6) at UGM Central Building.",
        "Par 2": "The training, in collaboration with the Indonesian Sign Language Center (Pusbisindo), aims to enhance the quality of inclusive public services at Universitas Gadjah Mada (UGM).",
        "Par 3": "\u201cUGM, philosophically, strategically, and practically, strives to develop and continuously improve the facilities and role of UGM as an inclusive campus. Today, we will be introduced to sign language, and there will be more in-depth training in the future,\u201d said the Acting Head of UGM Public Relations and Protocol, Dr. Dina W. Kariodimedjo."
    },
    {
        "No": "2",
        "Title": "UGM Committed to Improving Disability Services",
        "URL": "https://sustainabledevelopment.ugm.ac.id/2023/06/16/ugm-committed-to-improving-disability-services/",
        "Par 1": "Universitas Gadjah Mada (UGM) is deeply committed to transforming itself into an inclusive campus that fosters a welcoming environment for individuals with disabilities. This steadfast commitment is realized through the provision of inclusive education and services for all members of the academic community, employees, and the wider public.",
        "Par 2": "UGM goes above and beyond in its offerings of diverse and evolving inclusive services tailored specifically for individuals with disabilities. This comprehensive range of services encompasses the provision of disability-friendly facilities, the development of inclusive curricula, and the implementation of disability-specific services.",
        "Par 3": "Currently, UGM is pioneering the development of the Disability Service Unit (DSU). This unit is not only aimed to facilitate and support access for individuals with disabilities within the academic community but also to fulfill the mandate of Law No. 8 of 2016 on Persons with Disabilities, particularly Article 42, paragraph 3, which states that every higher education institution is obliged to facilitate the establishment of DSU."
    },
    --- --- ---
    {
        "No": "44",
        "Title": "Summer Course\u2019s FGD on Governance of Former Migrant Workers or Former Indonesian Workers (TKI) in Wonosobo",
        "URL": "https://sustainabledevelopment.ugm.ac.id/2019/08/13/summer-courses-fgd-on-governance-of-former-migrant-workers-or-former-indonesian-workers-tki-in-wonosobo/",
        "Par 1": "",
        "Par 2": "From the group discussion forum (FGD) held at the Lipursari Village Hall, it was apparent that most participants were curious about the success of the ex-migrant workers\u2019 struggle to protect their rights through permanent regulation by the district government until they received support from the village government. A former migrant worker who is now an activist defending the fate of migrant workers, Siti Maryam alias Maria Bo Niok, revealed that a long process was undertaken for the issuance of a Perda on Protection of Migrant Workers in Wonosobo.",
        "Par 3": "It was explained, the ex-migrant workers of Wonosobo, it took more than two years to fight for the issuance of Perda No. 8/2016 on the Placement and Protection of Indonesian Workers. \u201cIn this regulation, one of them regulates that every prospective female migrant worker who has a breastfeeding obligation, must wait until her child is two years old if they wish to go abroad to work as a laborer,\u201d she explained."
    },
    {
        "No": "45",
        "Title": "Monthly Seminar: Achievement of Gender Mainstreaming : Between Opportunities and Challenges",
        "URL": "https://sustainabledevelopment.ugm.ac.id/2019/08/02/monthly-seminar-achievement-of-gender-mainstreaming-between-opportunities-and-challenges/",
        "Par 1": "",
        "Par 2": "The Women\u2019s Study Center conducts priority activities based on the following three things, namely:",
        "Par 3": "Education and Training Program"
    }
]
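Curly quotes and apostrophes in the scraped text are written as JSON escape sequences (for example `\u201c` and `\u2019`) because `json.dump` escapes non-ASCII characters by default (`ensure_ascii=True`). The short sketch below compares the default with `ensure_ascii=False` on a made-up sample record; both forms load back to the same data.

```python
import json

# Made-up sample containing a curly apostrophe and curly quotes.
sample = [{"Title": "Summer Course\u2019s FGD", "Par 1": "\u201cUGM\u201d"}]

# Default behaviour: non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(sample, indent=4)
# With ensure_ascii=False the characters are written as-is.
readable = json.dumps(sample, indent=4, ensure_ascii=False)

print(escaped)
print(readable)
```

Since the script already opens the output file with `encoding="utf-8"`, passing `ensure_ascii=False` to `json.dump` would be enough to get the readable form in the file.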