Enabling Unstructured Data to develop a Legislative data extraction & processing engine

Overview

  • Develop a legislative data extraction and processing engine.
  • The client's business uses legislative and regulatory information from federal and state governments to build a legislative search platform.
  • Scope includes the US federal government, all US states, and 20+ countries with their states/provinces.

Solution

The system comprises web scrapers that extract data from dynamic-content web pages and deliver the extracted data to a relational database.

Scrapers for different countries were delivered after thorough testing and are already deployed in production.

  • Extraction: Data harvesting through web scraping of source web pages or from web APIs (where available)
  • Transformation: Normalize inconsistent values, monitor and cleanse “bad” data, and filter and validate records.
  • Loading: Load refined and processed data into a database or a data warehouse.
  • Quality Assurance: Automated testing of loaded data supplemented with manual QA for added assurance.
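
The four stages above can be sketched end to end in a few lines of Python. This is a minimal illustration under stated assumptions, not the delivered system: the sample HTML, the `bills` table, and the function names `extract`, `transform`, `load`, and `qa_check` are all hypothetical, and the standard-library `sqlite3` module stands in for the production relational database so the example runs without external services.

```python
import sqlite3
from datetime import date
from html.parser import HTMLParser

# Hypothetical sample page; a real scraper would fetch this over HTTP.
SAMPLE_HTML = """
<table>
<tr><td>HB 101</td><td> An Act  Relating to Water Rights </td><td>2023-01-15</td></tr>
<tr><td>SB 7</td><td>Education Funding</td><td>not-a-date</td></tr>
<tr><td>hb 101</td><td>An Act Relating to Water Rights</td><td>2023-01-15</td></tr>
</table>
"""


class RowParser(HTMLParser):
    """Collect the text of every <td>, grouped by enclosing <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data


def extract(html):
    """Extraction: pull raw table rows out of a source page."""
    parser = RowParser()
    parser.feed(html)
    return parser.rows


def transform(rows):
    """Transformation: normalize values, reject invalid rows, drop duplicates."""
    clean, rejected, seen = [], [], set()
    for row in rows:
        if len(row) != 3:
            rejected.append(row)
            continue
        bill_id = " ".join(row[0].split()).upper()
        title = " ".join(row[1].split())
        try:
            introduced = date.fromisoformat(row[2].strip())
        except ValueError:
            rejected.append(row)  # "bad" data: unparseable date
            continue
        if not bill_id or not title:
            rejected.append(row)
            continue
        if bill_id in seen:
            continue  # duplicate of a record already accepted
        seen.add(bill_id)
        clean.append((bill_id, title, introduced.isoformat()))
    return clean, rejected


def load(conn, records):
    """Loading: write cleaned records into a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bills ("
        "bill_id TEXT PRIMARY KEY, title TEXT NOT NULL, introduced TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO bills VALUES (?, ?, ?)", records)
    conn.commit()


def qa_check(conn):
    """Quality assurance: automated check on the loaded data; returns row count."""
    total = conn.execute("SELECT COUNT(*) FROM bills").fetchone()[0]
    bad = conn.execute(
        "SELECT COUNT(*) FROM bills WHERE bill_id = '' OR title = ''"
    ).fetchone()[0]
    if bad:
        raise ValueError(f"{bad} loaded rows have empty required fields")
    return total


if __name__ == "__main__":
    clean, rejected = transform(extract(SAMPLE_HTML))
    conn = sqlite3.connect(":memory:")
    load(conn, clean)
    print("loaded:", qa_check(conn), "rejected:", len(rejected))
```

Keeping each stage as a separate function means a rejected row can be routed to manual QA without blocking the rest of the batch, which mirrors the automated-plus-manual assurance described above.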

Tech Stack

  • Big Data/Hadoop, Hive, Impala, Oozie, Python, SQL, Tableau BI, UI/UX on ReactJS, Java Spring Boot, CI/CD (TeamCity/Jenkins/Hudson)
  • Core technologies: Python, PostgreSQL, RabbitMQ, Apache Thrift
