Status | Scraper | Last success | Last run | Error

D> | ca | 2024-09-16 04:41:41 | 2024-11-20 04:29:48 | Value 'Telephone: --' for field '' does not match regular expression '\A1 \d{3} \d{3}-\d{4}(?: x\d+)?\Z'
04:29:48 WARNING pupa: validation of Membership 13ef5094-a6f8-11ef-b281-5a8f52bca9f1 failed: 2 validation errors:
Value '--' for field '' does not match regular expression '\A1 \d{3} \d{3}-\d{4}(?: x\d+)?\Z'
Value 'Telephone: --' for field '' does not match regular expression '\A1 \d{3} \d{3}-\d{4}(?: x\d+)?\Z'
Traceback (most recent call last):
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/scrape/base.py", line 175, in validate
validator.validate(self.as_dict(), schema)
File "/app/.heroku/python/lib/python3.9/site-packages/validictory/validator.py", line 616, in validate
raise MultipleValidationError(self._errors)
validictory.validator.MultipleValidationError: 2 validation errors:
Value '--' for field '' does not match regular expression '\A1 \d{3} \d{3}-\d{4}(?: x\d+)?\Z'
Value 'Telephone: --' for field '' does not match regular expression '\A1 \d{3} \d{3}-\d{4}(?: x\d+)?\Z'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/app/reports/utils.py", line 73, in scrape_people
report.report = subcommand.handle(args, other)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 260, in handle
return self.do_handle(args, other, juris)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 305, in do_handle
report['scrape'] = self.do_scrape(juris, args, scrapers)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 173, in do_scrape
report[scraper_name] = scraper.do_scrape(**scrape_args)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/scrape/base.py", line 104, in do_scrape
self.save_object(obj)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/scrape/base.py", line 93, in save_object
self.save_object(obj)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/scrape/base.py", line 89, in save_object
raise ve
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/scrape/base.py", line 85, in save_object
obj.validate()
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/scrape/base.py", line 177, in validate
raise ScrapeValueError('validation of {} {} failed: {}'.format(
pupa.exceptions.ScrapeValueError: validation of Membership 13ef5094-a6f8-11ef-b281-5a8f52bca9f1 failed: 2 validation errors:
Value '--' for field '' does not match regular expression '\A1 \d{3} \d{3}-\d{4}(?: x\d+)?\Z'
Value 'Telephone: --' for field '' does not match regular expression '\A1 \d{3} \d{3}-\d{4}(?: x\d+)?\Z'
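
Note: both values fail the schema's phone pattern because the source page prints "--" where a number is missing. A minimal guard, assuming a hypothetical clean_phone helper in the ca scraper (the regex below is the one quoted in the error):

    import re

    # Phone pattern quoted in the validation error above.
    PHONE_RE = re.compile(r"\A1 \d{3} \d{3}-\d{4}(?: x\d+)?\Z")

    def clean_phone(raw):
        """Return a schema-valid phone string, or None for placeholders like '--'."""
        value = raw.replace("Telephone:", "").strip()
        if not value or set(value) <= {"-", " "}:  # '--', '- -', empty, etc.
            return None
        return value if PHONE_RE.match(value) else None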

C | ca_ab | 2024-11-20 04:47:34 | 2024-11-20 04:47:34 |
C | ca_ab_calgary | 2024-11-20 04:19:45 | 2024-11-20 04:19:45 |
C | ca_ab_edmonton | 2024-11-20 04:19:01 | 2024-11-20 04:19:01 |
C | ca_ab_grande_prairie | 2024-11-20 04:56:56 | 2024-11-20 04:56:56 |
D> | ca_ab_grande_prairie_county_no_1 | 2024-09-16 04:02:50 | 2024-11-20 04:42:18 | IndexError: list index out of range
Traceback (most recent call last):
File "/app/reports/utils.py", line 73, in scrape_people
report.report = subcommand.handle(args, other)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 260, in handle
return self.do_handle(args, other, juris)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 305, in do_handle
report['scrape'] = self.do_scrape(juris, args, scrapers)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 173, in do_scrape
report[scraper_name] = scraper.do_scrape(**scrape_args)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/scrape/base.py", line 99, in do_scrape
for obj in self.scrape(**kwargs) or []:
File "/app/scrapers/ca_ab_grande_prairie_county_no_1/people.py", line 17, in scrape
name = councillor.xpath('.//div[@class="lb-imageBox_header {headColor}"]')[0].text_content()
IndexError: list index out of range
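
Note: the XPath still contains a literal "{headColor}" placeholder, so it can only match if the page emits that exact string; an unexpanded template like this usually means the selector should target the stable class prefix instead. A sketch (selector assumed, not verified against the live page):

    headers = councillor.xpath('.//div[contains(@class, "lb-imageBox_header")]')
    assert headers, "No header div found for councillor card"  # fail with context, not IndexError
    name = headers[0].text_content()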

C | ca_ab_lethbridge | 2024-11-20 04:04:13 | 2024-11-20 04:04:13 |
C | ca_ab_strathcona_county | 2024-11-20 04:02:19 | 2024-11-20 04:02:20 |
C | ca_ab_wood_buffalo | 2024-11-20 04:17:23 | 2024-11-20 04:17:23 |
D> | ca_bc | 2024-06-26 04:14:44 | 2024-11-20 04:55:14 | scrapelib.HTTPError: 404 while retrieving https://www.leg.bc.ca/_api/search/query?querytext='(contentclass:sts_listitem%20OR…
Traceback (most recent call last):
File "/app/reports/utils.py", line 73, in scrape_people
report.report = subcommand.handle(args, other)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 260, in handle
return self.do_handle(args, other, juris)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 305, in do_handle
report['scrape'] = self.do_scrape(juris, args, scrapers)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 173, in do_scrape
report[scraper_name] = scraper.do_scrape(**scrape_args)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/scrape/base.py", line 99, in do_scrape
for obj in self.scrape(**kwargs) or []:
File "/app/scrapers/ca_bc/people.py", line 16, in scrape
page = self.lxmlize(COUNCIL_PAGE, xml=True)
File "/app/scrapers/utils.py", line 206, in lxmlize
response = self.get(url, cookies=cookies, verify=verify)
File "/app/scrapers/utils.py", line 196, in get
return super().get(*args, verify=kwargs.pop("verify", SSL_VERIFY), **kwargs)
File "/app/.heroku/python/lib/python3.9/site-packages/requests/sessions.py", line 602, in get
return self.request("GET", url, **kwargs)
File "/app/.heroku/python/lib/python3.9/site-packages/scrapelib/__init__.py", line 602, in request
raise HTTPError(resp)
scrapelib.HTTPError: 404 while retrieving https://www.leg.bc.ca/_api/search/query?querytext='(contentclass:sts_listitem%20OR%20IsDocument:True)%20SPSiteUrl:/content%20ListId:8ecafcaa-2bf9-4434-a60c-3663a9afd175%20MLAActiveOWSBOOL:1%20-LastNameOWSTEXT:Vacant'&selectproperties='Picture1OWSIMGE,Title,Path'&&sortlist='LastNameSort:ascending'&rowlimit=100&QueryTemplatePropertiesUrl='spfile://webroot/queryparametertemplate.xml'
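
Note: this endpoint has been failing since 2024-06-26, so the SharePoint search API was likely moved or retired rather than briefly down. A reproduction sketch using the URL from the traceback (query parameters truncated here; use the full string above):

    import requests

    URL = "https://www.leg.bc.ca/_api/search/query?querytext='(contentclass:sts_listitem%20OR%20IsDocument:True)...'"
    print(requests.get(URL, timeout=30).status_code)  # a persistent 404 confirms the endpoint is gone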

C | ca_bc_abbotsford | 2024-11-20 04:19:37 | 2024-11-20 04:19:37 |
C | ca_bc_burnaby | 2024-11-20 04:41:13 | 2024-11-20 04:41:14 |
C | ca_bc_coquitlam | 2024-11-20 04:14:09 | 2024-11-20 04:14:09 |
C | ca_bc_kelowna | 2024-11-20 04:36:17 | 2024-11-20 04:36:17 |
D> | ca_bc_langley | 2024-11-19 04:29:55 | 2024-11-20 04:36:12 | requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='www.tol.ca', port=443): Read timed out. (read timeout=60)
04:33:46 WARNING scrapelib: got ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer')) sleeping for 10 seconds before retry
04:34:09 WARNING scrapelib: got HTTPSConnectionPool(host='www.tol.ca', port=443): Max retries exceeded with url: /en/the-township/councillors.aspx (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f382d4ede20>: Failed to establish a new connection: [Errno 111] Connection refused')) sleeping for 20 seconds before retry
04:34:32 WARNING scrapelib: got HTTPSConnectionPool(host='www.tol.ca', port=443): Max retries exceeded with url: /en/the-township/councillors.aspx (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f382d4edf40>: Failed to establish a new connection: [Errno 111] Connection refused')) sleeping for 40 seconds before retry
Traceback (most recent call last):
File "/app/.heroku/python/lib/python3.9/site-packages/urllib3/connectionpool.py", line 468, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/app/.heroku/python/lib/python3.9/site-packages/urllib3/connectionpool.py", line 463, in _make_request
httplib_response = conn.getresponse()
File "/app/.heroku/python/lib/python3.9/http/client.py", line 1377, in getresponse
response.begin()
File "/app/.heroku/python/lib/python3.9/http/client.py", line 320, in begin
version, status, reason = self._read_status()
File "/app/.heroku/python/lib/python3.9/http/client.py", line 281, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/app/.heroku/python/lib/python3.9/socket.py", line 704, in readinto
return self._sock.recv_into(b)
File "/app/.heroku/python/lib/python3.9/ssl.py", line 1275, in recv_into
return self.read(nbytes, buffer)
File "/app/.heroku/python/lib/python3.9/ssl.py", line 1133, in read
return self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/app/.heroku/python/lib/python3.9/site-packages/requests/adapters.py", line 667, in send
resp = conn.urlopen(
File "/app/.heroku/python/lib/python3.9/site-packages/urllib3/connectionpool.py", line 802, in urlopen
retries = retries.increment(
File "/app/.heroku/python/lib/python3.9/site-packages/urllib3/util/retry.py", line 552, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/app/.heroku/python/lib/python3.9/site-packages/urllib3/packages/six.py", line 770, in reraise
raise value
File "/app/.heroku/python/lib/python3.9/site-packages/urllib3/connectionpool.py", line 716, in urlopen
httplib_response = self._make_request(
File "/app/.heroku/python/lib/python3.9/site-packages/urllib3/connectionpool.py", line 470, in _make_request
self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
File "/app/.heroku/python/lib/python3.9/site-packages/urllib3/connectionpool.py", line 358, in _raise_timeout
raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='www.tol.ca', port=443): Read timed out. (read timeout=60)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/app/reports/utils.py", line 73, in scrape_people
report.report = subcommand.handle(args, other)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 260, in handle
return self.do_handle(args, other, juris)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 305, in do_handle
report['scrape'] = self.do_scrape(juris, args, scrapers)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 173, in do_scrape
report[scraper_name] = scraper.do_scrape(**scrape_args)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/scrape/base.py", line 99, in do_scrape
for obj in self.scrape(**kwargs) or []:
File "/app/scrapers/ca_bc_langley/people.py", line 10, in scrape
page = self.lxmlize(COUNCIL_PAGE)
File "/app/scrapers/utils.py", line 206, in lxmlize
response = self.get(url, cookies=cookies, verify=verify)
File "/app/scrapers/utils.py", line 196, in get
return super().get(*args, verify=kwargs.pop("verify", SSL_VERIFY), **kwargs)
File "/app/.heroku/python/lib/python3.9/site-packages/requests/sessions.py", line 602, in get
return self.request("GET", url, **kwargs)
File "/app/.heroku/python/lib/python3.9/site-packages/scrapelib/__init__.py", line 579, in request
resp = super().request(
File "/app/.heroku/python/lib/python3.9/site-packages/scrapelib/__init__.py", line 404, in request
resp = super().request(
File "/app/.heroku/python/lib/python3.9/site-packages/scrapelib/__init__.py", line 232, in request
return super().request(
File "/app/.heroku/python/lib/python3.9/site-packages/scrapelib/__init__.py", line 175, in request
raise exception_raised
File "/app/.heroku/python/lib/python3.9/site-packages/scrapelib/__init__.py", line 122, in request
resp = super().request(
File "/app/.heroku/python/lib/python3.9/site-packages/requests/sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
File "/app/.heroku/python/lib/python3.9/site-packages/requests/sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
File "/app/.heroku/python/lib/python3.9/site-packages/requests/adapters.py", line 713, in send
raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='www.tol.ca', port=443): Read timed out. (read timeout=60)
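
Note: the warning sequence (connection reset, connection refused, then a 60-second read timeout) looks like www.tol.ca throttling or blocking the client rather than a scraper bug. If the host is merely slow, one sketch is to raise the request timeout and lean on scrapelib's built-in backoff (values illustrative):

    import scrapelib

    s = scrapelib.Scraper(retry_attempts=3, retry_wait_seconds=10)
    s.timeout = 120  # this run used a 60 s read timeout, per the exception above
    page = s.get("https://www.tol.ca/en/the-township/councillors.aspx")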

D> | ca_bc_langley_city | 2024-11-14 04:44:37 | 2024-11-20 04:21:25 | scrapelib.HTTPError: 403 while retrieving https://www.langleycity.ca/cityhall/city-council/council-members
04:20:15 WARNING scrapelib: sleeping for 10 seconds before retry
04:20:25 WARNING scrapelib: sleeping for 20 seconds before retry
04:20:45 WARNING scrapelib: sleeping for 40 seconds before retry
Traceback (most recent call last):
File "/app/reports/utils.py", line 73, in scrape_people
report.report = subcommand.handle(args, other)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 260, in handle
return self.do_handle(args, other, juris)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 305, in do_handle
report['scrape'] = self.do_scrape(juris, args, scrapers)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 173, in do_scrape
report[scraper_name] = scraper.do_scrape(**scrape_args)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/scrape/base.py", line 99, in do_scrape
for obj in self.scrape(**kwargs) or []:
File "/app/scrapers/ca_bc_langley_city/people.py", line 11, in scrape
page = self.lxmlize(COUNCIL_PAGE)
File "/app/scrapers/utils.py", line 206, in lxmlize
response = self.get(url, cookies=cookies, verify=verify)
File "/app/scrapers/utils.py", line 196, in get
return super().get(*args, verify=kwargs.pop("verify", SSL_VERIFY), **kwargs)
File "/app/.heroku/python/lib/python3.9/site-packages/requests/sessions.py", line 602, in get
return self.request("GET", url, **kwargs)
File "/app/.heroku/python/lib/python3.9/site-packages/scrapelib/__init__.py", line 602, in request
raise HTTPError(resp)
scrapelib.HTTPError: 403 while retrieving https://www.langleycity.ca/cityhall/city-council/council-members
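
Note: a persistent 403 on a public page usually means the default client is fingerprinted as a bot. Other scrapers in this codebase already pass a browser-style user agent through lxmlize (see the ca_yt traceback below), so a first attempt might be:

    # Sketch only: the USER_AGENT value is illustrative, and whether it clears
    # the 403 depends on what the site's bot protection actually checks.
    USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36"
    page = self.lxmlize(COUNCIL_PAGE, user_agent=USER_AGENT)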

C | ca_bc_new_westminster | 2024-11-20 04:13:53 | 2024-11-20 04:13:53 |
C | ca_bc_richmond | 2024-11-20 04:54:43 | 2024-11-20 04:54:43 |
C | ca_bc_saanich | 2024-11-20 04:19:06 | 2024-11-20 04:19:06 |
C | ca_bc_surrey | 2024-11-20 04:37:40 | 2024-11-20 04:37:40 |
C | ca_bc_vancouver | 2024-11-20 04:45:56 | 2024-11-20 04:45:56 |
C | ca_bc_victoria | 2024-11-20 04:17:15 | 2024-11-20 04:17:15 |
C | ca_mb | 2024-11-20 04:13:16 | 2024-11-20 04:13:16 |
C | ca_mb_winnipeg | 2024-11-20 04:17:10 | 2024-11-20 04:17:10 |
D> | ca_nb | 2024-09-16 04:20:57 | 2024-11-20 04:48:42 | IndexError: list index out of range
Traceback (most recent call last):
File "/app/reports/utils.py", line 73, in scrape_people
report.report = subcommand.handle(args, other)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 260, in handle
return self.do_handle(args, other, juris)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 305, in do_handle
report['scrape'] = self.do_scrape(juris, args, scrapers)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 173, in do_scrape
report[scraper_name] = scraper.do_scrape(**scrape_args)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/scrape/base.py", line 99, in do_scrape
for obj in self.scrape(**kwargs) or []:
File "/app/scrapers/ca_nb/people.py", line 16, in scrape
address = node.xpath('//td[contains(text(),"Address")]/parent::tr//td[2]')[0]
IndexError: list index out of range
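
Note: the XPath begins with "//", which in lxml searches the whole document from the root even when called on node, so the empty result means no "Address" cell exists anywhere on the page anymore. A guard in the style this codebase already uses for empty selections (compare the assert in ca_on_haldimand_county below):

    cells = node.xpath('.//td[contains(text(), "Address")]/parent::tr//td[2]')
    assert cells, "No Address cell found; page layout may have changed"
    address = cells[0]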

C | ca_nb_fredericton | 2024-11-20 04:41:40 | 2024-11-20 04:41:41 |
C | ca_nb_moncton | 2024-11-20 04:45:51 | 2024-11-20 04:45:51 |
C | ca_nb_saint_john | 2024-11-20 04:01:12 | 2024-11-20 04:01:13 |
C | ca_nl | 2024-11-20 04:48:56 | 2024-11-20 04:48:56 |
C | ca_nl_st_john_s | 2024-11-20 04:19:32 | 2024-11-20 04:19:32 |
C | ca_ns | 2024-11-20 04:48:37 | 2024-11-20 04:48:37 |
D> | ca_ns_cape_breton | 2024-10-09 06:47:44 | 2024-11-20 04:12:09 | Value '' for field '<obj>.name' cannot be blank'
04:12:09 WARNING pupa: validation of CanadianPerson 9cc9e198-a6f5-11ef-b281-5a8f52bca9f1 failed: 2 validation errors:
Value '' for field '<obj>.name' does not match regular expression 'regex.Regex('\\A(?!(?:Chair|Commissioner|Conseiller|Councillor|Deputy|Dr|Hon|M|Maire|Mayor|Miss|Mme|Mr|Mrs|Ms|Regional|Warden)\\b)(?:(?:(?:\\p{Lu}\\.)+|\\p{Lu}+|(?:Jr|Rev|Sr|St)\\.|da|de|den|der|la|van|von|[("](?:\\p{Lu}+|\\p{Lu}\\p{Ll}*(?:-\\p{Lu}\\p{Ll}*)*)[)"]|(?:D\'|d\'|De|de|Des|Di|Du|L\'|La|Le|Mac|Mc|O\'|San|St\\.|Van|Vander?|van|vanden)?\\p{Lu}\\p{Ll}+|\\p{Lu}\\p{Ll}+Anne?|Marie\\p{Lu}\\p{Ll}+|Ch\'ng|Prud\'homme|D!ONNE|IsaBelle|Ya\'ara)(?:\'|-| - | ))+(?:(?:\\p{Lu}\\.)+|\\p{Lu}+|(?:Jr|Rev|Sr|St)\\.|da|de|den|der|la|van|von|[("](?:\\p{Lu}+|\\p{Lu}\\p{Ll}*(?:-\\p{Lu}\\p{Ll}*)*)[)"]|(?:D\'|d\'|De|de|Des|Di|Du|L\'|La|Le|Mac|Mc|O\'|San|St\\.|Van|Vander?|van|vanden)?\\p{Lu}\\p{Ll}+|\\p{Lu}\\p{Ll}+Anne?|Marie\\p{Lu}\\p{Ll}+|Ch\'ng|Prud\'homme|D!ONNE|IsaBelle|Ya\'ara)\\Z', flags=regex.V0)'
Value '' for field '<obj>.name' cannot be blank'
Traceback (most recent call last):
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/scrape/base.py", line 175, in validate
validator.validate(self.as_dict(), schema)
File "/app/.heroku/python/lib/python3.9/site-packages/validictory/validator.py", line 616, in validate
raise MultipleValidationError(self._errors)
validictory.validator.MultipleValidationError: 2 validation errors:
Value '' for field '<obj>.name' does not match regular expression 'regex.Regex('\\A(?!(?:Chair|Commissioner|Conseiller|Councillor|Deputy|Dr|Hon|M|Maire|Mayor|Miss|Mme|Mr|Mrs|Ms|Regional|Warden)\\b)(?:(?:(?:\\p{Lu}\\.)+|\\p{Lu}+|(?:Jr|Rev|Sr|St)\\.|da|de|den|der|la|van|von|[("](?:\\p{Lu}+|\\p{Lu}\\p{Ll}*(?:-\\p{Lu}\\p{Ll}*)*)[)"]|(?:D\'|d\'|De|de|Des|Di|Du|L\'|La|Le|Mac|Mc|O\'|San|St\\.|Van|Vander?|van|vanden)?\\p{Lu}\\p{Ll}+|\\p{Lu}\\p{Ll}+Anne?|Marie\\p{Lu}\\p{Ll}+|Ch\'ng|Prud\'homme|D!ONNE|IsaBelle|Ya\'ara)(?:\'|-| - | ))+(?:(?:\\p{Lu}\\.)+|\\p{Lu}+|(?:Jr|Rev|Sr|St)\\.|da|de|den|der|la|van|von|[("](?:\\p{Lu}+|\\p{Lu}\\p{Ll}*(?:-\\p{Lu}\\p{Ll}*)*)[)"]|(?:D\'|d\'|De|de|Des|Di|Du|L\'|La|Le|Mac|Mc|O\'|San|St\\.|Van|Vander?|van|vanden)?\\p{Lu}\\p{Ll}+|\\p{Lu}\\p{Ll}+Anne?|Marie\\p{Lu}\\p{Ll}+|Ch\'ng|Prud\'homme|D!ONNE|IsaBelle|Ya\'ara)\\Z', flags=regex.V0)'
Value '' for field '<obj>.name' cannot be blank'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/app/reports/utils.py", line 73, in scrape_people
report.report = subcommand.handle(args, other)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 260, in handle
return self.do_handle(args, other, juris)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 305, in do_handle
report['scrape'] = self.do_scrape(juris, args, scrapers)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 173, in do_scrape
report[scraper_name] = scraper.do_scrape(**scrape_args)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/scrape/base.py", line 104, in do_scrape
self.save_object(obj)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/scrape/base.py", line 89, in save_object
raise ve
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/scrape/base.py", line 85, in save_object
obj.validate()
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/scrape/base.py", line 177, in validate
raise ScrapeValueError('validation of {} {} failed: {}'.format(
pupa.exceptions.ScrapeValueError: validation of CanadianPerson 9cc9e198-a6f5-11ef-b281-5a8f52bca9f1 failed: 2 validation errors:
Value '' for field '<obj>.name' does not match regular expression 'regex.Regex('\\A(?!(?:Chair|Commissioner|Conseiller|Councillor|Deputy|Dr|Hon|M|Maire|Mayor|Miss|Mme|Mr|Mrs|Ms|Regional|Warden)\\b)(?:(?:(?:\\p{Lu}\\.)+|\\p{Lu}+|(?:Jr|Rev|Sr|St)\\.|da|de|den|der|la|van|von|[("](?:\\p{Lu}+|\\p{Lu}\\p{Ll}*(?:-\\p{Lu}\\p{Ll}*)*)[)"]|(?:D\'|d\'|De|de|Des|Di|Du|L\'|La|Le|Mac|Mc|O\'|San|St\\.|Van|Vander?|van|vanden)?\\p{Lu}\\p{Ll}+|\\p{Lu}\\p{Ll}+Anne?|Marie\\p{Lu}\\p{Ll}+|Ch\'ng|Prud\'homme|D!ONNE|IsaBelle|Ya\'ara)(?:\'|-| - | ))+(?:(?:\\p{Lu}\\.)+|\\p{Lu}+|(?:Jr|Rev|Sr|St)\\.|da|de|den|der|la|van|von|[("](?:\\p{Lu}+|\\p{Lu}\\p{Ll}*(?:-\\p{Lu}\\p{Ll}*)*)[)"]|(?:D\'|d\'|De|de|Des|Di|Du|L\'|La|Le|Mac|Mc|O\'|San|St\\.|Van|Vander?|van|vanden)?\\p{Lu}\\p{Ll}+|\\p{Lu}\\p{Ll}+Anne?|Marie\\p{Lu}\\p{Ll}+|Ch\'ng|Prud\'homme|D!ONNE|IsaBelle|Ya\'ara)\\Z', flags=regex.V0)'
Value '' for field '<obj>.name' cannot be blank'
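
Note: an empty name means a councillor card was found but no text was extracted from its name node, which typically indicates a vacant seat or changed markup. A skip-with-warning sketch (name_node is hypothetical; pupa scrapers expose logging helpers such as self.warning):

    name = " ".join(name_node.text_content().split())  # collapse whitespace
    if not name:
        self.warning("Skipping councillor card with empty name")
        continue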

D> | ca_ns_halifax | 2024-11-15 04:44:46 | 2024-11-20 04:03:49 | pupa.exceptions.UnresolvedIdError: cannot resolve pseudo id to Post: ~{"label": "Dartmouth East\u2014Burnside", "organizatio…
Traceback (most recent call last):
File "/app/reports/utils.py", line 73, in scrape_people
report.report = subcommand.handle(args, other)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 260, in handle
return self.do_handle(args, other, juris)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 307, in do_handle
report['import'] = self.do_import(juris, args)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 211, in do_import
report.update(membership_importer.import_directory(datadir))
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/importers/base.py", line 190, in import_directory
return self.import_data(json_stream())
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/importers/base.py", line 227, in import_data
obj_id, what = self.import_item(data)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/importers/base.py", line 247, in import_item
data = self.prepare_for_db(data)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/importers/memberships.py", line 50, in prepare_for_db
data['post_id'] = self.post_importer.resolve_json_id(data['post_id'])
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/importers/base.py", line 165, in resolve_json_id
raise UnresolvedIdError(errmsg)
pupa.exceptions.UnresolvedIdError: cannot resolve pseudo id to Post: ~{"label": "Dartmouth East\u2014Burnside", "organization__classification": "legislature", "role": "Councillor"}
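
Note: unlike the scrape-time failures above, this one happens at import time. The membership refers to its post by a pseudo id (label, organization classification, role), and no Post with the label shown as "Dartmouth East\u2014Burnside" exists, which is what a ward rename after a boundary review looks like. The fix is to define a matching post on the organization; a sketch using pupa's Organization.add_post (other arguments, such as the division id, elided):

    # In the jurisdiction's organization setup, hypothetically:
    organization.add_post(
        label="Dartmouth East\u2014Burnside",  # must match the pseudo id's label
        role="Councillor",
    )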

C | ca_nt | 2024-11-20 04:32:13 | 2024-11-20 04:32:13 |
A | ca_nu | 2024-11-20 04:45:23 | 2024-11-20 04:45:23 |
C | ca_on | 2024-11-20 04:16:34 | 2024-11-20 04:16:35 |
C | ca_on_ajax | 2024-11-20 04:44:46 | 2024-11-20 04:44:47 |
04:44:36 WARNING scrapelib: got HTTPSConnectionPool(host='www.ajax.ca', port=443): Read timed out. (read timeout=60) sleeping for 10 seconds before retry
C | ca_on_belleville | 2024-11-20 04:48:48 | 2024-11-20 04:48:48 |
C | ca_on_brampton | 2024-11-20 04:02:04 | 2024-11-20 04:02:04 |
C | ca_on_brantford | 2024-11-20 04:04:19 | 2024-11-20 04:04:19 |
C | ca_on_burlington | 2024-11-20 04:21:36 | 2024-11-20 04:21:36 |
C | ca_on_caledon | 2024-11-20 04:02:58 | 2024-11-20 04:02:58 |
C | ca_on_cambridge | 2024-11-20 04:45:27 | 2024-11-20 04:45:27 |
C | ca_on_chatham_kent | 2024-11-20 04:37:26 | 2024-11-20 04:37:26 |
C | ca_on_clarington | 2024-11-20 04:03:05 | 2024-11-20 04:03:05 |
C | ca_on_fort_erie | 2024-11-20 04:29:52 | 2024-11-20 04:29:52 |
C | ca_on_georgina | 2024-11-20 04:31:22 | 2024-11-20 04:31:22 |
C | ca_on_greater_sudbury | 2024-11-20 04:49:01 | 2024-11-20 04:49:01 |
C | ca_on_grimsby | 2024-11-20 04:31:12 | 2024-11-20 04:31:12 |
C | ca_on_guelph | 2024-11-20 04:19:51 | 2024-11-20 04:19:51 |
D> | ca_on_haldimand_county | 2024-11-19 04:35:40 | 2024-11-20 04:03:24 | AssertionError: No councillors found
Traceback (most recent call last):
File "/app/reports/utils.py", line 73, in scrape_people
report.report = subcommand.handle(args, other)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 260, in handle
return self.do_handle(args, other, juris)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 305, in do_handle
report['scrape'] = self.do_scrape(juris, args, scrapers)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 173, in do_scrape
report[scraper_name] = scraper.do_scrape(**scrape_args)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/scrape/base.py", line 99, in do_scrape
for obj in self.scrape(**kwargs) or []:
File "/app/scrapers/ca_on_haldimand_county/people.py", line 12, in scrape
assert len(councillors), "No councillors found"
AssertionError: No councillors found

C | ca_on_hamilton | 2024-11-20 04:42:02 | 2024-11-20 04:42:02 |
C | ca_on_huron | 2024-11-20 04:01:56 | 2024-11-20 04:01:56 |
D> | ca_on_kawartha_lakes | 2024-11-19 04:23:46 | 2024-11-20 04:54:30 | requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='www.kawarthalakes.ca', port=443): Read timed out. (read timeout=6…
04:50:19 WARNING scrapelib: got HTTPSConnectionPool(host='www.kawarthalakes.ca', port=443): Read timed out. (read timeout=60) sleeping for 10 seconds before retry
04:51:29 WARNING scrapelib: got HTTPSConnectionPool(host='www.kawarthalakes.ca', port=443): Read timed out. (read timeout=60) sleeping for 20 seconds before retry
04:52:49 WARNING scrapelib: got HTTPSConnectionPool(host='www.kawarthalakes.ca', port=443): Read timed out. (read timeout=60) sleeping for 40 seconds before retry
Traceback (most recent call last):
File "/app/.heroku/python/lib/python3.9/site-packages/urllib3/connectionpool.py", line 468, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/app/.heroku/python/lib/python3.9/site-packages/urllib3/connectionpool.py", line 463, in _make_request
httplib_response = conn.getresponse()
File "/app/.heroku/python/lib/python3.9/http/client.py", line 1377, in getresponse
response.begin()
File "/app/.heroku/python/lib/python3.9/http/client.py", line 320, in begin
version, status, reason = self._read_status()
File "/app/.heroku/python/lib/python3.9/http/client.py", line 281, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/app/.heroku/python/lib/python3.9/socket.py", line 704, in readinto
return self._sock.recv_into(b)
File "/app/.heroku/python/lib/python3.9/ssl.py", line 1275, in recv_into
return self.read(nbytes, buffer)
File "/app/.heroku/python/lib/python3.9/ssl.py", line 1133, in read
return self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/app/.heroku/python/lib/python3.9/site-packages/requests/adapters.py", line 667, in send
resp = conn.urlopen(
File "/app/.heroku/python/lib/python3.9/site-packages/urllib3/connectionpool.py", line 802, in urlopen
retries = retries.increment(
File "/app/.heroku/python/lib/python3.9/site-packages/urllib3/util/retry.py", line 552, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/app/.heroku/python/lib/python3.9/site-packages/urllib3/packages/six.py", line 770, in reraise
raise value
File "/app/.heroku/python/lib/python3.9/site-packages/urllib3/connectionpool.py", line 716, in urlopen
httplib_response = self._make_request(
File "/app/.heroku/python/lib/python3.9/site-packages/urllib3/connectionpool.py", line 470, in _make_request
self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
File "/app/.heroku/python/lib/python3.9/site-packages/urllib3/connectionpool.py", line 358, in _raise_timeout
raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='www.kawarthalakes.ca', port=443): Read timed out. (read timeout=60)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/app/reports/utils.py", line 73, in scrape_people
report.report = subcommand.handle(args, other)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 260, in handle
return self.do_handle(args, other, juris)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 305, in do_handle
report['scrape'] = self.do_scrape(juris, args, scrapers)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 173, in do_scrape
report[scraper_name] = scraper.do_scrape(**scrape_args)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/scrape/base.py", line 99, in do_scrape
for obj in self.scrape(**kwargs) or []:
File "/app/scrapers/ca_on_kawartha_lakes/people.py", line 11, in scrape
page = self.lxmlize(COUNCIL_PAGE)
File "/app/scrapers/utils.py", line 206, in lxmlize
response = self.get(url, cookies=cookies, verify=verify)
File "/app/scrapers/utils.py", line 196, in get
return super().get(*args, verify=kwargs.pop("verify", SSL_VERIFY), **kwargs)
File "/app/.heroku/python/lib/python3.9/site-packages/requests/sessions.py", line 602, in get
return self.request("GET", url, **kwargs)
File "/app/.heroku/python/lib/python3.9/site-packages/scrapelib/__init__.py", line 579, in request
resp = super().request(
File "/app/.heroku/python/lib/python3.9/site-packages/scrapelib/__init__.py", line 404, in request
resp = super().request(
File "/app/.heroku/python/lib/python3.9/site-packages/scrapelib/__init__.py", line 232, in request
return super().request(
File "/app/.heroku/python/lib/python3.9/site-packages/scrapelib/__init__.py", line 175, in request
raise exception_raised
File "/app/.heroku/python/lib/python3.9/site-packages/scrapelib/__init__.py", line 122, in request
resp = super().request(
File "/app/.heroku/python/lib/python3.9/site-packages/requests/sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
File "/app/.heroku/python/lib/python3.9/site-packages/requests/sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
File "/app/.heroku/python/lib/python3.9/site-packages/requests/adapters.py", line 713, in send
raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='www.kawarthalakes.ca', port=443): Read timed out. (read timeout=60)

C | ca_on_king | 2024-11-20 04:32:47 | 2024-11-20 04:32:47 |
C | ca_on_kingston | 2024-11-20 04:33:22 | 2024-11-20 04:33:22 |
C | ca_on_kitchener | 2024-11-20 04:44:51 | 2024-11-20 04:44:51 |
C | ca_on_lambton | 2024-11-20 04:54:59 | 2024-11-20 04:54:59 |
C | ca_on_lasalle | 2024-11-20 04:19:48 | 2024-11-20 04:19:48 |
C | ca_on_lincoln | 2024-11-20 04:17:05 | 2024-11-20 04:17:05 |
C | ca_on_london | 2024-11-20 04:44:56 | 2024-11-20 04:44:56 |
C | ca_on_markham | 2024-11-20 04:31:45 | 2024-11-20 04:31:46 |
C | ca_on_milton | 2024-11-20 04:33:31 | 2024-11-20 04:33:31 |
C | ca_on_mississauga | 2024-11-20 04:32:35 | 2024-11-20 04:32:35 |
C | ca_on_newmarket | 2024-11-20 04:19:18 | 2024-11-20 04:19:18 |
C | ca_on_niagara | 2024-11-20 04:33:18 | 2024-11-20 04:33:19 |
C | ca_on_niagara_on_the_lake | 2024-11-20 04:32:40 | 2024-11-20 04:32:40 |
C | ca_on_north_dumfries | 2024-11-20 04:13:56 | 2024-11-20 04:13:56 |
C | ca_on_oakville | 2024-11-20 04:18:40 | 2024-11-20 04:18:40 |
C | ca_on_oshawa | 2024-11-20 04:03:11 | 2024-11-20 04:03:11 |
C | ca_on_ottawa | 2024-11-20 04:03:56 | 2024-11-20 04:03:56 |
C | ca_on_peel | 2024-11-20 04:32:19 | 2024-11-20 04:32:19 |
D> | ca_on_pickering | 2024-11-19 04:26:35 | 2024-11-20 04:40:54 | requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='www.pickering.ca', port=443): Read timed out. (read timeout=60)
04:37:43 WARNING scrapelib: got HTTPSConnectionPool(host='www.pickering.ca', port=443): Max retries exceeded with url: /en/city-hall/citycouncil.aspx (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f382d418b80>: Failed to establish a new connection: [Errno 111] Connection refused')) sleeping for 10 seconds before retry
04:37:54 WARNING scrapelib: got HTTPSConnectionPool(host='www.pickering.ca', port=443): Max retries exceeded with url: /en/city-hall/citycouncil.aspx (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f382d513a00>: Failed to establish a new connection: [Errno 111] Connection refused')) sleeping for 20 seconds before retry
04:39:14 WARNING scrapelib: got HTTPSConnectionPool(host='www.pickering.ca', port=443): Read timed out. (read timeout=60) sleeping for 40 seconds before retry
Traceback (most recent call last):
File "/app/.heroku/python/lib/python3.9/site-packages/urllib3/connectionpool.py", line 468, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/app/.heroku/python/lib/python3.9/site-packages/urllib3/connectionpool.py", line 463, in _make_request
httplib_response = conn.getresponse()
File "/app/.heroku/python/lib/python3.9/http/client.py", line 1377, in getresponse
response.begin()
File "/app/.heroku/python/lib/python3.9/http/client.py", line 320, in begin
version, status, reason = self._read_status()
File "/app/.heroku/python/lib/python3.9/http/client.py", line 281, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/app/.heroku/python/lib/python3.9/socket.py", line 704, in readinto
return self._sock.recv_into(b)
File "/app/.heroku/python/lib/python3.9/ssl.py", line 1275, in recv_into
return self.read(nbytes, buffer)
File "/app/.heroku/python/lib/python3.9/ssl.py", line 1133, in read
return self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/app/.heroku/python/lib/python3.9/site-packages/requests/adapters.py", line 667, in send
resp = conn.urlopen(
File "/app/.heroku/python/lib/python3.9/site-packages/urllib3/connectionpool.py", line 802, in urlopen
retries = retries.increment(
File "/app/.heroku/python/lib/python3.9/site-packages/urllib3/util/retry.py", line 552, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/app/.heroku/python/lib/python3.9/site-packages/urllib3/packages/six.py", line 770, in reraise
raise value
File "/app/.heroku/python/lib/python3.9/site-packages/urllib3/connectionpool.py", line 716, in urlopen
httplib_response = self._make_request(
File "/app/.heroku/python/lib/python3.9/site-packages/urllib3/connectionpool.py", line 470, in _make_request
self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
File "/app/.heroku/python/lib/python3.9/site-packages/urllib3/connectionpool.py", line 358, in _raise_timeout
raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='www.pickering.ca', port=443): Read timed out. (read timeout=60)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/app/reports/utils.py", line 73, in scrape_people
report.report = subcommand.handle(args, other)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 260, in handle
return self.do_handle(args, other, juris)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 305, in do_handle
report['scrape'] = self.do_scrape(juris, args, scrapers)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 173, in do_scrape
report[scraper_name] = scraper.do_scrape(**scrape_args)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/scrape/base.py", line 99, in do_scrape
for obj in self.scrape(**kwargs) or []:
File "/app/scrapers/ca_on_pickering/people.py", line 11, in scrape
page = self.lxmlize(COUNCIL_PAGE)
File "/app/scrapers/utils.py", line 206, in lxmlize
response = self.get(url, cookies=cookies, verify=verify)
File "/app/scrapers/utils.py", line 196, in get
return super().get(*args, verify=kwargs.pop("verify", SSL_VERIFY), **kwargs)
File "/app/.heroku/python/lib/python3.9/site-packages/requests/sessions.py", line 602, in get
return self.request("GET", url, **kwargs)
File "/app/.heroku/python/lib/python3.9/site-packages/scrapelib/__init__.py", line 579, in request
resp = super().request(
File "/app/.heroku/python/lib/python3.9/site-packages/scrapelib/__init__.py", line 404, in request
resp = super().request(
File "/app/.heroku/python/lib/python3.9/site-packages/scrapelib/__init__.py", line 232, in request
return super().request(
File "/app/.heroku/python/lib/python3.9/site-packages/scrapelib/__init__.py", line 175, in request
raise exception_raised
File "/app/.heroku/python/lib/python3.9/site-packages/scrapelib/__init__.py", line 122, in request
resp = super().request(
File "/app/.heroku/python/lib/python3.9/site-packages/requests/sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
File "/app/.heroku/python/lib/python3.9/site-packages/requests/sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
File "/app/.heroku/python/lib/python3.9/site-packages/requests/adapters.py", line 713, in send
raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='www.pickering.ca', port=443): Read timed out. (read timeout=60)

C | ca_on_richmond_hill | 2024-11-20 04:45:46 | 2024-11-20 04:45:46 |
C | ca_on_sault_ste_marie | 2024-11-20 04:33:11 | 2024-11-20 04:33:11 |
C | ca_on_st_catharines | 2024-11-20 04:58:18 | 2024-11-20 04:58:18 |
D> | ca_on_thunder_bay | 2024-09-16 04:10:30 | 2024-11-20 04:31:49 | requests.exceptions.SSLError: HTTPSConnectionPool(host='www.thunderbay.ca', port=443): Max retries exceeded with url: /en/ci…
Traceback (most recent call last):
File "/app/.heroku/python/lib/python3.9/site-packages/urllib3/connectionpool.py", line 716, in urlopen
httplib_response = self._make_request(
File "/app/.heroku/python/lib/python3.9/site-packages/urllib3/connectionpool.py", line 404, in _make_request
self._validate_conn(conn)
File "/app/.heroku/python/lib/python3.9/site-packages/urllib3/connectionpool.py", line 1061, in _validate_conn
conn.connect()
File "/app/.heroku/python/lib/python3.9/site-packages/urllib3/connection.py", line 419, in connect
self.sock = ssl_wrap_socket(
File "/app/.heroku/python/lib/python3.9/site-packages/urllib3/util/ssl_.py", line 458, in ssl_wrap_socket
ssl_sock = _ssl_wrap_socket_impl(
File "/app/.heroku/python/lib/python3.9/site-packages/urllib3/util/ssl_.py", line 502, in _ssl_wrap_socket_impl
return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
File "/app/.heroku/python/lib/python3.9/ssl.py", line 501, in wrap_socket
return self.sslsocket_class._create(
File "/app/.heroku/python/lib/python3.9/ssl.py", line 1074, in _create
self.do_handshake()
File "/app/.heroku/python/lib/python3.9/ssl.py", line 1343, in do_handshake
self._sslobj.do_handshake()
ssl.SSLError: [SSL: DH_KEY_TOO_SMALL] dh key too small (_ssl.c:1133)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/app/.heroku/python/lib/python3.9/site-packages/requests/adapters.py", line 667, in send
resp = conn.urlopen(
File "/app/.heroku/python/lib/python3.9/site-packages/urllib3/connectionpool.py", line 802, in urlopen
retries = retries.increment(
File "/app/.heroku/python/lib/python3.9/site-packages/urllib3/util/retry.py", line 594, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='www.thunderbay.ca', port=443): Max retries exceeded with url: /en/city-hall/mayor-and-council-profiles.aspx (Caused by SSLError(SSLError(1, '[SSL: DH_KEY_TOO_SMALL] dh key too small (_ssl.c:1133)')))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/app/reports/utils.py", line 73, in scrape_people
report.report = subcommand.handle(args, other)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 260, in handle
return self.do_handle(args, other, juris)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 305, in do_handle
report['scrape'] = self.do_scrape(juris, args, scrapers)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 173, in do_scrape
report[scraper_name] = scraper.do_scrape(**scrape_args)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/scrape/base.py", line 99, in do_scrape
for obj in self.scrape(**kwargs) or []:
File "/app/scrapers/ca_on_thunder_bay/people.py", line 13, in scrape
page = self.lxmlize(COUNCIL_PAGE, verify=False)
File "/app/scrapers/utils.py", line 206, in lxmlize
response = self.get(url, cookies=cookies, verify=verify)
File "/app/scrapers/utils.py", line 196, in get
return super().get(*args, verify=kwargs.pop("verify", SSL_VERIFY), **kwargs)
File "/app/.heroku/python/lib/python3.9/site-packages/requests/sessions.py", line 602, in get
return self.request("GET", url, **kwargs)
File "/app/.heroku/python/lib/python3.9/site-packages/scrapelib/__init__.py", line 579, in request
resp = super().request(
File "/app/.heroku/python/lib/python3.9/site-packages/scrapelib/__init__.py", line 404, in request
resp = super().request(
File "/app/.heroku/python/lib/python3.9/site-packages/scrapelib/__init__.py", line 232, in request
return super().request(
File "/app/.heroku/python/lib/python3.9/site-packages/scrapelib/__init__.py", line 122, in request
resp = super().request(
File "/app/.heroku/python/lib/python3.9/site-packages/requests/sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
File "/app/.heroku/python/lib/python3.9/site-packages/requests/sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
File "/app/.heroku/python/lib/python3.9/site-packages/requests/adapters.py", line 698, in send
raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='www.thunderbay.ca', port=443): Max retries exceeded with url: /en/city-hall/mayor-and-council-profiles.aspx (Caused by SSLError(SSLError(1, '[SSL: DH_KEY_TOO_SMALL] dh key too small (_ssl.c:1133)')))
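
Note: the scraper already passes verify=False, but DH_KEY_TOO_SMALL is a handshake failure, not a certificate problem: the server offers a Diffie-Hellman key smaller than modern OpenSSL's default security level accepts. A known workaround is an adapter that lowers the security level for this host only (sketch; keep the weakened settings scoped to this one site):

    import ssl
    import requests
    from requests.adapters import HTTPAdapter

    class WeakDHAdapter(HTTPAdapter):
        """Accept the small DH key that www.thunderbay.ca still serves."""
        def init_poolmanager(self, *args, **kwargs):
            ctx = ssl.create_default_context()
            ctx.set_ciphers("DEFAULT@SECLEVEL=1")  # permit small DH keys
            kwargs["ssl_context"] = ctx
            super().init_poolmanager(*args, **kwargs)

    session = requests.Session()
    session.mount("https://www.thunderbay.ca/", WeakDHAdapter())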

C | ca_on_toronto | 2024-11-20 04:45:32 | 2024-11-20 04:45:32 |
C | ca_on_uxbridge | 2024-11-20 04:02:34 | 2024-11-20 04:02:34 |
C | ca_on_vaughan | 2024-11-20 04:05:24 | 2024-11-20 04:05:24 |
C | ca_on_waterloo | 2024-11-20 04:00:44 | 2024-11-20 04:00:44 |
C | ca_on_waterloo_region | 2024-11-20 04:55:36 | 2024-11-20 04:55:36 |
C | ca_on_welland | 2024-11-20 04:42:14 | 2024-11-20 04:42:14 |
C | ca_on_wellesley | 2024-11-20 04:19:09 | 2024-11-20 04:19:09 |
C | ca_on_whitby | 2024-11-20 04:41:23 | 2024-11-20 04:41:23 |
C | ca_on_whitchurch_stouffville | 2024-11-20 04:02:14 | 2024-11-20 04:02:14 |
C | ca_on_wilmot | 2024-11-20 04:16:59 | 2024-11-20 04:16:59 |
D> | ca_on_windsor | 2024-10-09 06:56:12 | 2024-11-20 04:18:43 | IndexError: list index out of range
Traceback (most recent call last):
File "/app/reports/utils.py", line 73, in scrape_people
report.report = subcommand.handle(args, other)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 260, in handle
return self.do_handle(args, other, juris)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 305, in do_handle
report['scrape'] = self.do_scrape(juris, args, scrapers)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 173, in do_scrape
report[scraper_name] = scraper.do_scrape(**scrape_args)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/scrape/base.py", line 99, in do_scrape
for obj in self.scrape(**kwargs) or []:
File "/app/scrapers/ca_on_windsor/people.py", line 13, in scrape
data = json.loads(self.get(data_url).text.split(" = ")[1])
IndexError: list index out of range
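
Note: the scraper expects the response body to contain a JavaScript assignment ("<name> = {json}") and indexes the right-hand side of the split; when the page stops embedding that assignment, split(" = ") returns a single element. A guard that fails with context instead of IndexError:

    text = self.get(data_url).text
    parts = text.split(" = ", 1)
    assert len(parts) == 2, "Embedded JSON assignment not found; page format changed"
    data = json.loads(parts[1])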

D> | ca_on_woolwich | 2024-08-12 04:27:59 | 2024-11-20 04:37:01 | scrapelib.HTTPError: 404 while retrieving https://www.woolwich.ca/en/council/council.asp
Traceback (most recent call last):
File "/app/reports/utils.py", line 73, in scrape_people
report.report = subcommand.handle(args, other)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 260, in handle
return self.do_handle(args, other, juris)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 305, in do_handle
report['scrape'] = self.do_scrape(juris, args, scrapers)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 173, in do_scrape
report[scraper_name] = scraper.do_scrape(**scrape_args)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/scrape/base.py", line 99, in do_scrape
for obj in self.scrape(**kwargs) or []:
File "/app/scrapers/ca_on_woolwich/people.py", line 13, in scrape
page = self.lxmlize(COUNCIL_PAGE)
File "/app/scrapers/utils.py", line 206, in lxmlize
response = self.get(url, cookies=cookies, verify=verify)
File "/app/scrapers/utils.py", line 196, in get
return super().get(*args, verify=kwargs.pop("verify", SSL_VERIFY), **kwargs)
File "/app/.heroku/python/lib/python3.9/site-packages/requests/sessions.py", line 602, in get
return self.request("GET", url, **kwargs)
File "/app/.heroku/python/lib/python3.9/site-packages/scrapelib/__init__.py", line 602, in request
raise HTTPError(resp)
scrapelib.HTTPError: 404 while retrieving https://www.woolwich.ca/en/council/council.asp

C | ca_pe | 2024-11-20 04:13:48 | 2024-11-20 04:13:48 |
C | ca_pe_charlottetown | 2024-11-20 04:36:26 | 2024-11-20 04:36:26 |
C | ca_pe_stratford | 2024-11-20 04:19:41 | 2024-11-20 04:19:41 |
C | ca_pe_summerside | 2024-11-20 04:49:15 | 2024-11-20 04:49:15 |
C | ca_qc | 2024-11-20 04:11:52 | 2024-11-20 04:11:52 |
C | ca_qc_beaconsfield | 2024-11-20 04:56:51 | 2024-11-20 04:56:52 |
C | ca_qc_brossard | 2024-11-20 04:21:39 | 2024-11-20 04:21:40 |
C | ca_qc_cote_saint_luc | 2024-11-20 04:41:18 | 2024-11-20 04:41:18 |
C | ca_qc_dollard_des_ormeaux | 2024-11-20 04:17:19 | 2024-11-20 04:17:19 |
C | ca_qc_dorval | 2024-11-20 04:20:10 | 2024-11-20 04:20:10 |
C | ca_qc_gatineau | 2024-11-20 04:01:50 | 2024-11-20 04:01:50 |
D> | ca_qc_kirkland |  | 2024-11-20 04:18:36 | scrapelib.HTTPError: 403 while retrieving https://www.ville.kirkland.qc.ca/portrait-municipal/conseil-municipal/elus-municip…
04:17:26 WARNING scrapelib: sleeping for 10 seconds before retry
04:17:36 WARNING scrapelib: sleeping for 20 seconds before retry
04:17:56 WARNING scrapelib: sleeping for 40 seconds before retry
04:18:36 WARNING pupa: could not save RunPlan, no successful runs of ocd-jurisdiction/country:ca/csd:2466102/legislature yet
Traceback (most recent call last):
File "/app/reports/utils.py", line 73, in scrape_people
report.report = subcommand.handle(args, other)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 260, in handle
return self.do_handle(args, other, juris)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 305, in do_handle
report['scrape'] = self.do_scrape(juris, args, scrapers)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 173, in do_scrape
report[scraper_name] = scraper.do_scrape(**scrape_args)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/scrape/base.py", line 99, in do_scrape
for obj in self.scrape(**kwargs) or []:
File "/app/scrapers/ca_qc_kirkland/people.py", line 11, in scrape
page = self.lxmlize(COUNCIL_PAGE)
File "/app/scrapers/utils.py", line 206, in lxmlize
response = self.get(url, cookies=cookies, verify=verify)
File "/app/scrapers/utils.py", line 196, in get
return super().get(*args, verify=kwargs.pop("verify", SSL_VERIFY), **kwargs)
File "/app/.heroku/python/lib/python3.9/site-packages/requests/sessions.py", line 602, in get
return self.request("GET", url, **kwargs)
File "/app/.heroku/python/lib/python3.9/site-packages/scrapelib/__init__.py", line 602, in request
raise HTTPError(resp)
scrapelib.HTTPError: 403 while retrieving https://www.ville.kirkland.qc.ca/portrait-municipal/conseil-municipal/elus-municipaux

C | ca_qc_laval | 2024-11-20 04:03:18 | 2024-11-20 04:03:19 |
C | ca_qc_levis | 2024-11-20 04:54:54 | 2024-11-20 04:54:54 |
C | ca_qc_longueuil | 2024-11-20 04:56:46 | 2024-11-20 04:56:46 |
C | ca_qc_mercier | 2024-11-20 04:02:27 | 2024-11-20 04:02:27 |
C | ca_qc_montreal | 2024-11-20 04:33:42 | 2024-11-20 04:33:42 |
C | ca_qc_montreal_est | 2024-11-20 04:19:55 | 2024-11-20 04:19:55 |
C | ca_qc_pointe_claire | 2024-11-20 04:32:56 | 2024-11-20 04:32:56 |
C | ca_qc_quebec | 2024-11-20 04:54:47 | 2024-11-20 04:54:47 |
C | ca_qc_saguenay | 2024-11-20 04:01:25 | 2024-11-20 04:01:25 |
C | ca_qc_sainte_anne_de_bellevue | 2024-11-20 04:55:09 | 2024-11-20 04:55:09 |
C | ca_qc_saint_jean_sur_richelieu | 2024-11-20 04:16:56 | 2024-11-20 04:16:56 |
D> | ca_qc_saint_jerome | 2024-06-25 04:19:49 | 2024-11-20 04:32:52 | AssertionError: No councillors found
Traceback (most recent call last):
File "/app/reports/utils.py", line 73, in scrape_people
report.report = subcommand.handle(args, other)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 260, in handle
return self.do_handle(args, other, juris)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 305, in do_handle
report['scrape'] = self.do_scrape(juris, args, scrapers)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 173, in do_scrape
report[scraper_name] = scraper.do_scrape(**scrape_args)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/scrape/base.py", line 99, in do_scrape
for obj in self.scrape(**kwargs) or []:
File "/app/scrapers/ca_qc_saint_jerome/people.py", line 11, in scrape
assert len(councillors), "No councillors found"
AssertionError: No councillors found

C | ca_qc_senneville | 2024-11-20 04:20:01 | 2024-11-20 04:20:01 |
C | ca_qc_sherbrooke | 2024-11-20 04:01:06 | 2024-11-20 04:01:07 |
D> | ca_qc_terrebonne | 2024-08-28 04:18:16 | 2024-11-20 04:31:07 | scrapelib.HTTPError: 500 while retrieving https://terrebonne.ca/maire/
04:29:57 WARNING scrapelib: sleeping for 10 seconds before retry
04:30:07 WARNING scrapelib: sleeping for 20 seconds before retry
04:30:27 WARNING scrapelib: sleeping for 40 seconds before retry
Traceback (most recent call last):
File "/app/reports/utils.py", line 73, in scrape_people
report.report = subcommand.handle(args, other)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 260, in handle
return self.do_handle(args, other, juris)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 305, in do_handle
report['scrape'] = self.do_scrape(juris, args, scrapers)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 173, in do_scrape
report[scraper_name] = scraper.do_scrape(**scrape_args)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/scrape/base.py", line 99, in do_scrape
for obj in self.scrape(**kwargs) or []:
File "/app/scrapers/ca_qc_terrebonne/people.py", line 25, in scrape
page = self.lxmlize(url)
File "/app/scrapers/utils.py", line 206, in lxmlize
response = self.get(url, cookies=cookies, verify=verify)
File "/app/scrapers/utils.py", line 196, in get
return super().get(*args, verify=kwargs.pop("verify", SSL_VERIFY), **kwargs)
File "/app/.heroku/python/lib/python3.9/site-packages/requests/sessions.py", line 602, in get
return self.request("GET", url, **kwargs)
File "/app/.heroku/python/lib/python3.9/site-packages/scrapelib/__init__.py", line 602, in request
raise HTTPError(resp)
scrapelib.HTTPError: 500 while retrieving https://terrebonne.ca/maire/

C | ca_qc_trois_rivieres | 2024-11-20 04:36:58 | 2024-11-20 04:36:58 |
C | ca_qc_westmount | 2024-11-20 04:19:14 | 2024-11-20 04:19:15 |
D> | ca_sk | 2024-09-16 04:12:59 | 2024-11-20 04:36:20 | AssertionError: No members found
Traceback (most recent call last):
File "/app/reports/utils.py", line 73, in scrape_people
report.report = subcommand.handle(args, other)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 260, in handle
return self.do_handle(args, other, juris)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 305, in do_handle
report['scrape'] = self.do_scrape(juris, args, scrapers)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 173, in do_scrape
report[scraper_name] = scraper.do_scrape(**scrape_args)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/scrape/base.py", line 99, in do_scrape
for obj in self.scrape(**kwargs) or []:
File "/app/scrapers/ca_sk/people.py", line 15, in scrape
assert len(members), "No members found"
AssertionError: No members found

D> | ca_sk_regina | 2024-11-18 04:27:59 | 2024-11-20 04:42:09 | Value '' for field '' does not match regular expression '\A1 \d{3} \d{3}-\d{4}(?: x\d+)?\Z'
04:42:09 WARNING pupa: validation of Membership cd484a54-a6f9-11ef-b281-5a8f52bca9f1 failed: 2 validation errors:
Value '' for field '<obj>.contact_details[0].value' cannot be blank'
Value '' for field '' does not match regular expression '\A1 \d{3} \d{3}-\d{4}(?: x\d+)?\Z'
Traceback (most recent call last):
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/scrape/base.py", line 175, in validate
validator.validate(self.as_dict(), schema)
File "/app/.heroku/python/lib/python3.9/site-packages/validictory/validator.py", line 616, in validate
raise MultipleValidationError(self._errors)
validictory.validator.MultipleValidationError: 2 validation errors:
Value '' for field '<obj>.contact_details[0].value' cannot be blank'
Value '' for field '' does not match regular expression '\A1 \d{3} \d{3}-\d{4}(?: x\d+)?\Z'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/app/reports/utils.py", line 73, in scrape_people
report.report = subcommand.handle(args, other)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 260, in handle
return self.do_handle(args, other, juris)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 305, in do_handle
report['scrape'] = self.do_scrape(juris, args, scrapers)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 173, in do_scrape
report[scraper_name] = scraper.do_scrape(**scrape_args)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/scrape/base.py", line 102, in do_scrape
self.save_object(iterobj)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/scrape/base.py", line 93, in save_object
self.save_object(obj)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/scrape/base.py", line 89, in save_object
raise ve
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/scrape/base.py", line 85, in save_object
obj.validate()
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/scrape/base.py", line 177, in validate
raise ScrapeValueError('validation of {} {} failed: {}'.format(
pupa.exceptions.ScrapeValueError: validation of Membership cd484a54-a6f9-11ef-b281-5a8f52bca9f1 failed: 2 validation errors:
Value '' for field '<obj>.contact_details[0].value' cannot be blank'
Value '' for field '' does not match regular expression '\A1 \d{3} \d{3}-\d{4}(?: x\d+)?\Z'

D> | ca_sk_saskatoon | 2024-09-12 04:05:10 | 2024-11-20 04:43:32 | requests.exceptions.ConnectionError: HTTPSConnectionPool(host='saskatoonopendataconfig.blob.core.windows.net', port=443): Ma…
04:42:22 WARNING scrapelib: got HTTPSConnectionPool(host='saskatoonopendataconfig.blob.core.windows.net', port=443): Max retries exceeded with url: /converteddata/MayorAndCityCouncilContactInformation.csv (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f382d418fd0>: Failed to establish a new connection: [Errno -2] Name or service not known')) sleeping for 10 seconds before retry
04:42:32 WARNING scrapelib: got HTTPSConnectionPool(host='saskatoonopendataconfig.blob.core.windows.net', port=443): Max retries exceeded with url: /converteddata/MayorAndCityCouncilContactInformation.csv (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f382d5eb640>: Failed to establish a new connection: [Errno -2] Name or service not known')) sleeping for 20 seconds before retry
04:42:52 WARNING scrapelib: got HTTPSConnectionPool(host='saskatoonopendataconfig.blob.core.windows.net', port=443): Max retries exceeded with url: /converteddata/MayorAndCityCouncilContactInformation.csv (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f382d5eb760>: Failed to establish a new connection: [Errno -2] Name or service not known')) sleeping for 40 seconds before retry
Traceback (most recent call last):
File "/app/.heroku/python/lib/python3.9/site-packages/urllib3/connection.py", line 174, in _new_conn
conn = connection.create_connection(
File "/app/.heroku/python/lib/python3.9/site-packages/urllib3/util/connection.py", line 72, in create_connection
for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
File "/app/.heroku/python/lib/python3.9/socket.py", line 954, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/app/.heroku/python/lib/python3.9/site-packages/urllib3/connectionpool.py", line 716, in urlopen
httplib_response = self._make_request(
File "/app/.heroku/python/lib/python3.9/site-packages/urllib3/connectionpool.py", line 404, in _make_request
self._validate_conn(conn)
File "/app/.heroku/python/lib/python3.9/site-packages/urllib3/connectionpool.py", line 1061, in _validate_conn
conn.connect()
File "/app/.heroku/python/lib/python3.9/site-packages/urllib3/connection.py", line 363, in connect
self.sock = conn = self._new_conn()
File "/app/.heroku/python/lib/python3.9/site-packages/urllib3/connection.py", line 186, in _new_conn
raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7f382d41f250>: Failed to establish a new connection: [Errno -2] Name or service not known
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/app/.heroku/python/lib/python3.9/site-packages/requests/adapters.py", line 667, in send
resp = conn.urlopen(
File "/app/.heroku/python/lib/python3.9/site-packages/urllib3/connectionpool.py", line 802, in urlopen
retries = retries.increment(
File "/app/.heroku/python/lib/python3.9/site-packages/urllib3/util/retry.py", line 594, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='saskatoonopendataconfig.blob.core.windows.net', port=443): Max retries exceeded with url: /converteddata/MayorAndCityCouncilContactInformation.csv (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f382d41f250>: Failed to establish a new connection: [Errno -2] Name or service not known'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/app/reports/utils.py", line 73, in scrape_people
report.report = subcommand.handle(args, other)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 260, in handle
return self.do_handle(args, other, juris)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 305, in do_handle
report['scrape'] = self.do_scrape(juris, args, scrapers)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 173, in do_scrape
report[scraper_name] = scraper.do_scrape(**scrape_args)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/scrape/base.py", line 99, in do_scrape
for obj in self.scrape(**kwargs) or []:
File "/app/scrapers/utils.py", line 397, in scrape
reader = self.csv_reader(
File "/app/scrapers/utils.py", line 240, in csv_reader
response = self.get(url, **kwargs)
File "/app/scrapers/utils.py", line 196, in get
return super().get(*args, verify=kwargs.pop("verify", SSL_VERIFY), **kwargs)
File "/app/.heroku/python/lib/python3.9/site-packages/requests/sessions.py", line 602, in get
return self.request("GET", url, **kwargs)
File "/app/.heroku/python/lib/python3.9/site-packages/scrapelib/__init__.py", line 579, in request
resp = super().request(
File "/app/.heroku/python/lib/python3.9/site-packages/scrapelib/__init__.py", line 404, in request
resp = super().request(
File "/app/.heroku/python/lib/python3.9/site-packages/scrapelib/__init__.py", line 232, in request
return super().request(
File "/app/.heroku/python/lib/python3.9/site-packages/scrapelib/__init__.py", line 175, in request
raise exception_raised
File "/app/.heroku/python/lib/python3.9/site-packages/scrapelib/__init__.py", line 122, in request
resp = super().request(
File "/app/.heroku/python/lib/python3.9/site-packages/requests/sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
File "/app/.heroku/python/lib/python3.9/site-packages/requests/sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
File "/app/.heroku/python/lib/python3.9/site-packages/requests/adapters.py", line 700, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='saskatoonopendataconfig.blob.core.windows.net', port=443): Max retries exceeded with url: /converteddata/MayorAndCityCouncilContactInformation.csv (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f382d41f250>: Failed to establish a new connection: [Errno -2] Name or service not known'))
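
Note: "[Errno -2] Name or service not known" is a DNS failure, so the Azure blob-storage hostname itself no longer resolves and the open-data CSV has presumably moved. A one-line check before hunting for the new location:

    import socket

    # Raises socket.gaierror ([Errno -2] Name or service not known) if the
    # hostname is really gone, ruling out a scraper-side regression.
    socket.getaddrinfo("saskatoonopendataconfig.blob.core.windows.net", 443)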

D> | ca_yt | 2024-09-11 05:27:53 | 2024-11-20 04:58:11 | scrapelib.HTTPError: 403 while retrieving https://yukonassembly.ca/mlas
04:57:00 WARNING scrapelib: sleeping for 10 seconds before retry
04:57:10 WARNING scrapelib: sleeping for 20 seconds before retry
04:57:31 WARNING scrapelib: sleeping for 40 seconds before retry
Traceback (most recent call last):
File "/app/reports/utils.py", line 73, in scrape_people
report.report = subcommand.handle(args, other)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 260, in handle
return self.do_handle(args, other, juris)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 305, in do_handle
report['scrape'] = self.do_scrape(juris, args, scrapers)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/cli/commands/update.py", line 173, in do_scrape
report[scraper_name] = scraper.do_scrape(**scrape_args)
File "/app/.heroku/python/lib/python3.9/site-packages/pupa/scrape/base.py", line 99, in do_scrape
for obj in self.scrape(**kwargs) or []:
File "/app/scrapers/ca_yt/people.py", line 16, in scrape
page = self.lxmlize(COUNCIL_PAGE, cookies=COOKIES, user_agent=USER_AGENT)
File "/app/scrapers/utils.py", line 206, in lxmlize
response = self.get(url, cookies=cookies, verify=verify)
File "/app/scrapers/utils.py", line 196, in get
return super().get(*args, verify=kwargs.pop("verify", SSL_VERIFY), **kwargs)
File "/app/.heroku/python/lib/python3.9/site-packages/requests/sessions.py", line 602, in get
return self.request("GET", url, **kwargs)
File "/app/.heroku/python/lib/python3.9/site-packages/scrapelib/__init__.py", line 602, in request
raise HTTPError(resp)
scrapelib.HTTPError: 403 while retrieving https://yukonassembly.ca/mlas