I'm currently facing an issue with the AccuWeather site: neither ReceiveHTTPFile() nor HTTPRequest() works with it.
Here's an example:
Code: Select all
https://www.accuweather.com/en/br/s%C3%A3o-paulo/45881/weather-forecast/45881
Hi Kiffi. You're right, and I'm already developing with AccuWeather through their APIs. But access to them is subscription-based, and even the free tier allows only limited access. So I was looking for a workaround by reading the results directly from their public webpages.
Absolutely, Fred. I took your suggestion in the other thread and adapted the REST API example to use the HTTPRequest() function. It now works with both http and https as well.
Code: Select all
https://www.accuweather.com/en/br/sao-paulo/45881/weather-forecast/45881
Thanks for pointing that out, Bernd. I was so focused on finding the fault that I missed the cause.
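For anyone comparing the two URL spellings that appear in this thread, here is a minimal sketch that inspects them with URLDecoder(). The variable names are mine, and it only shows how the two strings differ; I can't tell from the thread alone which difference actually mattered to the server.

```purebasic
; Sketch only: compare the two URL spellings quoted in this thread.
; Encoded$ and Plain$ are illustrative names, not from the original posts.
Define Encoded$ = "https://www.accuweather.com/en/br/s%C3%A3o-paulo/45881/weather-forecast/45881"
Define Plain$   = "https://www.accuweather.com/en/br/sao-paulo/45881/weather-forecast/45881"

; URLDecoder() turns the %C3%A3 escape (UTF-8 bytes of the accented "a") back into the character
Debug URLDecoder(Encoded$)

; The plain URL contains no escapes, so decoding leaves it unchanged
Debug Bool(URLDecoder(Plain$) = Plain$)
```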
Code: Select all
HTTPRequest=HTTPRequest(#PB_HTTP_Get,URL$)
If HTTPRequest
  Status$=HTTPInfo(HTTPRequest,#PB_HTTP_StatusCode) ; HTTPInfo() returns a string
  Select Status$
    Case "200"
      HTMLContent$=HTTPInfo(HTTPRequest,#PB_HTTP_Response)
      If HTMLContent$
        ; * Here, see comment below
      EndIf
    ;Case [...]
  EndSelect
  FinishHTTP(HTTPRequest) ; release the request once it has been processed
EndIf
Code: Select all
EnableExplicit
Structure COUNTRY
  Name.s
  URL.s
EndStructure
#IHTML_REGEXHTMLBAL="</?\w+:?\w*((\s+(\w+-?+)+:?\w?+(\s*=\s*(?:"+#DQUOTE$+".*?"+#DQUOTE$+"|'.*?'|[^'"+#DQUOTE$+">\s]+))?)+\s*|\s*)/?>"
#REGEX_HTMLBALISE=0
Global NewList Countries.COUNTRY()
Procedure.a Pc_ContentAnalysis(ArgHTMLContent.s)
  Protected.i AncPosition=1 ; Previous position in ArgHTMLContent
  Protected.i NouvPosition ; Current position in ArgHTMLContent after the RegEx call
  Protected.i LongChaine ; Length of the text between two tags
  Protected.a Commentaire ; HTML comment currently being processed (boolean)
  Protected.a BaliseListeTrouvee ; Block containing the data has been found
  Protected.a BalisePaysTrouvee ; Country tag found
  Protected.s TexteHTML
  ; Replace LF, CR and TAB with spaces
  ;ArgHTMLContent=ReplaceString(ReplaceString(ReplaceString(ArgHTMLContent,Chr(10),""),Chr(13),""),Chr(9),"")
  ReplaceString(ArgHTMLContent,Chr(10)," ",#PB_String_InPlace)
  ReplaceString(ArgHTMLContent,Chr(13)," ",#PB_String_InPlace)
  ReplaceString(ArgHTMLContent,Chr(9)," ",#PB_String_InPlace)
  ArgHTMLContent=Trim(ArgHTMLContent)
  If PeekA(@ArgHTMLContent)=$FF And PeekA(@ArgHTMLContent+1)=$FE:ArgHTMLContent=Mid(ArgHTMLContent,2):EndIf
  ; Check the HTML file header
  If UCase(Left(ArgHTMLContent,15))<>"<!DOCTYPE HTML>"
    MessageRequester("HTML tag and attribute analysis","The file does not appear to be a valid HTML page",#PB_MessageRequester_Error)
    ProcedureReturn #False
  EndIf
  ; Tag reading loop
  If CreateRegularExpression(#REGEX_HTMLBALISE,#IHTML_REGEXHTMLBAL)
    If ExamineRegularExpression(#REGEX_HTMLBALISE,ArgHTMLContent)
      While NextRegularExpressionMatch(#REGEX_HTMLBALISE)
        NouvPosition=RegularExpressionMatchPosition(#REGEX_HTMLBALISE)
        ; Analyze the content between two tags, or HTML comments "<!-- blabla -->"
        If AncPosition<>NouvPosition ; Text or comment
          LongChaine=NouvPosition-AncPosition
          TexteHTML=Mid(ArgHTMLContent,AncPosition,LongChaine)
          If Left(LTrim(TexteHTML),4)="<!--" ; Comment start tag
            If Right(RTrim(TexteHTML),3)<>"-->" ; Enclosing comment:"<!-- blabla > <blabla> blabla <!-->"
              Commentaire=#True
            EndIf
          ElseIf Right(RTrim(TexteHTML),3)="-->" ; End tag of an enclosing comment
            Commentaire=#False
          ElseIf BalisePaysTrouvee
            Countries()\Name=TexteHTML
            BalisePaysTrouvee=#False
          EndIf
        EndIf
        ; Analyze the HTML tag
        LongChaine=RegularExpressionMatchLength(#REGEX_HTMLBALISE)
        TexteHTML=RegularExpressionMatchString(#REGEX_HTMLBALISE)
        If Left(TexteHTML,18)="<div class="+Chr(34)+"lists"+Chr(34)
          BaliseListeTrouvee=#True
        ElseIf BaliseListeTrouvee
          If Left(TexteHTML,20)="<a href="+Chr(34)+"/countries/"
            BalisePaysTrouvee=#True
            AddElement(Countries())
            Countries()\URL=StringField(StringField(TexteHTML,2,"href="+Chr(34)),1,Chr(34))
          ElseIf Left(TexteHTML,31)="<div class="+Chr(34)+"hsg-width-sidebar"+Chr(34)+">"
            Break
          EndIf
        EndIf
        AncPosition=NouvPosition+LongChaine
      Wend
    EndIf
    FreeRegularExpression(#REGEX_HTMLBALISE) ; release the compiled expression
    ProcedureReturn #True
  EndIf
EndProcedure
Procedure.s Fc_WebsiteRequest(ArgURL.s)
  Protected.i HTTPRequest
  Protected.s Status,HTMLContent
  Debug "Request sent to the site"
  HTTPRequest=HTTPRequest(#PB_HTTP_Get,ArgURL)
  If HTTPRequest
    Status=HTTPInfo(HTTPRequest,#PB_HTTP_StatusCode)
    Debug "Status: "+Status
    Select Status
      Case "200"
        HTMLContent=HTTPInfo(HTTPRequest,#PB_HTTP_Response)
        If HTMLContent
          Pc_ContentAnalysis(HTMLContent)
        EndIf
        HTMLContent=""
      ;Case [...]
    EndSelect
    FinishHTTP(HTTPRequest) ; free the request resources
  Else
    Debug "Error!"
  EndIf
  ProcedureReturn Status ; HTTP status code as a string, empty if the request failed
EndProcedure
;
Fc_WebsiteRequest("https://history.state.gov/countries/all")
If ListSize(Countries())
  Debug ~"*--------------------------*\nCountry list:"
  ForEach Countries()
    Debug " "+Countries()\Name+": URL="+Countries()\URL
  Next
Else
  Debug "No countries"
EndIf
TI-994A wrote: Mon Aug 12, 2024 4:02 am
I'm already getting the desired result through WebGadget() and the GetGadgetItemText() function with the #PB_Web_HtmlCode flag. But sadly, this is indicated to be a Windows-only solution - although it appears to work on macOS as well.
Seems like the best solution would be for #PB_Web_HtmlCode to be officially supported on Mac and Linux.
BarryG wrote: Tue Aug 13, 2024 3:39 am
...for #PB_Web_HtmlCode to be officially supported on Mac and Linux.
As per the documentation, the HTML extraction feature is supported only on the Windows platform. But strangely enough, it works well on macOS Sonoma (PureBasic v6.11 LTS arm64) and macOS Catalina (PureBasic 5.73 LTS x64).
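For reference, the WebGadget() approach discussed above could look roughly like the sketch below. It is not the code anyone in this thread posted: the window, gadget and timer constants and the one-second polling interval are my own assumptions, and #PB_Web_HtmlCode is documented as Windows-only, so behaviour on other platforms is not guaranteed.

```purebasic
EnableExplicit

; Hypothetical identifiers, chosen for this sketch only
#Window = 0
#Web    = 0
#Timer  = 1
#URL$   = "https://www.accuweather.com/en/br/sao-paulo/45881/weather-forecast/45881"

Define Event.i, Html$

If OpenWindow(#Window, 0, 0, 800, 600, "WebGadget HTML", #PB_Window_SystemMenu | #PB_Window_ScreenCentered)
  WebGadget(#Web, 0, 0, 800, 600, #URL$)
  AddWindowTimer(#Window, #Timer, 1000) ; poll once per second (assumed interval)

  Repeat
    Event = WaitWindowEvent()
    If Event = #PB_Event_Timer And EventTimer() = #Timer
      ; Only read the HTML once the gadget reports that it is no longer loading
      If GetGadgetAttribute(#Web, #PB_Web_Busy) = 0
        Html$ = GetGadgetItemText(#Web, #PB_Web_HtmlCode)
        If Html$ <> ""
          RemoveWindowTimer(#Window, #Timer)
          Debug "Received " + Str(Len(Html$)) + " characters of HTML"
        EndIf
      EndIf
    EndIf
  Until Event = #PB_Event_CloseWindow
EndIf
```

Polling #PB_Web_Busy is just one way to wait for the page; pages that keep loading content through scripts may need a longer delay before the HTML is complete.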