[C++] Winsock HTTP GET

Last post 05-07-2008 12:08 PM by Wolfe. 3 replies.
Page 1 of 1 (4 items)
  • 05-26-2007 5:31 PM

    • PoZZyX
    • Not Ranked
    • Joined on 05-26-2007
    • Switzerland
    • New Member
    • Points 20

    [C++] Winsock HTTP GET

    Hello,

    I'm writing a C++ program that downloads the source code of a web page. I'm using Winsock. Here is my source code:

    string get_source(string url)
    {
    WSADATA WSAData;
    WSAStartup(MAKEWORD(2,0), &WSAData);

    SOCKET sock;
    SOCKADDR_IN sin;

    char buffer[1024];


    string srequete = "GET /Simpsons/ HTTP/1.1\r\n";
    srequete += "Host: epguides.com\r\n";
    srequete += "Connection: close\r\n";
    srequete += "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5\r\n";
    srequete += "Accept-Language: fr,fr-fr;q=0.8,en-us;q=0.5,en;q=0.3\r\n";
    srequete += "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\n";
    srequete += "User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3\r\n";
    srequete += "Referer: http://pozzyx.net/\r\n";
    srequete += "\r\n";

    size_t requete_taille = srequete.size() + 1;

    char crequete[requete_taille];
    strncpy( crequete, srequete.c_str(), requete_taille );

    int i = 0;
    string source = "";

    sock = socket(AF_INET, SOCK_STREAM, 0);

    sin.sin_addr.s_addr = inet_addr("216.239.136.165"); // epguides.com
    sin.sin_family = AF_INET;
    sin.sin_port = htons(80); // HTTP port.

    connect(sock, (SOCKADDR *)&sin, sizeof(sin)); // connect to the web site.
    send(sock, crequete, strlen(crequete), 0); // send the HTTP request.


    do
    {
    i = recv(sock, buffer, sizeof(buffer), 0); // the buffer receives the incoming data.
    source += buffer;
    } while (i != 0);


    closesocket(sock); // close the socket.
    WSACleanup();

    return source;
    }

    If I download http://epguides.com/Simpsons/, there is no problem. The response starts with:

    HTTP/1.1 200 OK
    Transfer-Encoding: chunked
    Connection: close
    Date: Sat, 26 May 2007 16:27:56 GMT
    Server: Microsoft-IIS/6.0
    --------------: -----
    Content-Type: text/html

    106E
    <html>
    <head>
    <title>The Simpsons (a Titles &amp; Air Dates Guide)</title>

    But when I try epguides.com/Smallville, I don't get the correct source:
    HTTP/1.1 200 OK
    Transfer-Encoding: chunked
    Connection: close
    Date: Sat, 26 May 2007 16:30:08 GMT
    Server: Microsoft-IIS/6.0
    --------------: -----
    Content-Type: text/html

    1023
    <td><a target="_blank" href="../search/">SEARCH<br />epguides<br />&amp; TV.com</a></td
    <td><a href="../FAQ/">FAQ</a></td>
    ...

    Can anyone help me, please?

    Sorry for my English, but the French communities couldn't find my problem :D

    Thanks

    • Post Points: 15
  • 06-02-2007 5:37 AM In reply to

    • pcmattman
    • Top 150 Contributor
    • Joined on 01-03-2006
    • Fanatic Member
    • Points 1,385

    Re: [C++] Winsock HTTP GET

    Everything there looks OK from the HTTP side of things.

    I looked at http://www.epguides.com/Smallville/ in my browser (Firefox)
    and I had no problems... BUT: I can see something odd in your code...

    string srequete = "GET /Simpsons/ HTTP/1.1\r\n";

    Shouldn't that be changing?

    I'd also change your request to the bare minimum required by
    RFC2616 (Hypertext Transfer Protocol, HTTP/1.1):

    string srequete = "GET /Simpsons/ HTTP/1.1\r\n";
    srequete += "Host: epguides.com\r\n";
    srequete += "Connection: close\r\n";
    srequete += "\r\n";
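
On the "shouldn't that be changing?" point, here is a minimal sketch of building the request from a path instead of hard-coding /Simpsons/. The helper name build_get_request and its parameters are hypothetical (they are not in the original code); it only assembles the same three header lines shown above.

```cpp
#include <string>

// Hypothetical helper (not part of the original post): builds a minimal
// HTTP/1.1 GET request for the given host and path.
std::string build_get_request(const std::string& host, const std::string& path)
{
    std::string request = "GET " + path + " HTTP/1.1\r\n";
    request += "Host: " + host + "\r\n";
    request += "Connection: close\r\n";
    request += "\r\n"; // the empty line ends the header section
    return request;
}
```

Calling build_get_request("epguides.com", "/Smallville/") would then give the request for the second page. The returned string can be passed to send() using its c_str() and size(), which also avoids the separate char array copy.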

    Post here if you continue to have issues.

    gdt_set_gate(5, (unsigned long) &global_tss, sizeof( TSS_t ) + 0xFFFF, 0x89, 0x0F );

    http://www.sourceforge.net/projects/mattise - My hobby operating system (Intel, 32-bit only).
    • Post Points: 10
  • 06-03-2007 7:02 PM In reply to

    • PoZZyX
    • Not Ranked
    • Joined on 05-26-2007
    • Switzerland
    • New Member
    • Points 20

    Re: [C++] Winsock HTTP GET

    Thanks for your help. There is no problem when I get the source code for Simpsons. But when I replace GET /Simpsons/ with GET /Smallville/, the source I get starts like this:
    HTTP/1.1 200 OK
    Transfer-Encoding: chunked
    Connection: close
    Date: Sun, 03 Jun 2007 17:59:49 GMT
    Server: Microsoft-IIS/6.0
    --------------: -----
    Content-Type: text/html

    1023
    <td><a target="_blank" href="../search/">SEARCH<br />epguides<br />&amp; TV.com</a></td>l&as_q=&num=30&q=%22Smallville%22">Related links</a><br>via<br><a target="visit" href="http://www.google.com/">Google</a></font></td>
    <td><a href="../FAQ/">FAQ</a></td>
    <td><a href="../menu/">All<br />Shows<br />Menu</a></td>
    <td><a href="../menu/current.shtml"><strong>Current<br />Shows<br />Menu</strong></a></td>
    <!--
    <td><font size='-1'><a target="_blank" href="../search/">SEARCH<br />epguides<br />&amp; TV.com</a></font></td>
    <td><font size='-1'><a href="../FAQ/">FAQ</a></font></td>
    <td><font size='-1'><a href="../menu/">All<br />Shows<br />Menu</a></font></td>
    <td><font size='-1'><a href="../menu/current.shtml"><strong>Current<br />Shows<br />Menu</strong></a></font></td>
    _____ ______ ____________ ___________ ___________________________________________
    Pilot

           P- 1
    B91
    <a target="_blank" href="http://www.tv.com/smallville/first-pilot/episode/64561/summary.html">First Pilot</a>

    Season 1

      1.   1- 1     475165     16 Oct 01   <a target="_blank" href="http://www.tv.com/smallville/pilot/episode/48011/summary.html">Pilot</a>
      2.   1- 2     227601     23 Oct 01   <a target="_blank" href="http://www.tv.com/smallville/metamorphosis/episode/64560/summary.html">Metamorphosis</a>


    And I don't get the same result when I open the page in Firefox.



    Thanks for trying to help me :D

    • Post Points: 5
  • 05-07-2008 12:08 PM In reply to

    • Wolfe
    • Not Ranked
    • Joined on 05-07-2008
    • Slovakia
    • New Member
    • Points 5

    Re: [C++] Winsock HTTP GET

    Thank you, I used your function :P (though I reworked it for proxy support and some fixes). It made a mess because you always copied the whole buffer, even when it was not completely filled ("recv" does not always fill the whole buffer). For example:
    the first time you read "ABCDEF" into buffer[ 6 ],
    but the second time only "123",
    so you put "123DEF" into the string.
    
    OK, now I have to figure out how to get rid of the HTTP head :) Perhaps by sending "string srequete = "HEAD /Simpsons/ HTTP/1.1\r\n";" and removing the string I get back from the whole GET document.
        do
        {
            i = recv(sock, buffer, sizeof(buffer), 0);

            if (i > 0)
                source.append(buffer, i); // append only the i bytes actually received
        } while (i > 0); // stop on 0 (connection closed) or a negative value (error)
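
As for getting rid of the HTTP head: a second HEAD request is probably not needed, because the head ends at the first empty line ("\r\n\r\n") of the response. A minimal sketch, assuming the whole response is already in one string; strip_http_headers is a hypothetical name, not something from the code above.

```cpp
#include <string>

// Hypothetical helper: returns everything after the first empty line
// ("\r\n\r\n"), i.e. the body of the response.
// Returns an empty string if the separator is not found.
std::string strip_http_headers(const std::string& response)
{
    const std::string separator = "\r\n\r\n";
    const std::string::size_type pos = response.find(separator);
    if (pos == std::string::npos)
        return std::string();
    return response.substr(pos + separator.size());
}
```

Note that when the server answers with Transfer-Encoding: chunked, as in the responses quoted in this thread, the body returned here still contains the chunk-size lines (such as 1023), so those would need separate handling.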
    • Post Points: 5