[C++] Winsock HTTP GET

Last post 05-07-2008 12:08 PM by Wolfe. 3 replies.
  • 05-26-2007 5:31 PM

    • PoZZyX
    • Not Ranked
    • Joined on 05-26-2007
    • Switzerland
    • New Member
    • Points 20

    [C++] Winsock HTTP GET

    Hello,

    I'm writing a C++ program that downloads the source code of a web page. I'm using Winsock. Here is my code:

    string get_source(string url)
    {
    WSADATA WSAData;
    WSAStartup(MAKEWORD(2,0), &WSAData);

    SOCKET sock;
    SOCKADDR_IN sin;

    char buffer[1024];


    string srequete = "GET /Simpsons/ HTTP/1.1\r\n";
    srequete += "Host: epguides.com\r\n";
    srequete += "Connection: close\r\n";
    srequete += "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5\r\n";
    srequete += "Accept-Language: fr,fr-fr;q=0.8,en-us;q=0.5,en;q=0.3\r\n";
    srequete += "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\n";
    srequete += "User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3\r\n";
    srequete += "Referer: http://pozzyx.net/\r\n";
    srequete += "\r\n";

    size_t requete_taille = srequete.size() + 1;

    char crequete[requete_taille];
    strncpy( crequete, srequete.c_str(), requete_taille );

    int i = 0;
    string source = "";

    sock = socket(AF_INET, SOCK_STREAM, 0);

    sin.sin_addr.s_addr = inet_addr("216.239.136.165"); // epguides.com
    sin.sin_family = AF_INET;
    sin.sin_port = htons(80); // HTTP port

    connect(sock, (SOCKADDR *)&sin, sizeof(sin)); // connect to the web site
    send(sock, crequete, strlen(crequete), 0); // send the HTTP request


    do
    {
    i = recv(sock, buffer, sizeof(buffer), 0); // the buffer receives the incoming data
    source += buffer;
    } while (i != 0);


    closesocket(sock); // close the socket
    WSACleanup();

    return source;
    }
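
The function above takes a url argument but hard-codes both the host and the path. As a minimal sketch of how that argument could be put to use, the helper below splits an http:// URL into a host name and a request path. The name split_url is hypothetical (it does not appear in the thread), and the sketch assumes plain http URLs with no port, query-string handling, or validation.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helper, not part of the original post: splits an http:// URL
// into (host, path). Assumes no port number and performs no validation.
std::pair<std::string, std::string> split_url(const std::string& url)
{
    const std::string scheme = "http://";
    std::size_t start = 0;
    if (url.compare(0, scheme.size(), scheme) == 0)
        start = scheme.size();                      // skip the scheme if present

    const std::size_t slash = url.find('/', start); // first '/' after the host
    const bool has_path = (slash != std::string::npos);

    std::string host = has_path ? url.substr(start, slash - start)
                                : url.substr(start);
    std::string path = has_path ? url.substr(slash)
                                : std::string("/"); // no path: request the root

    assert(!path.empty() && path[0] == '/');        // request line needs an absolute path
    return std::make_pair(host, path);
}
```

With this, split_url("http://epguides.com/Smallville/") gives the host to resolve and the path to put on the GET line, so the same function can fetch either page.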

    If I try to download http://epguides.com/Simpsons/, there is no problem:

    HTTP/1.1 200 OK
    Transfer-Encoding: chunked
    Connection: close
    Date: Sat, 26 May 2007 16:27:56 GMT
    Server: Microsoft-IIS/6.0
    --------------: -----
    Content-Type: text/html

    106E
    <html>
    <head>
    <title>The Simpsons (a Titles &amp; Air Dates Guide)</title>

    But when I try with epguides.com/Smallville, I don't get the correct source:
    HTTP/1.1 200 OK
    Transfer-Encoding: chunked
    Connection: close
    Date: Sat, 26 May 2007 16:30:08 GMT
    Server: Microsoft-IIS/6.0
    --------------: -----
    Content-Type: text/html

    1023
    <td><a target="_blank" href="../search/">SEARCH<br />epguides<br />&amp; TV.com</a></td
    <td><a href="../FAQ/">FAQ</a></td>
    ...

    Can anyone help me, please?

    Sorry for my English, but the French communities couldn't find my problem :D

    Thanks

    • Post Points: 15
  • 06-02-2007 5:37 AM In reply to

    • pcmattman
    • Top 150 Contributor
    • Joined on 01-03-2006
    • Fanatic Member
    • Points 1,385

    Re: [C++] Winsock HTTP GET

    Everything there looks OK from the HTTP side of things.

    I looked at http://www.epguides.com/Smallville/ in my browser (Firefox)
    and I had no problems... BUT: I can see something odd in your code...

    string srequete = "GET /Simpsons/ HTTP/1.1\r\n";

    Shouldn't that be changing?

    I'd also cut your request down to the bare minimum required by
    RFC 2616 (Hypertext Transfer Protocol, HTTP/1.1):

    string srequete = "GET /Simpsons/ HTTP/1.1\r\n";
    srequete += "Host: epguides.com\r\n";
    srequete += "Connection: close\r\n";
    srequete += "\r\n";

    Post here if you continue to have issues.
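
To make the path on the GET line change with the page being fetched, the minimal request above can be built from parameters. This is only a sketch: build_request is a hypothetical name, and it assumes the caller passes a host name and an absolute path such as "/Smallville/".

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds the minimal HTTP/1.1 request shown above
// for an arbitrary host and path.
std::string build_request(const std::string& host, const std::string& path)
{
    assert(!path.empty() && path[0] == '/');   // path must be absolute

    std::string request = "GET " + path + " HTTP/1.1\r\n";
    request += "Host: " + host + "\r\n";       // Host is mandatory in HTTP/1.1
    request += "Connection: close\r\n";        // ask the server to close when done
    request += "\r\n";                         // empty line ends the header block
    return request;
}
```

The result can be passed straight to send() with request.c_str() and request.size(), which also removes the need to copy it into a separate char array.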
    • Post Points: 10
  • 06-03-2007 7:02 PM In reply to

    • PoZZyX
    • Not Ranked
    • Joined on 05-26-2007
    • Switzerland
    • New Member
    • Points 20

    Re: [C++] Winsock HTTP GET

    Thanks for your help. There is no problem when I fetch the source for Simpsons. But when I replace GET /Simpsons/ with GET /Smallville/, the source I get starts like this:
    HTTP/1.1 200 OK
    Transfer-Encoding: chunked
    Connection: close
    Date: Sun, 03 Jun 2007 17:59:49 GMT
    Server: Microsoft-IIS/6.0
    --------------: -----
    Content-Type: text/html

    1023
    <td><a target="_blank" href="../search/">SEARCH<br />epguides<br />&amp; TV.com</a></td>l&as_q=&num=30&q=%22Smallville%22">Related links</a><br>via<br><a target="visit" href="http://www.google.com/">Google</a></font></td>
    <td><a href="../FAQ/">FAQ</a></td>
    <td><a href="../menu/">All<br />Shows<br />Menu</a></td>
    <td><a href="../menu/current.shtml"><strong>Current<br />Shows<br />Menu</strong></a></td>
    <!--
    <td><font size='-1'><a target="_blank" href="../search/">SEARCH<br />epguides<br />&amp; TV.com</a></font></td>
    <td><font size='-1'><a href="../FAQ/">FAQ</a></font></td>
    <td><font size='-1'><a href="../menu/">All<br />Shows<br />Menu</a></font></td>
    <td><font size='-1'><a href="../menu/current.shtml"><strong>Current<br />Shows<br />Menu</strong></a></font></td>
    _____ ______ ____________ ___________ ___________________________________________
    Pilot

           P- 1
    B91
    <a target="_blank" href="http://www.tv.com/smallville/first-pilot/episode/64561/summary.html">First Pilot</a>

    Season 1

      1.   1- 1     475165     16 Oct 01   <a target="_blank" href="http://www.tv.com/smallville/pilot/episode/48011/summary.html">Pilot</a>
      2.   1- 2     227601     23 Oct 01   <a target="_blank" href="http://www.tv.com/smallville/metamorphosis/episode/64560/summary.html">Metamorphosis</a>


    And I don't get the same result when I open the page in Firefox.
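
One detail visible in the responses quoted in this thread: the server answers with "Transfer-Encoding: chunked", and the lines "106E" and "1023" before the HTML are hexadecimal chunk sizes, not page content. A browser removes them before showing the source. A minimal decoding sketch follows, assuming the header block has already been removed and the whole body is in one string. The name decode_chunked is hypothetical, and trailers and chunk extensions are ignored.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper: decodes a body sent with "Transfer-Encoding: chunked".
// Each chunk is "<hex size>\r\n<data>\r\n"; a chunk of size 0 ends the body.
// If the input is truncated, whatever could be decoded is returned.
std::string decode_chunked(const std::string& body)
{
    std::string out;
    std::size_t pos = 0;

    while (true) {
        const std::size_t eol = body.find("\r\n", pos);
        if (eol == std::string::npos)
            break;                                   // size line incomplete

        // strtoul stops at the first non-hex character (e.g. ';' of an extension).
        const unsigned long size =
            std::strtoul(body.substr(pos, eol - pos).c_str(), nullptr, 16);
        if (size == 0)
            break;                                   // last chunk reached

        const std::size_t data = eol + 2;            // first byte of chunk data
        assert(data <= body.size());
        if (data + size > body.size()) {             // chunk data truncated
            out.append(body, data, std::string::npos);
            break;
        }
        out.append(body, data, size);
        pos = data + size + 2;                       // skip data and its CRLF
    }
    return out;
}
```

For the Smallville response the first chunk is 0x1023 = 4131 bytes, which is larger than the 1024-byte receive buffer, so it arrives across several recv calls.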

    Thanks for trying to help me :D
    • Post Points: 5
  • 05-07-2008 12:08 PM In reply to

    • Wolfe
    • Not Ranked
    • Joined on 05-07-2008
    • Slovakia
    • New Member
    • Points 5

    Re: [C++] Winsock HTTP GET

    Thank you, I used your function :P (though I reworked it to add proxy support and some fixes). It made a mess because you always copied the whole buffer, even when it was not completely filled ("recv" does not always fill the buffer). For example:
    the first time you read "ABCDEF" into buffer[ 6 ],
    but the second time only "123",
    yet you append "123DEF" to the string.

    OK, now I have to figure out how to get rid of the HTTP header :) Perhaps by sending "string srequete = "HEAD /Simpsons/ HTTP/1.1\r\n";" and removing that string from the whole GET document.
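
A separate HEAD request may not be needed for this: in an HTTP response the header block ends at the first empty line, that is, at the first "\r\n\r\n". Subtracting a HEAD response would also be fragile, since headers such as Date differ between responses (as the outputs earlier in this thread show). A minimal sketch, with strip_http_header as a hypothetical name:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns everything after the empty line that ends
// the HTTP header block, or an empty string if that line has not arrived yet.
std::string strip_http_header(const std::string& response)
{
    const std::string separator = "\r\n\r\n";
    const std::size_t end = response.find(separator);
    if (end == std::string::npos)
        return std::string();                        // header block incomplete

    assert(end + separator.size() <= response.size());
    return response.substr(end + separator.size()); // body only
}
```

Note that with "Transfer-Encoding: chunked" the returned body still contains the chunk size lines.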
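        do
        {
            // recv returns the number of bytes read, 0 when the server
            // closes the connection, or SOCKET_ERROR on failure.
            i = recv(sock, buffer, sizeof(buffer), 0);

            if (i > 0)
                source.append(buffer, i); // append only the bytes just received
        } while (i > 0);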
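
The "ABCDEF" / "123" effect described above can be reproduced without a network connection. The sketch below replaces recv with a stand-in that hands out prepared pieces; FakeSocket, fake_recv, and read_all are all hypothetical names used only for this demonstration. The loop appends exactly the number of bytes each call reported, so stale bytes from an earlier read never reach the result.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for a socket: delivers the given pieces one per call.
struct FakeSocket {
    std::vector<std::string> pieces;
    std::size_t next;
};

// Mimics recv(): copies the next piece into buffer and returns its length,
// or 0 when nothing is left. Like recv(), it does NOT null-terminate.
int fake_recv(FakeSocket& s, char* buffer, int capacity)
{
    if (s.next >= s.pieces.size())
        return 0;
    const std::string& piece = s.pieces[s.next++];
    assert(static_cast<int>(piece.size()) <= capacity);
    std::memcpy(buffer, piece.data(), piece.size());
    return static_cast<int>(piece.size());
}

// Reads until the stand-in reports end of data.
std::string read_all(FakeSocket& s)
{
    char buffer[6];
    std::string source;
    int n;
    while ((n = fake_recv(s, buffer, sizeof(buffer))) > 0)
        source.append(buffer, n);   // only the n bytes received this time
    return source;
}
```

Feeding it the pieces "ABCDEF" and "123" yields "ABCDEF123"; the leftover "DEF" in the buffer from the first read is never appended a second time.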
    • Post Points: 5