<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>hoverkraft archives - Detrás del último no va nadie</title>
	<atom:link href="https://blog.krusher.net/tag/hoverkraft/feed/" rel="self" type="application/rss+xml" />
	<link></link>
	<description>Because someone had to think of the fish</description>
	<lastBuildDate>Fri, 15 Jan 2016 18:44:51 +0000</lastBuildDate>
	<language>es</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	

<image>
	<url>https://blog.krusher.net/wp-content/uploads/2016/02/cropped-detras-del-ultimo-no-va-nadie-icon-32x32.jpg</url>
	<title>hoverkraft archives - Detrás del último no va nadie</title>
	<link></link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Webcrawler java Hoverkraft</title>
		<link>https://blog.krusher.net/2016/01/webcrawler-java-hoverkraft/</link>
					<comments>https://blog.krusher.net/2016/01/webcrawler-java-hoverkraft/#respond</comments>
		
		<dc:creator><![CDATA[Krusher]]></dc:creator>
		<pubDate>Fri, 15 Jan 2016 18:39:16 +0000</pubDate>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[hoverkraft]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[webcrawler]]></category>
		<guid isPermaLink="false">http://blog.krusher.net/?p=89</guid>

					<description><![CDATA[<p>I have been tinkering with a way to simulate a browser in Java. Until now I have used JMeter, which is tremendously powerful and configurable, and indispensable for load testing. Still, two things about it don't convince me: first, sometimes you want something programmatic rather than declarative, and second, JMeter is rather tough to understand &#8230; <a href="https://blog.krusher.net/2016/01/webcrawler-java-hoverkraft/" class="more-link">Continue reading<span class="screen-reader-text"> "Webcrawler java Hoverkraft"</span></a></p>
<p>The post <a href="https://blog.krusher.net/2016/01/webcrawler-java-hoverkraft/">Webcrawler java Hoverkraft</a> first appeared on <a href="https://blog.krusher.net">Detrás del último no va nadie</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>I have been tinkering with a way to simulate a browser in Java. Until now I have used <strong>JMeter</strong>, which is tremendously powerful and configurable, and indispensable for load testing. Still, two things about it don't convince me: first, sometimes you want something programmatic rather than declarative, and second, JMeter is rather tough to understand and configure. Besides, exotic metrics or Ajax requests aren't always necessary; sometimes we just want to call a web service, or parse a site to download files or automate tasks.</p>
<p>Although there are plenty of solutions available, I set out to write a small browser simulator (a <em>webcrawler</em>) in Java that makes it easy to implement tasks. I have named the creature <strong>Hoverkraft</strong>. Here is the source code.</p>
<p><span id="more-89"></span></p>
<pre class="brush: java; title: ; notranslate">
package net.krusher.hoverkraft;

import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.Serializable;
import java.net.HttpCookie;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLEncoder;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Scanner;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

/**
 * Hoverkraft - Das Web Boot
 * @author Axelei
 *
 */
public class Hoverkraft implements Serializable {
	
	/**
	 * 
	 */
	private static final long serialVersionUID = -4846381367781986634L;
	public static final String USER_AGENT = &quot;Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.114 Safari/537.36&quot;;
	public static final String ACCEPT = &quot;text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8&quot;;
	public static final String ACCEPT_LANGUAGE = &quot;es,en-US;q=0.8,en;q=0.6&quot;;
	public static final int MAX_TRIES = 5;
	private static final String LINE_FEED = &quot;\r\n&quot;;
	
	public enum Method {
		GET, POST
	}
	
	private URL url;
	private HttpURLConnection connection;
	private String content;
	private int code = -1;
	private Method method;
	private String referer;
	private Map&lt;String, String&gt; postVars = new HashMap&lt;String, String&gt;();
	private Map&lt;String, HttpCookie&gt; cookies = new HashMap&lt;String, HttpCookie&gt;();
	private Map&lt;String, File&gt; uploads = new HashMap&lt;String, File&gt;();
	
	public Document getXml() {
		return Jsoup.parse(content);
	}
	
	public Hoverkraft() {
		super();
	}

	/**
	 * Set sail to a destination
	 * @param url
	 * @throws MalformedURLException
	 */
	public void go(String url, Method method) throws MalformedURLException {
		this.url = new URL(url);
		this.method = method;
	}
	
	public void go(String url) throws MalformedURLException {
		go(url, Method.GET);
	}
	
	public void disconnect() {
		// Nothing to close if execute() has not run yet
		if (connection != null) {
			connection.disconnect();
		}
	}
	
	public void setPostVars(Map&lt;String, String&gt; vars) {
		this.postVars = vars;
	}
	
	public void setUploads(Map&lt;String, File&gt; uploads) {
		this.uploads = uploads;
	}
	
	/**
	 * Executes the web request
	 * @throws IOException 
	 */
	public void execute() throws IOException {
		
		boolean redirect;
		int tries = 0;
		
		do {
			connection = (HttpURLConnection) url.openConnection();
			setProperties(connection);
			
			connection.connect();
			
			code = connection.getResponseCode();
			
			// Redirects: re-evaluated on every pass, so the loop stops
			// as soon as a non-redirect response arrives
			redirect = code == HttpURLConnection.HTTP_MOVED_TEMP
					|| code == HttpURLConnection.HTTP_MOVED_PERM
					|| code == HttpURLConnection.HTTP_SEE_OTHER;
		 
			if (redirect) {
				String location = connection.getHeaderField(&quot;Location&quot;);
				if (location == null) {
					// A redirect without a target cannot be followed
					redirect = false;
				} else {
					// Resolve against the current URL so relative targets work
					this.url = new URL(url, location);
				}
			}
		} while (redirect &amp;&amp; tries++ &lt; MAX_TRIES);
		
		// getInputStream() always yields a stream; getContent() may return other types
		InputStream is = connection.getInputStream();
		content = stream2string(is);
		
		referer = url.toString();

		Map&lt;String, List&lt;String&gt;&gt; headers = connection.getHeaderFields();
		
		// Collect the cookies sent by the server
		if (headers.containsKey(&quot;Set-Cookie&quot;)) {
			List&lt;String&gt; cookiesObtenidas = headers.get(&quot;Set-Cookie&quot;);
			
			for (String cookie : cookiesObtenidas) {
				List&lt;HttpCookie&gt; cookiesParseadas = HttpCookie.parse(cookie);
				for (HttpCookie cookieParseada : cookiesParseadas) {
					
					if (cookies.containsKey(cookieParseada.getName())) {
						cookies.remove(cookieParseada.getName());
					}
					cookies.put(cookieParseada.getName(), cookieParseada);
				}
			}
		}
	}
	
	private void setProperties(HttpURLConnection connection) throws IOException {
		
		// Headers
		connection.setRequestProperty(&quot;user-agent&quot;, USER_AGENT);
		connection.setRequestProperty(&quot;accept&quot;, ACCEPT);
		connection.setRequestProperty(&quot;accept-language&quot;, ACCEPT_LANGUAGE);
		if (referer != null) {
			connection.setRequestProperty(&quot;referer&quot;, referer);
		}
		connection.setRequestMethod(method.toString());
		
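		// Cookies: all of them go into a single header, because each
		// setRequestProperty call overwrites the previous value
		if (!cookies.isEmpty()) {
			StringBuilder cookieHeader = new StringBuilder();
			for (HttpCookie cookie : cookies.values()) {
				if (cookieHeader.length() &gt; 0) {
					cookieHeader.append(&quot;; &quot;);
				}
				cookieHeader.append(cookie.getName()).append(&quot;=&quot;).append(cookie.getValue());
			}
			connection.setRequestProperty(&quot;Cookie&quot;, cookieHeader.toString());
		}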
		
		connection.setDoOutput(false);
		
		// POST variables and so on
		if (method == Method.POST &amp;&amp; !postVars.isEmpty() &amp;&amp; uploads.isEmpty()) {
			
			connection.setDoOutput(true);
			StringBuffer urlParameters = new StringBuffer();
			
			for (Entry&lt;String, String&gt; var : postVars.entrySet()) {
				urlParameters.append(URLEncoder.encode(var.getKey(), &quot;UTF-8&quot;) + &quot;=&quot; + URLEncoder.encode(var.getValue(), &quot;UTF-8&quot;) + &quot;&amp;&quot;);
			}
			
			if (urlParameters.charAt(urlParameters.length() - 1) == '&amp;') {
				urlParameters.deleteCharAt(urlParameters.length() - 1);
			}
			postVars.clear();
			
			connection.setRequestProperty(&quot;Content-Type&quot;, &quot;application/x-www-form-urlencoded&quot;); 
			connection.setRequestProperty(&quot;charset&quot;, &quot;utf-8&quot;);
			connection.setRequestProperty(&quot;Content-Length&quot;, Integer.toString(urlParameters.toString().getBytes().length));
			
			DataOutputStream wr = new DataOutputStream(connection.getOutputStream());
			wr.writeBytes(urlParameters.toString());
			wr.flush();
			wr.close();
		}
		
		if (method == Method.POST &amp;&amp; !uploads.isEmpty()) {

			String boundary = &quot;===&quot; + System.currentTimeMillis() + &quot;===&quot;;

			connection.setUseCaches(false);
			connection.setDoOutput(true);
			connection.setDoInput(true);

			connection.setRequestProperty(&quot;Content-Type&quot;, &quot;multipart/form-data; boundary=&quot; + boundary);
			connection.setRequestProperty(&quot;charset&quot;, &quot;UTF-8&quot;);
			
			OutputStream outputStream = connection.getOutputStream();
			PrintWriter writer = new PrintWriter(new OutputStreamWriter(outputStream, &quot;UTF-8&quot;), true);

			for (Entry&lt;String, String&gt; var : postVars.entrySet()) {

				writer.append(&quot;--&quot; + boundary).append(LINE_FEED);
		        writer.append(&quot;Content-Disposition: form-data; name=\&quot;&quot; + var.getKey() + &quot;\&quot;&quot;).append(LINE_FEED);
		        writer.append(&quot;Content-Type: text/plain; charset=UTF-8&quot;).append(LINE_FEED);
		        writer.append(LINE_FEED);
		        writer.append(var.getValue()).append(LINE_FEED);
		        writer.flush();
			}
			
			for (Entry&lt;String, File&gt; fichero : uploads.entrySet()) {
		        String fileName = fichero.getValue().getName();
		        writer.append(&quot;--&quot; + boundary).append(LINE_FEED);
		        writer.append(&quot;Content-Disposition: form-data; name=\&quot;&quot; + fichero.getKey() + &quot;\&quot;; filename=\&quot;&quot; + fileName + &quot;\&quot;&quot;).append(LINE_FEED);
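		        // guessContentTypeFromName returns null for unknown extensions
		        String contentType = URLConnection.guessContentTypeFromName(fileName);
		        if (contentType == null) {
		            contentType = &quot;application/octet-stream&quot;;
		        }
		        writer.append(&quot;Content-Type: &quot; + contentType).append(LINE_FEED);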
		        writer.append(&quot;Content-Transfer-Encoding: binary&quot;).append(LINE_FEED);
		        writer.append(LINE_FEED);
		        writer.flush();
		        
		        FileInputStream inputStream = new FileInputStream(fichero.getValue());
		        byte&#x5B;] buffer = new byte&#x5B;4096];
		        int bytesRead = -1;
		        while ((bytesRead = inputStream.read(buffer)) != -1) {
		            outputStream.write(buffer, 0, bytesRead);
		        }
		        outputStream.flush();
		        inputStream.close();
		         
		        writer.append(LINE_FEED);
		        writer.flush();   
			}
			
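			// Closing boundary. The line feed written after the last part already
			// precedes it; an extra one would be appended to the last file's content.
	        writer.append(&quot;--&quot; + boundary + &quot;--&quot;).append(LINE_FEED);
	        writer.close();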

			postVars.clear();
			uploads.clear();
			
		}

		
	}
	
	/**
	 * Resets the browser
	 */
	public void reset() {
		url = null;
		connection = null;
		content = null;
		code = -1;
		referer = null;
		postVars.clear();
		cookies.clear();
		uploads.clear();
	}
	
	/**
	 * Get contents of last execution
	 * @return
	 */
	public String getContent() {
		return content;
	}
	
	/**
	 * Gets HTTP code of last execution
	 * @return
	 */
	public int getCode() {
		return code;
	}
	
	private static String stream2string(InputStream is) {
		String salida = &quot;&quot;;
		Scanner scanner = new Scanner(is);
		scanner.useDelimiter(&quot;\\A&quot;);
		while (scanner.hasNext()) {
			salida += scanner.next();
		}
		scanner.close();
		return salida;
	}
}
</pre>
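<p>A note on the redirect handling in <code>execute()</code>: a <code>Location</code> header may carry a relative target such as <code>/login</code>, so it is safer to resolve it against the URL that produced the response. What follows is only a minimal sketch of that idea, not part of Hoverkraft itself; the class name <code>RedirectSketch</code>, its two helper methods and the <code>example.com</code> addresses are made up for illustration.</p>

```java
import java.net.HttpURLConnection;
import java.net.URI;

public class RedirectSketch {

    // True for the three status codes that execute() treats as redirects.
    public static boolean isRedirect(int code) {
        return code == HttpURLConnection.HTTP_MOVED_PERM   // 301
            || code == HttpURLConnection.HTTP_MOVED_TEMP   // 302
            || code == HttpURLConnection.HTTP_SEE_OTHER;   // 303
    }

    // Resolves a Location value against the address of the current response.
    // Absolute targets come back unchanged; relative ones are completed.
    public static URI resolve(URI current, String location) {
        return current.resolve(location);
    }

    public static void main(String[] args) {
        URI current = URI.create("https://example.com/a/b.html");
        System.out.println(resolve(current, "/login"));
        System.out.println(resolve(current, "c.html"));
        System.out.println(resolve(current, "https://other.example/x"));
    }
}
```

<p>Using <code>java.net.URI</code> here is a design choice of the sketch: <code>resolve</code> does no network access and can be tested without a server.</p>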
<p>This code depends on the <strong>jsoup</strong> library, which is of course available as free software and works perfectly from Maven.</p>
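<p>Back to the class itself for a moment: when several cookies have been stored, they have to travel in one single <code>Cookie</code> request header, as <code>name=value</code> pairs separated by <code>; </code>. Here is a small sketch of that step using only the standard library; the class name <code>CookieHeaderSketch</code>, the method <code>cookieHeader</code> and the sample cookie values are assumptions for illustration, not part of Hoverkraft.</p>

```java
import java.net.HttpCookie;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class CookieHeaderSketch {

    // Builds the value of a single "Cookie" request header from all stored cookies.
    public static String cookieHeader(Collection<HttpCookie> cookies) {
        StringJoiner joiner = new StringJoiner("; ");
        for (HttpCookie cookie : cookies) {
            // Only name=value is sent back; attributes such as Path belong
            // to the Set-Cookie response header, not to the request.
            joiner.add(cookie.getName() + "=" + cookie.getValue());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Hypothetical Set-Cookie values, keyed by name as in the cookies map above
        Map<String, HttpCookie> jar = new LinkedHashMap<>();
        String[] received = {"session=abc123; Path=/", "lang=es"};
        for (String header : received) {
            for (HttpCookie parsed : HttpCookie.parse(header)) {
                jar.put(parsed.getName(), parsed);
            }
        }
        System.out.println(cookieHeader(jar.values()));
    }
}
```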
<p>Of course, I would be delighted to read any improvement or criticism. Comment away!</p>
<p>The post <a href="https://blog.krusher.net/2016/01/webcrawler-java-hoverkraft/">Webcrawler java Hoverkraft</a> first appeared on <a href="https://blog.krusher.net">Detrás del último no va nadie</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://blog.krusher.net/2016/01/webcrawler-java-hoverkraft/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
