Opening URLConnections
URLConnection is an abstract class in Java that represents an active connection to a resource specified by a URL. It provides more control over the interaction with the server than the URL class does, and it lets you inspect the header sent by the server. It can also send data back to the server using POST or PUT.
The URLConnection class is part of Java's protocol handler system, which separates the details of handling a protocol from processing particular data types. A program that uses a URLConnection typically follows these steps, although not every step is needed in every program:
- Construct a URL object.
- Call the openConnection() method on the URL object to retrieve a URLConnection object.
- Configure the URLConnection.
- Read the header fields.
- Get an input stream and read data.
- Get an output stream and write data.
- Close the connection.
Although URLConnection is abstract, all but one of its methods are implemented. The exception is connect(), which each subclass must implement. When a URLConnection is first constructed it is unconnected, so the local and remote hosts cannot yet send or receive data. The connect() method establishes the connection. You rarely need to call connect() directly, since most methods that require an open connection call it on their own.
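The steps above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the class name OpenConnectionSketch and the method fetch() are hypothetical, and the sketch assumes the resource is UTF-8 text and runs on Java 9 or later (for readAllBytes()). Note that connect() is never called explicitly, because getInputStream() opens the connection on its own.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class used only for illustration.
final class OpenConnectionSketch {

    /** Reads the whole resource behind the URL, assuming it is UTF-8 text. */
    public static String fetch(URL url) {
        try {
            // openConnection() only creates the object; no I/O happens yet.
            URLConnection connection = url.openConnection();

            // Configure the connection while it is still unconnected.
            connection.setUseCaches(false);

            // getInputStream() calls connect() implicitly, then reads the body.
            // Closing the stream releases the connection's resources.
            try (InputStream in = connection.getInputStream()) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }
}
```

The same method works for any protocol Java has a handler for, such as http: or file: URLs.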
Reading Data
The following steps should be followed to retrieve data from a URL using a URLConnection object:
- Construct a URL object.
- Call the openConnection() method on the URL object to retrieve a URLConnection object.
- Call the URLConnection's getInputStream() method.
- Read from the input stream.
The following example would download a web page:
import java.io.*;
import java.net.*;

public class ViewSource {
    public static void main(String[] args) {
        if (args.length > 0) {
            try {
                // Open a connection to the URL given on the command line.
                URL url = new URL(args[0]);
                URLConnection urlC = url.openConnection();
                // try-with-resources closes the stream when reading finishes.
                try (InputStream input = urlC.getInputStream()) {
                    InputStream buffer = new BufferedInputStream(input);
                    // Decodes the bytes with the platform default charset.
                    Reader reader = new InputStreamReader(buffer);
                    int c;
                    while ((c = reader.read()) != -1) {
                        System.out.print((char) c);
                    }
                }
            } catch (MalformedURLException ex) {
                System.err.println(args[0] + " is not a parseable URL.");
            } catch (IOException ex) {
                System.err.println(ex);
            }
        }
    }
}
Below is an example of an HTTP header returned from an Apache web server:
HTTP/1.1 200 OK
Date: Tue, 20 Sep 2016 13:21:21 GMT
Server: Apache
Expires: Tue, 20 Sep 2016 13:30:33 GMT
Content-Encoding: gzip
Content-Length: 3192
Connection: Keep-Alive
Content-Type: text/html; charset=UTF-8
There are a number of methods to get information from a header. The following methods request specific, common fields from the header:
public String getContentType()
public int getContentLength()
public String getContentEncoding()
public long getDate()
public long getExpiration()
public long getLastModified()
The getContentType() method returns the MIME media type of the response body. It returns null if the content type is not available. For the example above it would return "text/html; charset=UTF-8". Here text/html, the most common content type you will encounter on web servers, is the media type and UTF-8 is the character encoding. If no charset is given, HTTP has traditionally defaulted to ISO-8859-1 for text types.
The getContentLength() method returns the number of bytes in the content. It returns -1 if there is no Content-Length header. There is also a getContentLengthLong() method that returns a long, which you can use if the content may be larger than Integer.MAX_VALUE bytes (about 2 GB). The above example has a content length of 3192.
The getContentEncoding() method returns a String that tells you how the content is encoded. It returns null if the content is unencoded. The content above is encoded using gzip, which can be decoded using java.util.zip.GZIPInputStream.
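As a rough sketch of how the charset could be pulled out of the value returned by getContentType(), the hypothetical helper below splits the value on semicolons and looks for a charset parameter. The class and method names are assumptions made for this example, and the ISO-8859-1 fallback follows the default described above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class used only for illustration.
final class ContentTypeCharset {

    /** Returns the charset named in a Content-Type value, or ISO-8859-1 if none is usable. */
    public static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                // Parameter names are case-insensitive, so compare ignoring case.
                if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                    String name = p.substring(8).trim().replace("\"", "");
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException ex) {
                        break; // unknown or malformed charset name: use the fallback
                    }
                }
            }
        }
        return StandardCharsets.ISO_8859_1;
    }
}
```

A caller might pass the result to an InputStreamReader, for example new InputStreamReader(input, ContentTypeCharset.charsetOf(urlC.getContentType())), instead of relying on the platform default charset.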
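One way to act on the content encoding is sketched below. The helper name EncodingAwareStream is hypothetical, and the sketch only handles gzip; any other encoding is passed through unchanged, which is an assumption of this example rather than a general rule.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

// Hypothetical helper class used only for illustration.
final class EncodingAwareStream {

    /** Wraps the raw stream in a GZIPInputStream when the encoding is gzip. */
    public static InputStream decode(InputStream raw, String contentEncoding) throws IOException {
        if (contentEncoding != null && contentEncoding.trim().equalsIgnoreCase("gzip")) {
            return new GZIPInputStream(raw);
        }
        return raw; // unencoded (null) or an encoding this sketch does not handle
    }
}
```

A typical call would be EncodingAwareStream.decode(urlC.getInputStream(), urlC.getContentEncoding()).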
The getDate() method returns a long that specifies when the document was sent. The date is the milliseconds since midnight on January 1, 1970 (GMT). The above example would return 1474377681000.
The getExpiration() method indicates when the document should be deleted from the cache and reloaded. It returns a long just like the getDate() method. The method returns 0 if the document has no Expires header. The above example would return 1474378233000.
The getLastModified() method also returns a long indicating a date. This date is the point when the resource was last modified. If the header does not exist, as in the example above, the method returns 0.
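The long values returned by getDate(), getExpiration(), and getLastModified() are easier to read once converted to a date. The sketch below uses java.time.Instant for that; the class name HeaderDates and the "not present" text are assumptions made for this example.

```java
import java.time.Instant;

// Hypothetical helper class used only for illustration.
final class HeaderDates {

    /** Formats a header date in ISO-8601 (UTC), treating 0 as "header missing". */
    public static String describe(long millisSinceEpoch) {
        if (millisSinceEpoch == 0) {
            return "not present";
        }
        return Instant.ofEpochMilli(millisSinceEpoch).toString();
    }
}
```

For the example header, HeaderDates.describe(1474377681000L) yields 2016-09-20T13:21:21Z, which matches the Date field.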
The above methods retrieve specific header fields, but the methods below can retrieve any header field:
public String getHeaderField(String name)
public String getHeaderFieldKey(int n)
public String getHeaderField(int n)
public long getHeaderFieldDate(String name, long Default)
public int getHeaderFieldInt(String name, int Default)
The getHeaderField(String name) method returns the value of the named header field. The name is not case sensitive. In the example above you can get the server type using the following code:
String server = urlC.getHeaderField("server");
The getHeaderFieldKey(int n) method returns the field name of the nth header field. For HTTP, the status line (HTTP/1.1 200 OK in the example above) is considered header 0 and has no field name, so the code below would return "Server" since the Server header is the 2nd header:
String header2 = urlC.getHeaderFieldKey(2);
The getHeaderField(int n) method gets the value of the nth header field. So the example below would return "Apache":
String value2 = urlC.getHeaderField(2);
The getHeaderFieldDate(String name, long Default) and getHeaderFieldInt(String name, int Default) methods try to retrieve the specified field and convert its value to a time-based long or an int. If the field is missing or cannot be parsed, the default value is returned.
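The indexed methods can be combined to walk through every header field. The sketch below is one way to do that under the assumption that getHeaderField(n) returns null once n is past the last field; the class name HeaderDump is hypothetical.

```java
import java.net.URLConnection;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class used only for illustration.
final class HeaderDump {

    /** Collects every header field as a "name: value" line, starting at index 0. */
    public static List<String> listHeaders(URLConnection connection) {
        List<String> lines = new ArrayList<>();
        for (int n = 0; ; n++) {
            String value = connection.getHeaderField(n);
            if (value == null) {
                break; // no more header fields
            }
            String key = connection.getHeaderFieldKey(n);
            // A field without a name (such as the HTTP status line) is kept as-is.
            lines.add(key == null ? value : key + ": " + value);
        }
        return lines;
    }
}
```

Calling HeaderDump.listHeaders(urlC) connects to the server if the connection is not yet open, because reading header fields requires a response.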
Caches
Web browsers can cache pages and other resources to make them quicker to access in the future. By default, the assumption is that a page accessed with GET over HTTP should be cached while a page accessed with HTTPS or POST should not be cached. HTTP headers can adjust this behavior:
- The Expires header indicates that it is okay to cache this resource until the specified time.
- The Cache-Control header offers finer control through the following directives:
  - max-age: Number of seconds from now before the cached resource should expire.
  - s-maxage: Number of seconds from now before the cached resource should expire from a shared cache.
  - public: It is okay to cache an authenticated response.
  - private: Only single-user caches should store the response. Shared caches should not store the response.
  - no-cache: The resource may still be cached, but the client should reverify the state with an ETag or Last-Modified header on each access.
  - no-store: Do not cache the resource.
- The Last-Modified header is the date when the resource was last changed.
- The ETag header is a unique identifier for the resource that changes when the resource does.
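To make the Cache-Control directives concrete, the sketch below splits a header value into a map of directive names and values. It is a simplified illustration: the class name CacheControlDirectives is hypothetical, and the parser assumes no directive value contains a comma inside quotes.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper class used only for illustration.
final class CacheControlDirectives {

    /** Maps each directive name (lowercased) to its value, or to "" if it has none. */
    public static Map<String, String> parse(String headerValue) {
        Map<String, String> directives = new LinkedHashMap<>();
        if (headerValue == null) {
            return directives;
        }
        for (String part : headerValue.split(",")) {
            String p = part.trim();
            if (p.isEmpty()) {
                continue;
            }
            int eq = p.indexOf('=');
            String name = (eq < 0 ? p : p.substring(0, eq)).trim().toLowerCase(Locale.ROOT);
            String value = eq < 0 ? "" : p.substring(eq + 1).trim();
            directives.put(name, value);
        }
        return directives;
    }
}
```

For a header such as Cache-Control: max-age=604800, no-cache, the map would contain max-age with the value 604800 and no-cache with an empty value.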
Java does not cache anything by default. To install a system-wide cache for the URL class you need the following:
- A concrete subclass of ResponseCache
- A concrete subclass of CacheRequest
- A concrete subclass of CacheResponse
Once a cache is installed (with ResponseCache.setDefault()), whenever the system tries to load a new URL it first looks for it in the cache. The following is an example of a concrete CacheRequest subclass:
import java.io.*;
import java.net.*;

public class ConcreteCacheRequest extends CacheRequest {
    // Collects the response body in memory as the protocol handler writes it.
    private ByteArrayOutputStream out = new ByteArrayOutputStream();

    @Override
    public OutputStream getBody() throws IOException {
        return out;
    }

    @Override
    public void abort() {
        out.reset();
    }

    // Returns the cached bytes, or null if nothing has been written.
    public byte[] getData() {
        if (out.size() == 0) return null;
        else return out.toByteArray();
    }
}
Below is an example of a concrete CacheResponse subclass:
import java.io.*;
import java.net.*;
import java.util.*;

// CacheControl is a helper class that is not shown in this lesson. It is assumed
// to parse the Cache-Control header and to provide getMaxAge(), which returns a
// Date (or null), and noStore().
public class ConcreteCacheResponse extends CacheResponse {
    private final Map<String, List<String>> headers;
    private final ConcreteCacheRequest request;
    private final Date expires;
    private final CacheControl control;

    public ConcreteCacheResponse(ConcreteCacheRequest request, URLConnection url, CacheControl control) throws IOException {
        this.request = request;
        this.control = control;
        // getExpiration() returns 0 when there is no Expires header.
        long expiration = url.getExpiration();
        this.expires = (expiration == 0) ? null : new Date(expiration);
        this.headers = Collections.unmodifiableMap(url.getHeaderFields());
    }

    @Override
    public InputStream getBody() {
        return new ByteArrayInputStream(request.getData());
    }

    @Override
    public Map<String, List<String>> getHeaders() throws IOException {
        return headers;
    }

    public CacheControl getControl() {
        return control;
    }

    public boolean isExpired() {
        Date now = new Date();
        Date maxAge = control.getMaxAge();
        if (maxAge != null && maxAge.before(now)) return true;
        else if (expires != null) return expires.before(now);
        else return false;
    }
}
Finally, you need a concrete ResponseCache subclass. The example below is suitable for a single-user, private cache.
import java.io.*;
import java.net.*;
import java.util.*;
import java.util.concurrent.*;

// Relies on the CacheControl helper class, which is not shown in this lesson.
public class MemoryCache extends ResponseCache {
    private final Map<URI, ConcreteCacheResponse> responses = new ConcurrentHashMap<>();
    private final int maxEntries;

    public MemoryCache() {
        this(100);
    }

    public MemoryCache(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    @Override
    public CacheRequest put(URI uri, URLConnection urlC) throws IOException {
        if (responses.size() >= maxEntries) return null;
        CacheControl control = new CacheControl(urlC.getHeaderField("Cache-Control"));
        if (control.noStore()) return null;
        // Only cache responses to GET requests. Header 0 is the response status
        // line, so the request method is read from HttpURLConnection instead.
        else if (urlC instanceof HttpURLConnection
                && !"GET".equals(((HttpURLConnection) urlC).getRequestMethod())) return null;
        ConcreteCacheRequest request = new ConcreteCacheRequest();
        ConcreteCacheResponse response = new ConcreteCacheResponse(request, urlC, control);
        responses.put(uri, response);
        return request;
    }

    @Override
    public CacheResponse get(URI uri, String requestMethod, Map<String, List<String>> requestHeaders) throws IOException {
        if ("GET".equals(requestMethod)) {
            ConcreteCacheResponse response = responses.get(uri);
            // Drop stale entries so that they are fetched again.
            if (response != null && response.isExpired()) {
                responses.remove(uri);
                response = null;
            }
            return response;
        } else return null;
    }
}
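A cache only takes effect once it has been installed with the static ResponseCache.setDefault() method. With the MemoryCache class above that would be ResponseCache.setDefault(new MemoryCache()). Because MemoryCache depends on a CacheControl class that is not shown here, the self-contained sketch below installs a deliberately trivial cache instead. CountingCache is a hypothetical class that stores nothing and only counts lookups.

```java
import java.io.IOException;
import java.net.CacheRequest;
import java.net.CacheResponse;
import java.net.ResponseCache;
import java.net.URI;
import java.net.URLConnection;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical cache used only for illustration: it never stores anything.
final class CountingCache extends ResponseCache {
    public final AtomicInteger lookups = new AtomicInteger();

    @Override
    public CacheResponse get(URI uri, String requestMethod,
            Map<String, List<String>> requestHeaders) throws IOException {
        lookups.incrementAndGet();
        return null; // always report a cache miss
    }

    @Override
    public CacheRequest put(URI uri, URLConnection connection) throws IOException {
        return null; // decline to cache the response
    }

    /** Installs a new CountingCache as the single system-wide cache and returns it. */
    public static CountingCache install() {
        CountingCache cache = new CountingCache();
        ResponseCache.setDefault(cache);
        return cache;
    }
}
```

There is only one default cache per virtual machine, so installing a new one replaces any cache installed earlier. Passing null to setDefault() removes the cache again.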
Configuring URLConnections
The URLConnection class has seven protected variables that define how the client makes a request to the server:
protected URL url;
protected boolean doInput = true;
protected boolean doOutput = false;
protected boolean allowUserInteraction = defaultAllowUserInteraction;
protected boolean useCaches = defaultUseCaches;
protected long ifModifiedSince = 0;
protected boolean connected = false;
The url variable specifies the URL that this URLConnection connects to. It can be accessed using the getURL() accessor.
The connected variable is true if the connection is open and false if it is closed. Because it is protected and has no mutator, only subclasses of URLConnection can change it directly.
The allowUserInteraction variable is true if there is a user present to interact with the program, which cannot always be assumed. This can be read or set using the getAllowUserInteraction() accessor and setAllowUserInteraction(boolean allowUserInteraction) mutator. If a user isn't present you should avoid showing dialog boxes or doing anything else that requires user interaction.
The doInput variable is true if the URLConnection can be used for reading and false if it can't be used for reading. The getDoInput() accessor can be used to read the variable and the setDoInput(boolean doInput) mutator can be used to set the variable.
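Similarly, the doOutput variable is true if a URLConnection can send output back to the server and false if it can't. For HTTP URLs, setting doOutput to true changes the request method from GET to POST. There is a setDoOutput(boolean doOutput) mutator and a getDoOutput() accessor.
The ifModifiedSince variable stores a time, in milliseconds since midnight on January 1, 1970 (GMT), that is sent to the server in the If-Modified-Since request header. The server returns the resource only if it has been modified after that time, so clients typically set it to the time their cached copy was last fetched. The default value of 0 means the resource is always fetched. The getIfModifiedSince() accessor and setIfModifiedSince(long ifModifiedSince) mutator can be used to read from or write to the variable.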
Finally, the useCaches variable is true if a cache should be used if it is available and false if caches should not be used. This can be set using the setUseCaches(boolean useCaches) mutator and read using the getUseCaches() accessor.
There are also four methods that read or set the timeouts for connecting to or reading from a URL. Timeouts are given in milliseconds, and a value of 0 means to wait indefinitely.
public void setConnectTimeout(int timeout)
public int getConnectTimeout()
public void setReadTimeout(int timeout)
public int getReadTimeout()
The setRequestProperty() method adds a field to the header of the HTTP request. It must be called before the connection is opened.
public void setRequestProperty(String name, String value)
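The configuration methods above might be used together as in the sketch below. The class name ConfigureSketch, the timeout values, and the Accept header are assumptions chosen for this example. All of the calls happen before the connection is opened, so no network traffic takes place inside configure().

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical helper class used only for illustration.
final class ConfigureSketch {

    /** Returns an unconnected URLConnection configured for a conditional fetch. */
    public static URLConnection configure(URL url, long lastFetchedMillis) {
        try {
            URLConnection connection = url.openConnection(); // not connected yet
            connection.setConnectTimeout(5_000);   // give up connecting after 5 seconds
            connection.setReadTimeout(10_000);     // give up on a stalled read after 10 seconds
            connection.setUseCaches(false);        // bypass any installed cache
            // Ask the server to send the resource only if it changed after this time.
            connection.setIfModifiedSince(lastFetchedMillis);
            connection.setRequestProperty("Accept", "text/html");
            return connection;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }
}
```

Calling any of these setters after the connection has been opened throws an IllegalStateException, which is why configuration comes before reading the header or the body.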
Writing Data to a Server
Sometimes you need to write data to a URLConnection, such as when you submit a form to a web server using POST or upload a file using PUT. The getOutputStream() method returns an OutputStream you can use to write data for transmission to a server. You have to call setDoOutput(true) first since output is not allowed by default; for HTTP URLs this also changes the request method from GET to POST. The example below wraps the stream in a BufferedOutputStream:
try {
    URL url = new URL("http://www.yhcs.us/uploads");
    URLConnection urlC = url.openConnection();
    // Output must be enabled before the connection is opened.
    urlC.setDoOutput(true);
    OutputStream stream = urlC.getOutputStream();
    OutputStream buffer = new BufferedOutputStream(stream);
    // "8859_1" is Java's alias for the ISO-8859-1 (Latin-1) encoding.
    OutputStreamWriter out = new OutputStreamWriter(buffer, "8859_1");
    out.write("file=stuff.zip\r\n");
    out.flush();
    out.close();
    // For HTTP, the request is typically not sent until the response is
    // requested, so open (and here immediately close) the input stream.
    urlC.getInputStream().close();
} catch (IOException ex) {
    System.err.println(ex);
}
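When the data being written is a form submission, the names and values are normally URL-encoded and joined with ampersands before they are written to the output stream. The sketch below shows one way to build such a body. The class name FormBody is hypothetical, and the sketch assumes Java 10 or later for the URLEncoder.encode(String, Charset) overload and that the server expects UTF-8.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper class used only for illustration.
final class FormBody {

    /** Encodes name/value pairs as an application/x-www-form-urlencoded body. */
    public static String encode(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            String name = URLEncoder.encode(field.getKey(), StandardCharsets.UTF_8);
            String value = URLEncoder.encode(field.getValue(), StandardCharsets.UTF_8);
            body.add(name + "=" + value);
        }
        return body.toString();
    }
}
```

The resulting string can be passed to the write() method of an OutputStreamWriter like the one in the example above, usually after calling setRequestProperty("Content-Type", "application/x-www-form-urlencoded") on the connection.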