Uploaded.to download with wget

Downloading from uploaded.to/.net with premium credentials through the command line is possible using standard tools such as wget or curl. However, there is no official API, and the exact method required depends on the mechanism implemented by the uploaded.to/.net website. Finding these implementation details requires a small amount of reverse engineering.

Here I share a small shell script that should work on all POSIX-compliant platforms (e.g. Mac or Linux). The method is based on the current behavior of the uploaded.to website. There are no special tools involved, just wget, grep, sed, and mktemp.

(The solutions I found on the web did not work (anymore) and/or were suspiciously wrong.)

Usage

Copy the script content below, define username and password, and save the script as, for instance, download.sh. Then, invoke the script like so:

$ /bin/sh download.sh urls.txt

The file urls.txt should contain one uploaded.to/.net URL per line, such as in this example:

http://uploaded.net/file/98389123/foo.rar
http://uploaded.net/file/bxmsdkfm/bar.rar
http://uploaded.net/file/72asjh98/not.zip

Method

This paragraph is just for the curious. The script first POSTs your credentials to http://uploaded.net/io/login and stores the resulting authentication cookie in a file. This authentication cookie is then used for retrieving the web page corresponding to an uploaded.to file. That page contains a temporarily valid download URL for the file. Using grep and sed, the HTML code is filtered for this URL. The payload data transfer is triggered by firing a POST request with an empty body against this URL (the cookie is not needed here). Files are downloaded to the current working directory. All intermediate data is stored in a temporary directory. That directory is automatically deleted upon script exit (no data is leaked, unless the script is terminated with SIGKILL).
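
For illustration only, the same flow can be sketched in a few lines of Python using the third-party requests library. This is not part of the script below; the regular expression is a simplified stand-in for the grep/sed filtering described above, and the example file URL is taken from the usage section:

import re
import requests

session = requests.Session()
# POST the credentials. On success, uploaded.net responds with an empty
# body and sets the authentication cookie on the session.
response = session.post("http://uploaded.net/io/login",
                        data={"id": "user", "pw": "password"})
assert not response.text, "Login failed: %s" % response.text

# GET the file page (the auth cookie is sent automatically) and extract
# the temporarily valid download URL from the form's action attribute
# (simplified pattern; an assumption about the page structure).
html = session.get("http://uploaded.net/file/98389123/foo.rar").text
match = re.search(r'action="(http[^"]+)"', html)
if match:
    # An empty-body POST against this URL (no cookie needed) triggers
    # the payload transfer.
    print("Download URL:", match.group(1))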

The script

#!/bin/sh
# Copyright 2015 Jan-Philip Gehrcke, http://gehrcke.de
# See http://gehrcke.de/2015/03/uploaded-to-download-with-wget/
 
 
USERNAME="user"
PASSWORD="password"
 
 
if [ "$#" -ne 1 ]; then
    echo "Missing argument: URLs file (containing one URL per line)." >&2
    exit 1
fi
 
 
URLSFILE="${1}"
if [ ! -r "${URLSFILE}" ]; then
    echo "Cannot read URLs file ${URLSFILE}. Exit." >&2
    exit 1
fi
if [ ! -s "${URLSFILE}" ]; then
    echo "URLs file is empty. Exit." >&2
    exit 1
fi
 
 
TMPDIR="$(mktemp -d)"
# Install trap that removes the temporary directory recursively
# upon exit (except for when this program receives SIGKILL).
trap 'rm -rf "$TMPDIR"' EXIT
 
 
LOGINRESPFILE="${TMPDIR}/login.response"
LOGINOUTPUTFILE="${TMPDIR}/login.outerr"
COOKIESFILE="${TMPDIR}/login.cookies"
LOGINURL="http://uploaded.net/io/login"
 
 
echo "Temporary directory: ${TMPDIR}"
echo "Log in via POST request to ${LOGINURL}, save cookies."
wget --save-cookies=${COOKIESFILE} --server-response \
    --output-document ${LOGINRESPFILE} \
    --post-data="id=${USERNAME}&pw=${PASSWORD}" \
    ${LOGINURL} > ${LOGINOUTPUTFILE} 2>&1
 
# Status code is 200 even if login failed.
# Uploaded sends a '{"err":"User and password do not match!"}'-like response
# body in case of error.
 
echo "Verify that login response is empty."
# Response is more than 0 bytes in case of login error.
if [ -s "${LOGINRESPFILE}" ]; then
    echo "Login response larger than 0 bytes. Print response and exit." >&2
    cat "${LOGINRESPFILE}"
    exit 1
fi
 
# Zero response size does not necessarily imply successful login.
# Wget adds three comment lines to the cookies file by default, so a
# successful login (which sets cookies) should result in more than three
# lines in this file.
COOKIESFILELINES="$(wc -l < "${COOKIESFILE}")"
echo "${COOKIESFILELINES} lines in cookies file found."
if [ "${COOKIESFILELINES}" -lt "4" ]; then
    echo "Expected >3 lines in cookies file. Exit.". >&2
    exit 1
fi
 
echo "Process URLs."
# Assume that login worked. Iterate through URLs.
while read -r CURRENTURL; do
    if [ "x$CURRENTURL" = "x" ]; then
        # Skip empty lines.
        continue
    fi
    printf "\n\n\n"
    TMPFILE="$(mktemp "${TMPDIR}/response.html.XXXX")"
    echo "GET ${CURRENTURL} (use auth cookie), store response."
    wget --no-verbose --load-cookies=${COOKIESFILE} \
        --output-document ${TMPFILE} ${CURRENTURL}
 
    if [ ! -s "${TMPFILE}" ]; then
        echo "No HTML response: ${TMPFILE} is zero size. Skip processing."
        continue
    fi
 
    # Extract (temporarily valid) download URL from HTML.
    LINEOFINTEREST="$(grep post ${TMPFILE} | grep action | grep uploaded)"
    # Match the entire line, including the space after action="...", and
    # replace it with the first capture group, i.e. the URL itself.
    DLURL=$(echo $LINEOFINTEREST | sed 's/.*action="\(.\{1,\}\)" .*/\1/')
    echo "Extracted download URL: ${DLURL}"
    # This file contains account details, so delete as soon as not required
    # anymore.
    rm -f "${TMPFILE}"
    echo "POST to URL w/o data. Response is file. Get filename from header."
    # --content-disposition should extract the proper filename.
    wget --content-disposition --post-data='' "${DLURL}"
done < "${URLSFILE}"

Structured data for Google: how to add the ‘updated’ hentry field

This is a WordPress-specific post. I am using a modified TwentyTwelve theme, and Google Webmaster Tools reports missing structured data for all of my posts:

Missing "updated" field in the microformats hatom markup

Missing “updated” field in the microformats hatom markup.

In particular, it is the updated hentry field that seems to be missing. TwentyTwelve, like many themes, uses the microformats approach to communicate structured data to Google (and to others; it’s just that Google is a popular and important consumer of this data). How do you correctly present date/time information to Google? A quote from their microdata docs:

To specify dates and times unambiguously, use the time element with the datetime attribute. […] The value in the datetime attribute is specified using the ISO date format.

And a quote from their microformats docs:

In general, microformats use the class attribute in HTML tags

It appears that we can combine the approaches. Might be dirty, but it works. So, what you might want to have in the HTML source of your blog post looks like this:

<time class="updated" datetime="2015-02-28T18:09:49+00:00" pubdate>
February 28, 2015
</time>

The value of the datetime attribute is in ISO 8601 format and not shown to the user. It should contain the point in time the article/blog post was last modified (updated). It is parsed by Google as the updated property, because of the class="updated" attribute. The string content of the time tag is what is displayed to your users (February 28, 2015 in this case). There, you usually want to display the point in time when the article was first published.

So, how do you get this into the HTML source code of all of your blog posts? A simple solution is to create a custom “byline” (that is what the author and date information string is often called in the context of WordPress themes), for instance with a PHP function like this:

function modbyline() {
    $datecreated = esc_html(get_the_date());
    $author = esc_html(get_the_author());
    $datemodifiedISO = esc_html(get_the_modified_time("c"));
    echo '<div class="bylinemod"><time class="entry-date updated" datetime="'.$datemodifiedISO.'" pubdate>'.$datecreated.'</time> &mdash; by '.$author.'</div>';
}

This creates HTML code for a custom byline, in my case rendered like so:

<div class="bylinemod">
    <time class="entry-date updated" datetime="2015-02-28T18:09:49+00:00" pubdate>
        February 28, 2015
    </time>
    &mdash; by Jan-Philip Gehrcke
</div>

The user-visible date is the article publication date, and the machine-readable datetime attribute encodes the modification time of the article. Note that WordPress’ get_the_modified_time() by default returns a date string in a human-readable format. In order to make it machine-readable according to the ISO 8601 standard, you need to provide the "c" format specifier argument (I have done this in the function above).

You want to define this custom byline function in your (child) theme’s functions.php. It should be called from within content.php.

After including it, use Google’s Structured Data Testing Tool to validate the approach. It should show an updated entry containing the correct date.

Google authorship feature deactivated

I just realized that the Google authorship feature (by which web content could be related to a Google+ profile) had been disabled in summer 2014. The feature was introduced not long before that, and the web ecosystem followed with enthusiasm: content management systems like WordPress offered support (at least via plugins), and the SEO media response was positive. Many articles were published on the importance and usage of this feature.

And then, suddenly, a posting on Google+:

[…] With this in mind, we’ve made the difficult decision to stop showing authorship in search results.

Another posting from John Mueller:

Edit: In the meantime, we’ve decided to remove authorship completely

What is left is the URL http://plus.google.com/authorship, which redirects to https://support.google.com/webmasters/answer/6083347 and shows nothing but:

Authorship markup is no longer supported in web search.
To learn about what markup you can use to improve search results, visit rich snippets.

What’s left are many websites containing wasteful markup. Garbage, and it will probably remain for years. I just deactivated my Google Author Link WordPress plugin. What a waste of time, for so many people. For those interested, the removal of this feature is discussed in some depth in this article.

A rational explanation for the dress color confusion and hype

Introduction

Wired reported, SPIEGEL ONLINE reported, Adobe contributed, Twitter and Facebook went crazy like hell — it’s as if half of the world were discussing the colors of a dress today. One thing upfront: no sensation here, at all. The level of “virality” of this story and its scientific impact are not related. Still, there are various important aspects of this topic which I want to put in order and contribute to the discussion.

I think the most revealing insight of this development is not about color perception — it is about

  • how easily people can get biased (how many of us are not able to form an independent thought).
  • how many of us are not able to isolate the actual facts and reflect on them neutrally.
  • how difficult it is to grasp seemingly simple concepts.

Evidently, these points are not new. This is all human, and the natural cure to the listed problems is science. Now I want you to relax, and approach this topic scientifically, to a certain degree at least. In this context it is important to clarify: what does “scientifically” mean? Let me quote Steven D. Schafersman:

Science is a method of discovering reliable knowledge about nature. There are other methods of discovering and learning knowledge about nature, but science is the only method that results in the acquisition of reliable knowledge. Reliable knowledge is knowledge that has a high probability of being true because its veracity has been justified by a reliable method.

This is vague, but this really is the main idea. It is up to us to fulfill this criterion, and obviously this requires some knowledge about how things really work.

Wired misses the point

There are many levels of “science” that can be applied here, and for sure others have tried. Wired, for instance, claims to have taken a scientific approach, but from my point of view they did not do a good (and complete) job: they argue why people have different perceptions, which is fine, but known. Then they discuss the colors in the image without explaining what it is that the observer of the image actually sees. Eventually, they jump to the following conclusion:

At least we can all agree on one thing: The people who see the dress as white are utterly, completely wrong.

How dare you. Point missed! Providing insight into this topic is not done by telling people how their brain should work. It is done by telling them what it actually is that enters their eyes. Beyond that boundary there is no binary distinction between right and wrong (nor between black and white, pun intended).

So, the Wired article on the topic does not construct a convincing chain of causality. The one thing they got right is that colors are interpreted by the brain, depending on the environment.

Two concepts got mixed up, constructively interfering towards a super-magic phenomenon

Colors are interpreted by the brain; that is nothing new. Don’t get me wrong, this is an impressive and super-interesting fact. From a neuroscientific point of view, there is endless depth in this fact: it is an interface that allows us to investigate how parts of the human brain work. This can be considered “magic”, because we still do not know exactly how the brain works in this regard. This is concept one:

1. Colors are interpreted by the brain, depending on the environment. Not fully understood, but appreciable.

And then the dress story came, suggesting the following concept to people:

2. The dress is either white/gold or blue/black. One of these solutions is correct. The correct solution can be inferred from the photograph. Some people see the correct solution, others are wrong.

This is wrong and problematic in different regards, as I will explain further below. The dress story got so much attention largely because these two concepts were mixed. One of the concepts obviously is right. The other concept is, unfortunately, not obviously wrong to many people. In combination, both concepts suggest even more magic than there really is. So the debate got heated up.

We do not need the Wired article cited above to explain in many words how the first concept works. It is XKCD who does the perfect job of punching this very fact right into our faces:

XKCD nails (at least a part of) it

http://xkcd.com/1492/

The RGB colors of both dresses are the same. The effect (that the perception depends on the environment) is obvious. This was also obvious to Randall Munroe, the genius behind XKCD. And, I suppose, it was not this very obvious fact that made him create the cartoon. I think the most important message of his cartoon number 1492 is, as so often, hidden in the image title on xkcd.com/1492 (shown only when the mouse pointer rests on the image for a while):

This white-balance illusion hit so hard because it felt like someone had been playing through the Monty Hall scenario and opened their chosen door, only to find there was unexpectedly disagreement over whether the thing they'd revealed was a goat or a car.

Please, chew on that for a minute. Randall suggests the following scenario: What is behind the door? Car or goat, car or goat, car or goat, car or goat? Now you open the door. And you might say: oh, I am not sure whether this is a car or a goat. And you think you now have to decide between the two, because a car is great, a goat is bad, and everything else is not important. You have lost the capability to see and realize that it is neither of the two. People are so focused on seeing either a car or a goat that they lose their ability to judge neutrally what they see.

This is Randall’s way of criticizing how people are asked about this very dress issue and how the media fails in attempting to resolve it. The media is creating a bias in this discussion, which clearly prevents a neutral analysis. In fact, what happened is a phenomenon that can be described as social self-amplification, converging towards a tremendous simplification of the problem, and towards the wrong questions being asked, eventually yielding white/gold vs. blue/black as the seemingly only viable options.

The essential ingredients for understanding the issue

Rollback. By now it is clear that we should forget about white/gold vs. blue/black, and focus on what we actually see. This might sound trivial, but it is the primary insight for treating this topic on a scientific level.

What we actually see is, from my “scientific” experience, built of four major components:

  • the working principle of a camera for taking a photograph
  • the flow of information from the camera towards a digital picture
  • the flow of information from the digital picture over the screen into your eyes
  • the flow of information within your brain

Hence, there are multiple levels of information and information flow involved. The actual color information in the picture is just a part of the “global picture”. However, it is the global picture which one needs to have in mind when one wants to rationally grasp WTF this is about. So, when Adobe tweeted

For those seeing #WhiteandGold in #TheDress (http://bit.ly/1APDFay ), @HopeTaylorPhoto ends the debate.

they made a fool of themselves and approached the problem at the wrong level, not looking at the entire flow of information.

The essential concepts for assessing the dress color problem, distributed over four levels of information flow, are not that complex and shall be discussed within the next few paragraphs.

The role of camera and digital modification reduces the problem

Depending on the exposure and white balance settings, a camera records very different images of any given scene. You do not need to understand exactly what these two things mean. You just need to appreciate that color and brightness as recorded by the camera might be very different from what a human being would have observed in the position of the camera. Simple, right? Then, once the image has landed on a computer, it can be edited. In any way. The first two points from the list above boil down to the following conclusion: a photograph that you did not take yourself and did not digitally edit yourself might show something far from what the actual, real scene looked like to humans. Still simple, so far.

I think the most important conclusion from here is: based on a random image circulating on the Internet, it is pointless to discuss which color the dress really has. This might sound nitpicky to you, but in fact it relaxes the problem a lot. The important insight from here is that asking for the real color of the dress is an ill-posed problem. It is a waste of time to even think about the real color of the dress. The valid question that is left is: what color does the dress have in the image?

Take the colors for granted, just for a moment

I fear that my last point might still appear a little nitpicky. But it is essential, so I’ll use a second approach: there is no point in interpreting this photograph with respect to the real lighting. The flow of information from the real scene towards the digital photograph implies a loss of information. This is irreversible. Furthermore, the information that is contained in the image underwent transformations whose details we do not know (remember: we did not take or edit this photo ourselves). Hence, the remaining information cannot be used to draw robust conclusions about the lighting in the real scene.

The point is: if you want to approach this scientifically, forget about the real scene, really. I mean, you can do whatever you want, but if you incorporate the real scene of the dress photograph into your thought process, then you are making unsupported assumptions and leaving the scientific regime.

When one follows this logic, the only valid way to answer this question (“what color does the dress have in the image?”) is to quantify the colors in the image. Quantification usually happens through “measurement”, which is an important concept in science. During a measurement, a certain quantity is determined. The measurement yields a number, and the number has a unit. Measuring should exclude experimental errors as far as possible. Naturally, a measurement transforms one type of information into another. Now, you might just use a calibrated screen and a calibrated camera, display the image on the screen, point the camera at the screen, and then measure colors. Or you stop over-interpreting this very simple problem and realize that in this case there is no need to transform information another time. The information we are interested in (the colors in the image) is already there. In the digital image file. Just read them out. There is no point in performing another measurement.
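
Reading them out takes only a few lines of Python, for example with the Pillow imaging library (a minimal sketch; the filename and the pixel coordinates are placeholders, not taken from the actual photograph):

from PIL import Image  # provided by the Pillow package

# Placeholder filename: any local copy of the dress photograph works.
image = Image.open("dress.jpg").convert("RGB")

# Read the raw RGB triples of two example pixels, one from a "light"
# stripe and one from a "dark" stripe of the dress (placeholder coordinates).
print(image.getpixel((100, 200)))  # expected: a blueish triple
print(image.getpixel((100, 400)))  # expected: a brownish triple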

So, now we look at the colors encoded in the digital image file. Others have done this for us. This is one of the most comprehensive results I have found (in a Reddit discussion):

[Image: the dress’s colors, read out of the image file and categorized]

Given the dress photograph as data source, we now have (scientifically) obtained a conclusion (you might want to call it a “fact”): the two main colors of the dress in the image are blueish and brownish (the RGB unit system is unambiguous, and its transformation into “words” at least follows a reproducible system, by neutral standards).

The last step of information flow: error-prone extrapolation

The remaining level of information flow happens within the brain of the one observing the image. Clearly, there is an interesting relation between the distribution of colors in the image (not in the scene, in the image) and the human perception/imagination of the two main colors of the dress in the real scene. I appreciate that, when forced to imagine how the dress might have looked in reality, certain people argue that they think it was white/golden or blue/black. Fine with me, and I believe them: clearly, this particular image has a weird light constellation, so that it requires a huge amount of extrapolation towards imagining the real lighting in the real scene. A huge amount of extrapolation based on bad data yields an unstable result, even when using a computer and a numerical approach. Now, this is not a computer. This is a brain. Higher complexity, many more degrees of freedom, and clearly a larger potential to end up with different outcomes when extrapolating. Same person, different attempts, different outcomes. Different persons, different outcomes. Really, this is not surprising, and there is no right or wrong, because the underlying data is bad.

I asked myself: which colors do I see in the picture? Honestly, “blueish” and “brownish” were the colors I saw. I actively refused to think further than that, because I knew that it does not make sense to over-interpret that picture. Because so much information was lost on the way from the real scene towards the image on my screen. Because, obviously, this is a bad photograph. You know that when you work with photographs. And when I looked at this picture with friends and we discussed which colors we really see (without thinking), everybody agreed on seeing blueish/brownish.

Wrapping up

The discussed confusion is due to a “real” quantity (something that is physically measurable) being measured under unclear conditions, manipulated to an unknown extent, and then interpreted by the human brain. The interpretation may differ from person to person, depending on the question and depending on how the person has trained his/her brain (without knowing) throughout life. As clarified in the article, there is no right or wrong about what people say they see here. It is just that every honest answer to the question “Which colors does the dress have in the image?” is the result of either

  • a neutral color analysis based on what is in the image (without extrapolation)
  • or a complex extrapolation thought process (subconscious or not) with the goal of identifying the lighting in the real scene.

The latter is, as argued, an ill-posed problem, and people weight different aspects of this thought process differently, which is why the outcome differs. The media turned this whole story into something that looks super magical, because asking whether someone thinks the dress is white/gold or blue/black, and suggesting that only one solution is correct, is very different from asking “which colors do you see?”, and it manipulates people.

How to set up a 64 bit version of NumPy on Windows

A short note on a complex topic. Feel free to shoot questions at me in the comments.

There are no official NumPy 64 bit builds available for Windows. In fact, 64 bit Windows is not officially supported by NumPy. So, if you are serious about your project, you either need to consider building on top of Unix-like platforms and inheriting external quality assurance, or (on Windows) you need to anticipate issues of various kinds and do extensive testing on your own. One of the reasons is that there is no adequate (open source, reliable, feature-rich) tool chain for creating proper 64 bit builds of NumPy on Windows (further references: numpy mailing list thread, Intel forums). Nevertheless, in many cases a working solution is provided by the unofficial builds from Christoph Gohlke, created with Intel’s commercial compiler suite. It is up to you to understand the license implications and whether you want or can use these builds. I love to use these builds.

The following steps show a very simple way to get NumPy binaries for the AMD64 architecture installed on top of CPython 3(.4). These instructions are valid only for Python installed with an official CPython installer, obtained from python.org.

1) Install CPython for AMD64 arch

Download a 64 bit MSI installer file from python.org. The crucial step is to get an installer for the AMD64 (x86-64) architecture, usually called “Windows x86-64 MSI installer”. I have chosen python-3.4.2.amd64.msi. Run the setup.

2) Upgrade pip

Recent versions of Python 3 ship with pip, but you should use the newest version for proper wheel support. Open cmd.exe, and run

C:\> pip install pip --upgrade

Verify:

C:\> pip --version
pip 6.0.8 from C:\Python34\lib\site-packages (python 3.4)

The latter verifies that this pip i) is up-to-date, and ii) belongs to our target CPython version (multiple versions of CPython can be installed on any given system, and the correspondence between pip and a certain Python build is sometimes not obvious).

Note: The CPython installer should properly adjust your PATH environment variable so that python as well as pip entered at the command line correspond to what has been installed by the installer. It is, however, possible that you have somehow lost control of your environment by installing too many different things in an unreasonable order. In that case, you might have to manually adjust your PATH so that it prioritizes the executables in C:\Python34\Scripts (or wherever you have installed your 64 bit Python version to).
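
A quick way to check from within the interpreter which Python is active and whether it really is a 64 bit build (the executable path below is just an example for the installation described above):

>>> import sys, struct
>>> sys.executable
'C:\\Python34\\python.exe'
>>> struct.calcsize("P") * 8
64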

3) Download wheel of NumPy build for AMD64 on Windows

Navigate to lfd.uci.edu/~gohlke/pythonlibs/#numpy and select a build for your Python version and for AMD64. I chose numpy‑1.9.2rc1+mkl‑cp34‑none‑win_amd64.whl.

4) Install the wheel via pip

On the command line, navigate to the directory where you have downloaded the wheel file to. Install it:

C:\Users\user\Desktop>pip install "numpy-1.9.2rc1+mkl-cp34-none-win_amd64.whl"
Processing c:\users\user\desktop\numpy-1.9.2rc1+mkl-cp34-none-win_amd64.whl
Installing collected packages: numpy
 
Successfully installed numpy-1.9.2rc1

The simplicity of this approach is kind of new. Actually, this simplicity is why wheels were designed in the first place! Installing pre-built binaries with pip was not possible with the “old” egg package format. So, older tutorials/descriptions of this kind might point to MSI installers or dubious self-extracting installers. These times are over now, and this also is the major reason why I am writing this blog post.

5) Verify

>>> import numpy
>>> numpy.__version__
'1.9.2rc1'

Great.
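
Optionally, you can inspect which BLAS/LAPACK libraries this NumPy build was linked against (the wheel name suggests Intel’s MKL) and run a tiny smoke test. numpy.show_config() prints the build configuration; its output is omitted here since it depends on the build:

>>> import numpy
>>> numpy.show_config()
>>> numpy.dot(numpy.ones((2, 2)), numpy.ones((2, 2)))
array([[ 2.,  2.],
       [ 2.,  2.]])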

Third-party Python distributions

I do not want to leave unmentioned that there are very nice third-party Python distributions out there (i.e. not provided by the Python Software Foundation) that include commercially supported and properly tested NumPy/SciPy builds for 64 bit Windows platforms. Most of these third-party vendors have a commercial background and dictate their own licenses with respect to the usage of their Python distribution. For non-commercial purposes, most of them can be used for free. The following distributions provide a working solution:

  • Anaconda (Continuum Analytics)
  • Canopy (Enthought)
  • ActivePython (ActiveState)

All three of these distributions are recommendable from a technical point of view (I cannot tell whether their license models/restrictions are an issue for you or not). They all come as 64 bit builds. I am not entirely sure whether Enthought and ActiveState build NumPy against Intel’s Math Kernel Library. In the case of Anaconda, this definitely is not the case in the free version — it is something that can be obtained explicitly, for $29 (it’s called the “MKL Optimizations” package).