MDDHosting Forums

Can't get robots.txt Disallow to work



I have two sites under shared hosting with MDD. One of them hosts downloads meant only for people who have purchased a product (the domain name is given with it). To block this site entirely I have placed a robots.txt file in its root directory (public_html); the file is as follows:

User-agent: *
Disallow: /

In my Google Webmaster Tools account, testing under Blocked URLs says everything is allowed and that the "robots.txt file does not appear to be valid". I have followed the Google guidelines: https://support.google.com/webmasters/answer/156449?hl=en

 

No idea what I have missed. Anyone know what the problem could be? Thanks.


Arpeggio, speaking from experience: if you don't want files to be found, don't put them on a public-facing domain. Robots.txt is a suggestion, not a rule or a law that must be followed. Most download scripts hide the files above your web root folder, or make them reachable only by the web server itself rather than via a public URL. Here are some documents on the topic:

 

Why did this robot ignore my robots.txt file?

Can I block bad robots?

Can robots.txt be used in a court of law?


Hi SnakEyez. The files have to be accessible to people who have bought a copy of a book or eBook; they are music method books that include audio as downloadable MP3s, and the download link appears only inside copies of the book/eBook. I'm not sure how what you are suggesting would apply to that.


It would. I don't have it installed on my account right now because the domain I had it on is under a major revamp. What I did was create a folder called "downloads" above the public_html or www folder on my site. The script I was using then fetched the files via a filesystem path relative to the account root and served them to people who were logged in. Because the folder is above public_html/www, Google will never index it, and because the link sits inside a member-secured login area, the pages with the links were never accessible to the public.

 

Think of it this way. Take a Windows computer on which you want to run a web server. If you install something like Apache/IIS or a pre-made package like WAMP, only the folder you specify as the document root is web-accessible, and you test using the address "http://localhost". You could still create a link to a file such as "C:\Desktop\Myfile.jpg". The trick is that to an application running on that machine, C:\Desktop\Myfile.jpg is a perfectly valid path that points to a real file, but anyone trying to use that address from elsewhere gets nothing, because the file sits outside the web-accessible folder. It's a crude example, but I think it clarifies the point. I had used IP.Downloads before, which is an extension of the forum software that MDDHosting uses, but there are other download management web applications that do this for you so that the files are never web-facing.
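Roughly what that looks like, as a minimal Python sketch (the setup described above used IP.Downloads, not this): the MP3s live in a folder outside public_html, and a small handler streams a file only after checking a purchase token. The folder path, token values, filename parameter, and port below are made up for illustration.

# Minimal sketch: serve purchased downloads from a folder OUTSIDE the web root.
# Assumptions (not from the thread): files live in /home/user/downloads, a
# sibling of public_html, and buyers request /get?file=lesson1.mp3&token=abc123.
# Token storage here is a plain set; a real site would check its order records.

from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs
from pathlib import Path

DOWNLOAD_DIR = Path("/home/user/downloads")   # above public_html, never indexed
VALID_TOKENS = {"abc123", "def456"}           # hypothetical purchase tokens


class DownloadHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        query = parse_qs(urlparse(self.path).query)
        token = query.get("token", [""])[0]
        name = query.get("file", [""])[0]

        # Reject unknown tokens before touching the filesystem.
        if token not in VALID_TOKENS:
            self.send_error(403, "Not a valid purchase token")
            return

        # Resolve inside DOWNLOAD_DIR only, so "../" tricks cannot escape it.
        target = (DOWNLOAD_DIR / name).resolve()
        if DOWNLOAD_DIR.resolve() not in target.parents or not target.is_file():
            self.send_error(404, "No such download")
            return

        data = target.read_bytes()
        self.send_response(200)
        self.send_header("Content-Type", "audio/mpeg")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


if __name__ == "__main__":
    # Hypothetical local port; on shared hosting this logic would normally sit
    # inside whatever download-manager script the host supports.
    HTTPServer(("127.0.0.1", 8000), DownloadHandler).serve_forever()

Because nothing under that downloads folder ever has a public URL, there is nothing for Google to crawl; the only way to the files is through the token check.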

