SnakEyez

Member Since 11 Dec 2010
Offline Last Active Mar 05 2014 08:24 PM

Posts I've Made

In Topic: Can't get robots.txt Disallow to work

12 July 2013 - 04:56 PM

It would.  I don't have it installed on my account right now because the domain I had it on is undergoing a major revamp.  What I did was create a folder called "downloads" above the public_html (or www) folder on my site.  The script I was using then read the files from that folder through a filesystem path from the account root, not a URL, and served them to people who were logged in.  Because the folder is above public_html/www, it has no web address, so Google can never index it.  And because the download links sat inside a member-only login area, the pages containing those links were never accessible to the public.
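To make the idea concrete, here is a rough sketch in Python of what such a download script does before sending a file.  This is not the script I used; the folder location, the function name resolve_download, and the logged_in flag are all made up for illustration.  It only shows the two checks that matter: the visitor must be logged in, and the requested name must stay inside the private folder.

```python
from pathlib import Path

# Hypothetical layout: "downloads" sits beside public_html, not inside it,
# so no URL maps to it.
DOWNLOADS_DIR = Path("/home/example/downloads")


def resolve_download(filename: str, logged_in: bool,
                     base: Path = DOWNLOADS_DIR) -> Path:
    """Return the on-disk path of a download, or raise if it must not be served."""
    if not logged_in:
        raise PermissionError("login required")

    base = base.resolve()
    candidate = (base / filename).resolve()

    # Reject names such as "../public_html/index.html" or absolute paths
    # that would escape the private folder.
    if base not in candidate.parents:
        raise ValueError("path escapes the downloads folder")

    if not candidate.is_file():
        raise FileNotFoundError(filename)

    return candidate
```

The web application would then read the returned path and stream the bytes to the member.  The file itself never gets an address of its own.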

 

Think of it this way.  Take a Windows computer on which you want to run a web server.  If you install something like Apache or IIS, or a pre-made package like WAMP, only the folder you specify is web-accessible.  Now imagine testing at the address "http://localhost".  You could create a link to a file outside that folder, such as "C:\Desktop\Myfile.jpg".  The trick here is that on that machine, "C:\Desktop\Myfile.jpg" is a perfectly valid local path and points to a real file.  But if someone tries to use that address from anywhere else, it won't work.  It's a crude example, but I think it clarifies the point.  I had used IP.Downloads before, which is an extension of the forum software that MDDHosting uses, but there are other download management web applications that do this for you so that files are never web-facing.


In Topic: Can't get robots.txt Disallow to work

07 July 2013 - 07:41 PM

Arpeggio, speaking from experience: if you don't want files to be found, don't put them on a public-facing domain.  Robots.txt is a suggestion, not a rule or a law that must be followed.  Most download scripts out there will hide the files above your root folder, or make them accessible only from within a network and not via a domain (i.e., so that only a server on the web server's own network can reach them).  Here are some documents on the topic:

 

Why did this robot ignore my robots.txt file?

Can I block bad robots?

Can robots.txt be used in a court of law?
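
To show why robots.txt is only a suggestion, here is a small sketch using Python's standard urllib.robotparser module.  The robots.txt content, the bot name "PoliteBot", and the example.com URLs are invented for illustration.  The point is that the check runs inside the crawler, so it only happens if the crawler's author chooses to run it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /downloads/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def polite_crawler_may_fetch(url: str, agent: str = "PoliteBot") -> bool:
    """A well-behaved crawler asks this question before requesting a URL."""
    return parser.can_fetch(agent, url)


blocked = polite_crawler_may_fetch("http://example.com/downloads/file.zip")
allowed = polite_crawler_may_fetch("http://example.com/index.html")

# A badly behaved crawler simply never calls polite_crawler_may_fetch().
# Nothing on the server side enforces the Disallow line.
```

A crawler that skips this step can still request anything under /downloads/ that has a public address, which is why keeping the files off the public-facing site is the safer approach.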