ERDDAP
Easier access to scientific data
Brought to you by NOAA NMFS SWFSC ERD
See also the related document, Working with the datasets.xml File.
ERDDAP is an all-open-source, all-Java (servlet) web application that runs in a web application server
(for example, Tomcat). This web page is mostly for people ("ERDDAP administrators") who want to set up
their own ERDDAP installation at their own web site.
The -Xmx memory setting is important because ERDDAP can run out of memory.
The more physical memory in the server the better: 4+ GB is really good
(extra is used for swap space, which is useful), 2 GB is okay, and less is not recommended.
Even with abundant physical memory, Tomcat/Java won't run if you try to set -Xmx much above 1500M.
If your server has less than 2 GB of memory, reduce the -Xmx value (in megabytes, 'M')
to 1/2 of the physical memory.
Optional: You can add "-verbose:gc" to the JAVA_OPTS. It tells Java to send Java garbage collection
information to <tomcat>/logs/catalina.out (or some other Tomcat log file),
which is useful if your ERDDAP has memory problems.
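The memory rule of thumb above can be expressed as a small helper. This is a minimal sketch, not part of ERDDAP: the class and method names are hypothetical, and the code simply encodes the figures from the text (half of physical memory below 2 GB, otherwise about 1500M as a ceiling). Where JAVA_OPTS is actually set depends on your Tomcat installation.

```java
public class XmxAdvice {

    /**
     * Suggested -Xmx value in megabytes, following the rule of thumb in the text:
     * half of physical memory if the server has less than 2 GB,
     * otherwise about 1500M (the text's practical ceiling).
     */
    static long suggestedXmxMB(long physicalMemoryMB) {
        if (physicalMemoryMB < 2048) {
            return physicalMemoryMB / 2;
        }
        return 1500;
    }

    /** Builds a JAVA_OPTS fragment, optionally adding -verbose:gc for garbage collection logging. */
    static String javaOpts(long physicalMemoryMB, boolean verboseGc) {
        String opts = "-Xmx" + suggestedXmxMB(physicalMemoryMB) + "M";
        return verboseGc ? opts + " -verbose:gc" : opts;
    }

    public static void main(String[] args) {
        System.out.println(javaOpts(4096, true));   // prints "-Xmx1500M -verbose:gc"
        System.out.println(javaOpts(1024, false));  // prints "-Xmx512M"
    }
}
```

The output is only a starting point; if catalina.out shows frequent full garbage collections, that is a sign the server needs more memory.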
ERDDAP has a web service so that flags can be set via URLs.
The flag system can serve as the basis for a more efficient mechanism for telling ERDDAP when to
reload a dataset. For example, you could set a dataset's <reloadEveryNMinutes> to a
large number (e.g., 10080 = 1 week).
Then, when you know the dataset has changed (perhaps because you added a file to the dataset's data
directory), set a flag so that the dataset is reloaded as soon as possible.
This is much more responsive and much more efficient than setting <reloadEveryNMinutes>
to a small number.
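As a sketch of setting a flag from a program (for example, a script that runs right after you add a file to a dataset's data directory), the Java code below sends one HTTP GET. The URL path setDatasetFlag.txt and the datasetID/flagKey parameter names are assumptions for illustration; use the exact flag URL that your ERDDAP provides.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class SetFlag {

    /**
     * Builds a flag-setting URL.
     * ASSUMPTION: the "setDatasetFlag.txt" path and the "datasetID"/"flagKey"
     * parameter names are illustrative; use the flag URL your ERDDAP provides.
     */
    static String flagUrl(String erddapBase, String datasetID, String flagKey) {
        return erddapBase + "/setDatasetFlag.txt?datasetID="
                + URLEncoder.encode(datasetID, StandardCharsets.UTF_8)
                + "&flagKey=" + URLEncoder.encode(flagKey, StandardCharsets.UTF_8);
    }

    /** Sends the GET request and returns the HTTP status code. */
    static int setFlag(String url) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }
}
```

A script that copies a new file into a dataset's directory could call setFlag(flagUrl(...)) right afterward, so the dataset is reloaded promptly even though its <reloadEveryNMinutes> is large.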
For a few types of datasets (notably EDDGridCopy, EDDTableCopy, EDDGridFromXxxFiles, and
EDDTableFromXxxFiles), ERDDAP stores on disk some information about the dataset that is reused
when the dataset is reloaded. This greatly speeds the reloading process.
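To illustrate why reusing stored information speeds up reloading, here is a hypothetical sketch of the general idea (it is not ERDDAP's actual code): keep each file's last-modified time from the previous load, and on reload re-read only the files that are new or changed, dropping cached information for files that have disappeared.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ReloadCache {

    /**
     * Compares file info saved from the previous load (file name -> last-modified time)
     * with the current directory listing and returns the files that must be re-read.
     * Unchanged files can reuse their cached information.
     */
    static Set<String> filesToReread(Map<String, Long> previous, Map<String, Long> current) {
        Set<String> changed = new TreeSet<>();
        for (Map.Entry<String, Long> e : current.entrySet()) {
            Long old = previous.get(e.getKey());
            if (old == null || !old.equals(e.getValue())) {
                changed.add(e.getKey()); // new file, or modified since the last load
            }
        }
        return changed;
    }

    /** Files present last time but gone now; their cached info should be discarded. */
    static Set<String> removedFiles(Map<String, Long> previous, Map<String, Long> current) {
        Set<String> removed = new TreeSet<>(previous.keySet());
        removed.removeAll(current.keySet());
        return removed;
    }
}
```

When only one file out of thousands has changed, only that one file needs to be opened and read.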
Sitemaps are an easy way for webmasters to inform search engines about pages on their
sites that are available for crawling. In its simplest form, a Sitemap is an XML file that lists
URLs for a site along with additional metadata about each URL (when it was last updated, how
often it usually changes, and how important it is, relative to other URLs in the site) so that
search engines can more intelligently crawl the site. Web crawlers usually discover pages from links within the site and from other sites. Sitemaps
supplement this data to allow crawlers that support Sitemaps to pick up all URLs in the Sitemap
and learn about those URLs using the associated metadata. Using the Sitemap protocol does not
guarantee that web pages are included in search engines, but provides hints for web crawlers
to do a better job of crawling your site.
Actually, since ERDDAP is RESTful, search engine spiders can easily crawl your ERDDAP.
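A minimal Sitemap following the sitemaps.org protocol can be generated with plain string building. This sketch is not taken from ERDDAP; the Entry class and method names are hypothetical, and the example URL in the usage note is a placeholder. The protocol requires characters such as & in URLs to be entity-escaped, which escapeXml handles.

```java
import java.util.List;

public class SitemapWriter {

    /** One <url> element: location, last-modified date (e.g. 2010-06-01), change frequency, priority. */
    public static final class Entry {
        final String loc;
        final String lastmod;
        final String changefreq;
        final double priority;

        public Entry(String loc, String lastmod, String changefreq, double priority) {
            this.loc = loc;
            this.lastmod = lastmod;
            this.changefreq = changefreq;
            this.priority = priority;
        }
    }

    /** Escapes the five XML special characters (& first, so it isn't double-escaped). */
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&apos;");
    }

    /** Builds a complete Sitemap XML document for the given entries. */
    static String toSitemap(List<Entry> entries) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n");
        for (Entry e : entries) {
            sb.append("  <url>\n")
              .append("    <loc>").append(escapeXml(e.loc)).append("</loc>\n")
              .append("    <lastmod>").append(e.lastmod).append("</lastmod>\n")
              .append("    <changefreq>").append(e.changefreq).append("</changefreq>\n")
              .append("    <priority>").append(e.priority).append("</priority>\n")
              .append("  </url>\n");
        }
        sb.append("</urlset>\n");
        return sb.toString();
    }
}
```

For example, an Entry for "http://example.com/page?a=1&b=2" produces <loc>http://example.com/page?a=1&amp;b=2</loc>.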
This architecture puts each ERDDAP administrator in charge of determining where the data
for his/her ERDDAP comes from.
To set up the security system:
Authentication (logging in) - Currently, ERDDAP supports Custom and OpenID (recommended) authentication.
We recommend OpenID because it frees you from storing and handling users' passwords.
Remember that users often use the same password at different sites.
So they may be using the same password for your ERDDAP as they do at their bank.
That makes their password very valuable -- much more valuable to the user than the data they are requesting.
So you need to do as much as you can to keep the passwords private. That is a big responsibility.
OpenID takes care of passwords, so you don't have to gather, store, or work with them.
So you are freed from that responsibility.
OpenID uses a cookie on the user's computer,
so the user's browser must be set to allow cookies.
If a user is making ERDDAP requests from a computer program (not a browser),
cookies are hard to work with. Sorry.
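For programs that must carry a session cookie, Java's built-in HttpClient can store cookies from responses and send them back on later requests. This is only a sketch of the cookie-handling half of the problem: it assumes that some earlier request made with the same client obtains the cookie, and it does not perform an OpenID login, which is designed around a browser and may not be practical from a program. The class and method names are hypothetical.

```java
import java.net.CookieManager;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CookieClient {

    /**
     * An HttpClient that stores cookies set by responses and sends them back
     * on later requests (CookieManager's default policy accepts cookies
     * only from the original server).
     */
    static HttpClient newCookieAwareClient(CookieManager cookies) {
        return HttpClient.newBuilder()
                .cookieHandler(cookies)
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
    }

    /** Plain GET that returns the response body; cookies are handled by the client. */
    static String get(HttpClient client, String url) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```

The key design point is to reuse one client (and its CookieManager) for every request; a new client per request starts with an empty cookie store.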
Secure Data Sources - If a dataset is to have restricted access to ERDDAP users,
the data source (from where ERDDAP gets the data) should not be publicly accessible.
So how can ERDDAP get the data for restricted access datasets? Some options are:
But in general, currently, ERDDAP can't deal with these data sources because it has no
provisions for logging on to the data source.
This is the reason why access to
EDDGridFromErddap and EDDTableFromErddap datasets
can't be restricted.
Currently, the local ERDDAP has no way to login and access the metadata information
from the remote ERDDAP.
And putting the remote ERDDAP behind your firewall and removing its dataset's
accessibleTo restrictions doesn't solve the problem:
since user requests for EDDXxxFromErddap data need to be redirected to the remote ERDDAP,
the remote ERDDAP can't be behind a firewall.
Questions? Suggestions? If you have any questions, doubts, concerns, or suggestions
about ERDDAP's security system or about how it is set up,
please email bob dot simons at noaa dot gov.
Starting with ERDDAP version 1.14, it became much less likely that a user would actually see this error.
Now, when the underlying error occurs, ERDDAP automatically internally tries to reload the dataset
and resubmit the request to the reloaded dataset.
Often this succeeds. When it does, the user will simply see that a given request took a little longer
than usual. If it fails, the user should (as the message says) wait a minute, then try again.
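A client program can follow the same advice automatically. The helper below is hypothetical: it runs a request, and if that throws, waits (60000 ms would match "wait a minute") and tries exactly once more, letting a second failure propagate to the caller.

```java
import java.util.function.Supplier;

public class RetryOnce {

    /**
     * Runs the request; if it throws a RuntimeException, waits waitMillis
     * and tries exactly once more. A second failure propagates to the caller.
     */
    static <T> T withOneRetry(Supplier<T> request, long waitMillis) {
        try {
            return request.get();
        } catch (RuntimeException firstFailure) {
            try {
                Thread.sleep(waitMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                throw firstFailure;
            }
            return request.get();
        }
    }
}
```

Retrying only once keeps a persistent problem from turning into an endless loop of requests against the server.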
For many situations where you might be tempted to use parts of ERDDAP in your project,
we think you will find it much easier to install and use ERDDAP as is,
and then write other services which use ERDDAP's services.
You can set up your own ERDDAP installation crudely in an hour or two.
You can set up your own ERDDAP installation in a polished way in a few days
(depending on the number and complexity of your datasets).
But hacking out parts of ERDDAP for your own project is likely to take weeks
(and months to catch subtleties).
We (obviously) think there are many benefits to using ERDDAP as is and making your ERDDAP
installation publicly accessible.
However, in some circumstances, you might not want to make your ERDDAP installation publicly accessible.
In that case, your service can access and use your private ERDDAP, and your clients needn't know about ERDDAP.
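As an example of a service built on top of ERDDAP's services, the sketch below builds a tabledap request URL for a (possibly private) ERDDAP. It assumes the usual tabledap URL form, {base}/tabledap/{datasetID}{fileType}?{variables}&{constraints}; the class name is hypothetical, and so are the base URL, dataset ID, and variable names in the usage note. The resulting URL can be fetched with any HTTP client and the response repackaged for your own clients.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class TabledapQuery {

    /**
     * Builds a tabledap request URL of the form
     * {base}/tabledap/{datasetID}{fileType}?{var1,var2,...}&{constraint}&...
     */
    static String url(String erddapBase, String datasetID, String fileType,
                      List<String> variables, List<String> constraints) {
        StringBuilder sb = new StringBuilder(erddapBase)
                .append("/tabledap/").append(datasetID).append(fileType)
                .append('?').append(String.join(",", variables));
        for (String c : constraints) {
            // Percent-encode each constraint so characters like > = : survive in the URL.
            sb.append('&').append(URLEncoder.encode(c, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }
}
```

For example, url("http://localhost:8080/erddap", "myDataset", ".csv", List.of("time", "temperature"), List.of("time>=2010-01-01T00:00:00Z")) returns "http://localhost:8080/erddap/tabledap/myDataset.csv?time,temperature&time%3E%3D2010-01-01T00%3A00%3A00Z". Each constraint is encoded separately so the & separators between constraints stay intact.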
Half Way - Or, there is another approach which you may find useful
which is half way between delving into ERDDAP's code and using ERDDAP as a stand-alone web service:
In the EDD class, there is a static method which lets you make an instance of a dataset
(based on the specification in datasets.xml): oneFromDatasetXml(String tDatasetID)
It returns an instance of an EDDTable or EDDGrid dataset.
Given that instance, you can call
makeNewFileForDapQuery(String userDapQuery, String dir, String fileName, String fileTypeName)
to tell the instance to make a data file, of a specific fileType, with the results from a user query.
Thus, this is a simple way to use ERDDAP's methods to request data and get a file in response,
just as a client would use the ERDDAP web application.
But this approach works within your Java program and bypasses the need for an application server
like Tomcat.
We use this approach for many of the unit tests of EDDTable and EDDGrid subclasses,
so you can see examples of this in the source code for all of those classes.
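The half-way approach might look like the sketch below. It calls the two methods named above, EDD.oneFromDatasetXml and makeNewFileForDapQuery, through reflection only so that it compiles without the ERDDAP jar; with ERDDAP on your classpath you would call them directly. The package name gov.noaa.pfel.erddap.dataset, the class name HalfWayExample, the dataset ID, the query, and the output location are assumptions. The signatures may differ in other ERDDAP versions, and ERDDAP's own setup (including datasets.xml) must already be in place for oneFromDatasetXml to work.

```java
import java.lang.reflect.Method;

public class HalfWayExample {

    // ASSUMPTION: fully qualified name of ERDDAP's EDD class; check it against your ERDDAP version.
    static final String EDD_CLASS = "gov.noaa.pfel.erddap.dataset.EDD";

    /**
     * Loads a dataset defined in datasets.xml and asks it to write a data file
     * for a DAP query, without going through Tomcat.
     * Throws ClassNotFoundException if the ERDDAP classes are not on the classpath.
     */
    static void makeFile(String datasetID, String userDapQuery,
                         String dir, String fileName, String fileTypeName) throws Exception {
        Class<?> edd = Class.forName(EDD_CLASS);
        // static EDD.oneFromDatasetXml(String tDatasetID) returns an EDDTable or EDDGrid
        Method factory = edd.getMethod("oneFromDatasetXml", String.class);
        Object dataset = factory.invoke(null, datasetID);
        // instance method makeNewFileForDapQuery(userDapQuery, dir, fileName, fileTypeName)
        Method makeNewFile = dataset.getClass().getMethod("makeNewFileForDapQuery",
                String.class, String.class, String.class, String.class);
        makeNewFile.invoke(dataset, userDapQuery, dir, fileName, fileTypeName);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical dataset ID, query, output directory, file name, and file type.
        makeFile("myDatasetID", "time,temperature&time>=2010-01-01",
                 "/tmp/", "myResult", ".csv");
    }
}
```

If a call fails inside ERDDAP, reflection wraps the original error in an InvocationTargetException; its getCause() holds ERDDAP's actual exception.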
ERDDAP, Copyright 2010, NOAA.
PERMISSION TO USE, COPY, MODIFY, AND DISTRIBUTE THIS SOFTWARE AND
ITS DOCUMENTATION FOR ANY PURPOSE AND WITHOUT FEE IS HEREBY GRANTED,
PROVIDED THAT THE ABOVE COPYRIGHT NOTICE APPEAR IN ALL COPIES, THAT
BOTH THE COPYRIGHT NOTICE AND THIS PERMISSION NOTICE APPEAR IN
SUPPORTING DOCUMENTATION, AND THAT REDISTRIBUTIONS OF MODIFIED FORMS
OF THE SOURCE OR BINARY CODE CARRY PROMINENT NOTICES STATING THAT THE
ORIGINAL CODE WAS CHANGED AND THE DATE OF THE CHANGE. THIS SOFTWARE
IS PROVIDED "AS IS" WITHOUT EXPRESS OR IMPLIED WARRANTY.
ERDDAP, Version 1.24