Thursday, August 1, 2013

Amazon's CloudSearch: Implementation using C#.Net (dotnet)

After working with DTSearch for a long time, we switched over to Amazon's CloudSearch service. The deciding factor was CloudSearch's automatic scaling: with DTSearch we had to worry about scalability ourselves.

 What is Amazon CloudSearch?

Amazon CloudSearch is a fully-managed service in the AWS Cloud that makes it easy to set up, manage, and scale a search solution for your website or application. Amazon CloudSearch enables you to search large collections of data such as web pages, document files, forum posts, or product information. Built for high throughput and low latency, Amazon CloudSearch supports a rich set of features including free text search, faceted search, customizable relevance ranking, configurable search fields, text processing options, and near real-time indexing.

To use Amazon CloudSearch, you simply:
1. Create a search domain
2. Configure your search fields
3. Upload your data for indexing, and
4. Submit search requests from your website or application

You can get more details at: http://aws.amazon.com/cloudsearch/

You can use Amazon CloudSearch to index and search both structured data and plain text. Amazon CloudSearch supports full text search, searching within fields, prefix searches, Boolean searches, and faceting. You can get search results in JSON or XML, sort and filter results based on field values, and rank results alphabetically, numerically, or according to custom rank expressions.

To build a search solution with Amazon CloudSearch, you:

1. Create and configure a search domain. A search domain encapsulates your searchable data and the search instances that handle your search requests. You set up a separate domain for each different data set you want to search.

2. Upload the data you want to search to your domain. Amazon CloudSearch automatically indexes your data and deploys the search index to one or more search instances.

3. Search your domain. You send a search request to your domain's search endpoint as an HTTP/HTTPS GET request.

To learn more, see Amazon's CloudSearch developer guide:
http://docs.aws.amazon.com/cloudsearch/latest/developerguide/SvcIntro.html

At the time of writing, Amazon's .NET SDK does not include an implementation for CloudSearch. Here is C# code you can use to post document batches to a CloudSearch domain and to send search requests against its search index:

using System.IO;
using System.Net;
using System.Text;

namespace AmazonCloudSearch
{
    /// <summary>
    /// Data Formats in Cloud Search
    /// </summary>
    public enum DataFormat
    {
        Json, Xml
    }

    public class CloudSearch
    {

        /// <summary>
        /// Adds, updates, or deletes documents by posting a batch to Amazon CloudSearch.
        /// </summary>
        /// <param name="JsonFormat">The batch body (JSON or XML, matching the format argument).</param>
        /// <param name="DocumentUri">Host name of the domain's document endpoint.</param>
        /// <param name="ApiVersion">Path placed between the host name and "batch".</param>
        /// <param name="format">Format of the batch body.</param>
        /// <returns>The HTTP status code followed by the response body.</returns>
        public string PostDocuments(string JsonFormat, string DocumentUri, string ApiVersion, DataFormat format)
        {
            var request = (HttpWebRequest)WebRequest.Create(
                string.Format("http://{0}{1}batch", DocumentUri, ApiVersion));
            request.ProtocolVersion = HttpVersion.Version11;
            request.Method = "POST";
            byte[] postBytes = Encoding.UTF8.GetBytes(JsonFormat);
            request.ContentLength = postBytes.Length;
            request.Accept = "application/json";
            request.ContentType = format == DataFormat.Xml ? "text/xml;charset=utf-8" : "application/json;charset=utf-8";

            // Write the batch body and dispose the stream before reading the response.
            using (var requestStream = request.GetRequestStream())
            {
                requestStream.Write(postBytes, 0, postBytes.Length);
            }

            // Error responses surface as WebException. It is left to propagate so the
            // caller sees the original stack trace ("throw ex;" would reset it).
            using (var response = (HttpWebResponse)request.GetResponse())
            using (var reader = new StreamReader(response.GetResponseStream()))
            {
                return response.StatusCode + reader.ReadToEnd();
            }
        }


        /// <summary>
        /// Sends a search request to Amazon CloudSearch.
        /// </summary>
        /// <param name="SearchUri">Search endpoint URL; SearchQuery is appended to it as-is.</param>
        /// <param name="SearchQuery">Query string, already URL-encoded by the caller.</param>
        /// <param name="ReturnFields">Value of the return-fields parameter.</param>
        /// <param name="PageSize">Number of hits to return (the size parameter).</param>
        /// <param name="Start">Offset of the first hit to return (the start parameter).</param>
        /// <param name="format">Format of the search results.</param>
        /// <returns>The response body.</returns>
        public string SearchRquest(string SearchUri, string SearchQuery, string ReturnFields, int PageSize, int Start, DataFormat format)
        {
            string searchUrl = SearchUri + SearchQuery + "&return-fields=" + ReturnFields + "&start=" + Start + "&size=" + PageSize;
            if (format == DataFormat.Xml)
                searchUrl = searchUrl + "&results-type=xml";

            // Create a request for the URL.
            WebRequest request = WebRequest.Create(searchUrl);
            // If required by the server, set the credentials.
            request.Credentials = CredentialCache.DefaultCredentials;
            request.ContentType = "application/json";

            // Read the whole response body; the using blocks close the reader and the
            // response even if reading fails. Exceptions propagate to the caller with
            // their original stack trace ("throw ex;" would reset it).
            using (WebResponse response = request.GetResponse())
            using (var reader = new StreamReader(response.GetResponseStream()))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
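
The first argument of PostDocuments is the raw batch body. Below is a minimal sketch of building one for a single "add" operation. It assumes the batch layout of the 2011-02-01 API as I understand it (type, id, version, lang, fields), so please check it against the developer guide. BatchSketch, BuildAddBatch and the sample values are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class BatchSketch
{
    // Escapes backslashes, double quotes, and common control characters
    // so the value can sit inside a JSON string literal.
    static string Escape(string s)
    {
        var sb = new StringBuilder();
        foreach (char c in s)
        {
            switch (c)
            {
                case '\\': sb.Append("\\\\"); break;
                case '"': sb.Append("\\\""); break;
                case '\n': sb.Append("\\n"); break;
                case '\r': sb.Append("\\r"); break;
                case '\t': sb.Append("\\t"); break;
                default: sb.Append(c); break;
            }
        }
        return sb.ToString();
    }

    // Builds a batch containing one "add" operation.
    // Assumption: the language is hard-coded to "en" and all fields are text.
    public static string BuildAddBatch(string id, int version, IDictionary<string, string> fields)
    {
        var parts = new List<string>();
        foreach (var kv in fields)
            parts.Add("\"" + Escape(kv.Key) + "\":\"" + Escape(kv.Value) + "\"");

        return "[{\"type\":\"add\",\"id\":\"" + Escape(id) + "\",\"version\":" + version
            + ",\"lang\":\"en\",\"fields\":{" + string.Join(",", parts.ToArray()) + "}}]";
    }
}
```

The resulting string could then be passed as the first argument of PostDocuments, with DocumentUri set to the host name of your domain's document endpoint and ApiVersion set to the path in front of "batch" (probably "/2011-02-01/documents/", but verify this for your domain).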

Here is a snippet showing how to call the CloudSearch class from your own code:

AmazonCloudSearch.CloudSearch ObjCloudSearch = new AmazonCloudSearch.CloudSearch();
string strResponse = string.Empty;
if (IsSortByViews)
   strQuery = strQuery + "&rank=-vw";
else if (isSortByLatest)
   strQuery = strQuery + "&rank=-pid";
//Call Amazon Cloud Search API Service -- views and contentid are the indexed fields
strResponse = ObjCloudSearch.SearchRquest(configSection.SearchUri, strQuery, strReturnFields, p_intPageSize, p_intStart, AmazonCloudSearch.DataFormat.Json);


You can build strQuery from your search criteria using the query syntax given in the developer guide.
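
As a rough illustration, strQuery could be assembled as in the sketch below. It assumes the free-text parameter is named q (as in the 2011-02-01 search API, to the best of my knowledge) and that the search URI you pass to SearchRquest already ends with "?", since the class simply concatenates the two strings. QuerySketch and BuildQuery are hypothetical names.

```csharp
using System;

public static class QuerySketch
{
    // Builds "q=<url-encoded terms>" and, when a rank field is given,
    // appends "&rank=<field>" (prefixed with "-" for descending order).
    public static string BuildQuery(string terms, string rankField, bool descending)
    {
        string query = "q=" + Uri.EscapeDataString(terms);
        if (!string.IsNullOrEmpty(rankField))
            query += "&rank=" + (descending ? "-" : "") + rankField;
        return query;
    }
}
```

For example, QuerySketch.BuildQuery("star wars", "vw", true) yields "q=star%20wars&rank=-vw". Encoding the terms matters because SearchRquest appends the query to the URL without any escaping.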