17. 3. 2011

mnoGoSearch optimization example

tt_news

The mnoGoSearch plugin's manual suggests that tt_news articles should probably be indexed from the database. This works for sites with only one Single view page, or where you don't mind that all search results point to the same Single view page. If you want search results to take into account the Single view page determined by the article's first news category, or you have any other "more complex" tt_news setup, you can still index tt_news articles as normal pages.

If we use Facebook Like or other social networking tools to spread the word about our articles, each article should have only one Single view page (a unique URL). Keeping the URLs for the same content as unique as possible seems reasonable. This is also good practice from a search engine's point of view, e.g., you get more "points" from Google. Duplicates can be handled with a canonical tag, but that way we are dealing with the consequences and not the source of the "problem", so it should be avoided if possible (IMHO).
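A minimal Python sketch of the "one article, one URL" idea, assuming the Single view page is picked by the article's first category as described above. The category-to-page mapping, the page IDs and the helper name are hypothetical; `tx_ttnews[tt_news]` is the GET parameter tt_news uses for the article UID in its Single view links.

```python
# Hypothetical mapping: news category UID -> page ID of that category's Single view.
SINGLE_VIEW_PID_BY_CATEGORY = {1: 42, 2: 57}
# Hypothetical fallback Single view page for articles without a mapped category.
DEFAULT_SINGLE_VIEW_PID = 42


def single_view_url(article_uid: int, category_uids: list[int]) -> str:
    """Return the one Single view URL for a tt_news article.

    Only the *first* category counts, so an article listed under several
    categories still resolves to exactly one URL.
    """
    first = category_uids[0] if category_uids else None
    pid = SINGLE_VIEW_PID_BY_CATEGORY.get(first, DEFAULT_SINGLE_VIEW_PID)
    return f"index.php?id={pid}&tx_ttnews[tt_news]={article_uid}"


# Whichever List view links to article 7, the link target is the same.
print(single_view_url(7, [2, 1]))  # index.php?id=57&tx_ttnews[tt_news]=7
```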

When indexing news, we want to exclude content elements that would produce "false positives" in search. Such sources depend on your site's structure; in most cases they include the titles and abstracts of articles in the List views that are used as teasers on various pages around the site. We want mnoGoSearch's results to point only to the Single view of articles that contain the search term.

The quick list:

  1. After the initial run, exclude Single view pages from search. Go to the page properties and select "Disable" for the "Include in Search" option on the "Behaviour" tab. If search results return two hits for each article, one with the title of the article and one with the title of the Single view page, check again whether you have excluded this page from search.
  2. Use the <!--TYPO3SEARCH_begin--> and <!--TYPO3SEARCH_end--> markers to mark the parts of the page that should be indexed.

If you exclude Single view pages before the initial run, the Single view pages of your articles will not get indexed. The first point is kind of a hack, I suppose. Keep it in mind, since it might cause problems if you clear the mnoGoSearch tables that hold the URLs. For the second point, we follow the principle of indexing everything that is not explicitly excluded. Therefore, we begin the content on the page with <!--TYPO3SEARCH_begin--> and put <!--TYPO3SEARCH_end--> at the end. Next, we exclude the parts that we do not want to index by surrounding them with <!--TYPO3SEARCH_end--> and <!--TYPO3SEARCH_begin-->. Take care that the markers are not nested. Here is an example for the List view template:

<!-- ###TEMPLATE_LIST### begin
  This is the template for the list of news-->
    <!--TYPO3SEARCH_end-->
      <div class="news_list">
        <!-- ###CONTENT### begin
          This is the part with the list of news:-->
          <!-- ###NEWS### begin
            Template for a single item-->
            <div class="news_item">
              <div class="date">###NEWS_DATE###</div>
              <div class="title"><!--###LINK_ITEM###-->
                 ###NEWS_TITLE###<!--###LINK_ITEM###-->
              </div>
            </div>
            <!-- ###NEWS### end-->
        <!-- ###CONTENT###  end -->
      </div>
    <!--TYPO3SEARCH_begin-->
<!-- ###TEMPLATE_LIST### end -->
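To check a template against the "index everything not excluded" principle, here is a minimal Python sketch that collects the regions between <!--TYPO3SEARCH_begin--> and the following <!--TYPO3SEARCH_end-->. It is a simplified model, not the mnoGoSearch plugin's actual parser; the helper name is hypothetical, and reporting nested markers as an error is this sketch's own choice.

```python
BEGIN = "<!--TYPO3SEARCH_begin-->"
END = "<!--TYPO3SEARCH_end-->"


def indexable_segments(html: str) -> list[str]:
    """Return the text between each BEGIN marker and the next END marker.

    Raises ValueError if a BEGIN appears while a region is already open
    (nesting), if an END appears with no open region, or if the last
    region is never closed.
    """
    segments: list[str] = []
    pos = 0
    open_at = None  # offset just after the currently open BEGIN, if any
    while True:
        b = html.find(BEGIN, pos)
        e = html.find(END, pos)
        if b == -1 and e == -1:
            break
        if e == -1 or (b != -1 and b < e):
            # The next marker is BEGIN.
            if open_at is not None:
                raise ValueError(f"nested BEGIN marker at offset {b}")
            open_at = b + len(BEGIN)
            pos = open_at
        else:
            # The next marker is END.
            if open_at is None:
                raise ValueError(f"END marker without BEGIN at offset {e}")
            segments.append(html[open_at:e])
            open_at = None
            pos = e + len(END)
    if open_at is not None:
        raise ValueError("BEGIN marker never closed")
    return segments


# Page template opens the region; the List view template closes it before
# the teasers and reopens it after them; the page template closes it at the end.
page = (
    BEGIN + "<h1>Article</h1>"
    + END + '<div class="news_list">Teaser</div>' + BEGIN
    + "<p>Body</p>" + END
)
print(indexable_segments(page))  # ['<h1>Article</h1>', '<p>Body</p>']
```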

More on specification of the web space for indexing ... Perhaps I should mention that mnoGoSearch got caught in a loop during setup when I used a dozen rules of configuration type Realm with comparison type String. When I optimized the rules and used regular expressions instead, the loop was gone. I did try running the indexer manually, but did not investigate the problem further, since the final setup was working for me (at this time ;-).

t3blog

Mark unwanted content

This is done with the <!--TYPO3SEARCH_begin--> and <!--TYPO3SEARCH_end--> markers. We decided to exclude all lists with the following TypoScript:

plugin.tx_t3blog_pi1 {
  views {
    list.10.wrap = <!--TYPO3SEARCH_end--> <h2>|</h2> <!--TYPO3SEARCH_begin-->
    list.20.wrap = <!--TYPO3SEARCH_end--> <div class="news_list"> | </div> <!--TYPO3SEARCH_begin-->
  } 
  blogList {
    singleNavigation.wrap = <!--TYPO3SEARCH_end--> <div id="singleNavigation">|</div> <!--TYPO3SEARCH_begin-->
  }
  archive {  
    listWrap.10.dataWrap = <!--TYPO3SEARCH_end--> <ul id="archive_{field:id}" class="{field:class}"> |  </ul> <!--TYPO3SEARCH_begin-->
  }
  latestCommentsNav {
    list.10.wrap = <!--TYPO3SEARCH_end--><h2>|</h2><!--TYPO3SEARCH_begin-->
    list.20.wrap = <!--TYPO3SEARCH_end--><div class="news_list">|</div><!--TYPO3SEARCH_begin--> 
  }
} 
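Each wrap above closes the indexed region just before a list and reopens it right after. A minimal Python model of how a TypoScript `wrap` places content at the `|` split point makes this visible. It is a simplification: as far as I know TYPO3 trims whitespace around the two halves, which this sketch assumes, and the {field:...} substitution that `dataWrap` adds is not modelled. The helper name is hypothetical.

```python
END = "<!--TYPO3SEARCH_end-->"
BEGIN = "<!--TYPO3SEARCH_begin-->"


def ts_wrap(content: str, wrap: str, split_char: str = "|") -> str:
    """Put content between the parts of `wrap` before and after split_char.

    Assumption: only the first two parts are used and each is trimmed.
    """
    parts = wrap.split(split_char)
    before = parts[0].strip()
    after = parts[1].strip() if len(parts) > 1 else ""
    return before + content + after


# views.list.20.wrap from the configuration above
list_wrap = '<!--TYPO3SEARCH_end--> <div class="news_list"> | </div> <!--TYPO3SEARCH_begin-->'
html = ts_wrap("<ul><li>Post title</li></ul>", list_wrap)
print(html)
```

Provided the page content itself sits between <!--TYPO3SEARCH_begin--> and <!--TYPO3SEARCH_end-->, each wrapped list lands in an excluded gap.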

Remove the date from the URLs

We noticed that links in the "singleNavigation" section use the date of the current post. Consequently, each post is reachable under every date on which we published a post. This is probably just a glitch, since snowflake's blog renders these links correctly. We did not investigate the matter, since we wanted to remove dates from the URLs anyway. Hmm, perhaps this shouldn't be part of this post.

You can remove that part of the URL with the following TypoScript in your plugin.tx_t3blog_pi1 definition (see also "Customizing T3blog"):

 
plugin.tx_t3blog_pi1 {
  blogList {
    titleLink.10.typolink.additionalParams.dataWrap = &tx_t3blog_pi1[blogList][showUid]={field:uid}
    single.moreLink.10.typolink.additionalParams.dataWrap = &tx_t3blog_pi1[blogList][showUid]={field:uid}
    textRow.10.typolink.additionalParams.dataWrap = &tx_t3blog_pi1[blogList][showUid]={field:uid}
    commentsLink.10.typolink.additionalParams.dataWrap = &tx_t3blog_pi1[blogList][showUid]={field:uid}
    singleNavTitleLink.10.typolink.additionalParams.dataWrap = &tx_t3blog_pi1[blogList][showUid]={field:uid}
    comment.30.10.typolink.additionalParams.dataWrap = &tx_t3blog_pi1[blogList][showUid]={field:blogUid}&tx_t3blog_pi1[blogList][editCommentUid]={field:uid}
  }
  views {
    list.30.typolink.additionalParams.dataWrap = &tx_t3blog_pi1[blogList][showUid]={field:uid}
    link.10.typolink.additionalParams.dataWrap = &tx_t3blog_pi1[blogList][showUid]={field:uid}
  }
  archive {
    titleLink.10.typolink.additionalParams.dataWrap = &tx_t3blog_pi1[blogList][showUid]={field:uid}
  }
  latestCommentsNav {
    link.10.typolink.additionalParams.dataWrap = &tx_t3blog_pi1[blogList][showUid]={field:uid}
  }
  latestPostNav {
    list.30.typolink.additionalParams.dataWrap = &tx_t3blog_pi1[blogList][showUid]={field:uid}
    link.10.typolink.additionalParams.dataWrap = &tx_t3blog_pi1[blogList][showUid]={field:uid}
  }
}

Unfortunately, we also had to change
typo3conf/ext/t3blog/pi1/widgets/blogList/class.blogList.php.
Find the function getTrackbackLink and comment out the date parameters like this:

 
// Date parameters commented out, so the trackback link carries only showUid
$trackBackParameters = t3lib_div::implodeArrayForUrl('tx_t3blog_pi1', array(
  'blogList' => array(
    /* 'day' => sprintf('%02d', $dateInfo['mday']),
        'month' => sprintf('%02d', $dateInfo['mon']),
        'year' => $dateInfo['year'],*/
    'showUid' => $uid,
    'trackback' => 1
   )
));
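To see what the change does to the trackback link, here is a rough Python model of how the parameter array is flattened into a query string. It is loosely modelled on t3lib_div::implodeArrayForUrl and is not a port of it: the function name is hypothetical, and leaving parameter names unencoded while percent-encoding values is an assumption. The example post (uid 5, 17 March 2011) is made up.

```python
from urllib.parse import quote


def implode_array_for_url(name: str, data: dict) -> str:
    """Flatten a nested dict into '&name[key][subkey]=value' pairs."""
    parts = []
    for key, value in data.items():
        full_name = f"{name}[{key}]"
        if isinstance(value, dict):
            parts.append(implode_array_for_url(full_name, value))
        else:
            parts.append(f"&{full_name}={quote(str(value), safe='')}")
    return "".join(parts)


with_date = {"blogList": {"day": "17", "month": "03", "year": 2011,
                          "showUid": 5, "trackback": 1}}
without_date = {"blogList": {"showUid": 5, "trackback": 1}}

print(implode_array_for_url("tx_t3blog_pi1", with_date))
print(implode_array_for_url("tx_t3blog_pi1", without_date))
# &tx_t3blog_pi1[blogList][showUid]=5&tx_t3blog_pi1[blogList][trackback]=1
```

Without the date parameters, the link depends only on showUid, so a post no longer appears under several dates.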

If you are using permalinks, you should comment out the date parameters in the function getPermalink in the same manner. A request to move this setting to TypoScript has already been posted on forge.typo3.org, where you can find some additional discussion. In my opinion, URLs should be unique as long as this does not limit some "important" functionality.

Calendar

Another source of duplicate content is the calendar. When a user browses through the calendar, the URL changes, but the content stays the same. Since we do not use social bookmarking on the List view, we do not consider this duplicate content very problematic, given the additional functionality the calendar provides.

For indexing purposes, this can be overcome with a slightly different RealURL configuration that lets us limit indexing to Single view pages only. We have moved the date parameters out of the 'blog post' segment into a segment of their own. I have left the definitions in Slovenian, so you can learn something new and at the same time check the URLs on our page.

// Date segment (datum = date): leto = year, mesec = month, dan = day
'datum' => array(
  'leto' => array(
    'GETvar' => 'tx_t3blog_pi1[blogList][year]',
  ),
  'mesec' => array(
    'GETvar' => 'tx_t3blog_pi1[blogList][month]',
  ),
  'dan' => array(
    'GETvar' => 'tx_t3blog_pi1[blogList][day]',
  ),
),
// Post segment (zapis = post): Single view of one blog post, identified by its uid
'zapis' => array(
  'zapis' => array (
    'GETvar' => 'tx_t3blog_pi1[blogList][showUid]',
    'lookUpTable' => array(
      'table' => 'tx_t3blog_post',
      'id_field' => 'uid',
      'alias_field' => 'uid',
      'addWhereClause' => ' AND deleted !=1 AND hidden !=1',
      'useUniqueCache' => 1,
      'useUniqueCache_conf' => array(
        'strtolower' => 1,
        'spaceCharacter' => '-',
      )
    )
  )
),
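The point of the split is that calendar pages and post pages get different path keywords. A simplified Python model of RealURL-style postVarSets encoding illustrates the resulting paths. The function name, the rule of stopping at the first missing GETvar and the example values are assumptions for illustration, not RealURL's actual algorithm; the lookUpTable is left out because its alias_field is the uid itself, and the blog page's own segment is assumed to be blog, as the indexing rules suggest.

```python
YEAR = "tx_t3blog_pi1[blogList][year]"
MONTH = "tx_t3blog_pi1[blogList][month]"
DAY = "tx_t3blog_pi1[blogList][day]"
SHOW_UID = "tx_t3blog_pi1[blogList][showUid]"

# Mirrors the keywords and GETvars of the configuration above, in the same order.
POST_VAR_SETS = {
    "datum": [YEAR, MONTH, DAY],
    "zapis": [SHOW_UID],
}


def encode_post_vars(get_vars: dict, post_var_sets: dict = POST_VAR_SETS) -> str:
    """Return path segments like 'datum/2011/03/' for the given GET variables.

    Simplification: a keyword is emitted with the values of its leading
    GETvars that are present, stopping at the first missing one.
    """
    path = ""
    for keyword, names in post_var_sets.items():
        values = []
        for name in names:
            if name not in get_vars:
                break
            values.append(str(get_vars[name]))
        if values:
            path += keyword + "/" + "".join(v + "/" for v in values)
    return path


print("blog/" + encode_post_vars({YEAR: 2011, MONTH: "03"}))  # blog/datum/2011/03/
print("blog/" + encode_post_vars({SHOW_UID: 42}))             # blog/zapis/42/
```

In this model, a post link that still carried the date would come out as blog/datum/2011/03/17/zapis/42/, which the */blog/zapis/* rule would not match; one more reason to keep dates out of post links.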

Now clear the configuration cache and add the following rules to the mnoGoSearch configuration:

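| Configuration type | Indexing path  | Indexing method | Comparison type | Description                            |
|--------------------|----------------|-----------------|-----------------|----------------------------------------|
| Realm              | */blog/zapis/* | Allow           | String          | Allow indexing of Single view pages.   |
| Realm              | *blog*         | Disallow        | String          | Disallow everything else.              |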

The Allow record should be listed before the Disallow record in the list view of your mnoGoSearch indexing configuration; otherwise the broader *blog* pattern would also catch the Single view pages.
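Why the order matters: assuming, as the advice above implies, that the first matching rule decides, the Allow rule has to be checked before the broader Disallow rule. A minimal Python sketch under that assumption; fnmatch-style wildcards stand in for mnoGoSearch's String comparison, and the helper name and example.com URLs are hypothetical.

```python
from fnmatch import fnmatchcase

# (indexing path, indexing method) in the order listed in the configuration.
RULES = [
    ("*/blog/zapis/*", "Allow"),
    ("*blog*", "Disallow"),
]


def index_decision(url: str, rules=RULES):
    """Return the method of the first rule whose pattern matches url.

    Returns None when no rule matches; what mnoGoSearch does then is not
    modelled here.
    """
    for pattern, method in rules:
        if fnmatchcase(url, pattern):
            return method
    return None


print(index_decision("https://example.com/blog/zapis/42/"))       # Allow
print(index_decision("https://example.com/blog/datum/2011/03/"))  # Disallow
# With the records in the wrong order, Single view pages are excluded too.
print(index_decision("https://example.com/blog/zapis/42/", RULES[::-1]))  # Disallow
```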

It would probably be wise to define a canonical URL for the List view, but I am not sure this is really necessary, since search engine algorithms can handle such cases; see Google Webmaster: Specify your canonical.
