Wednesday, March 23, 2011

Alerts Not Working in SharePoint 2007

Over the past several months I have written a couple of solutions that rely on SharePoint Alerts in order to let individuals in the organization know when activities occur that affect them.

Because these solutions use SharePoint's Alerts mechanism, it is vitally important that the Alert infrastructure functions properly. However, this is rarely the case in our farm.

Our topology is a Medium Server Farm with 2 WFEs, 1 App Server, and a SQL cluster for the databases.

The issue I have been dealing with is that, for an as yet undetermined reason, the server holding the TimerLock for a particular Site Collection's content database is no longer able to send emails.

I am able to identify the server which has the TimerLock by running the following SQL command against the content database for the site collection containing the list on which alerts are set:

USE content_database

SELECT * FROM timerlock WITH (nolock)

I am working with Immediate Alerts, so to see whether they are being queued I query the eventcache table, again in the content database for the site collection containing the list on which the alerts are set:

USE content_database

SELECT * FROM eventcache WITH (nolock) WHERE EventData IS NOT NULL

My observations show that when the Timer Job (owstimer.exe) runs, these events are processed and subsequently removed from the eventcache table, leading me to believe that everything is working fine.

I've had my network security guy take a look at the firewall traffic, and he can see that traffic from the SharePoint server with the TimerLock to the SMTP server makes it through the firewall without issue. However, no email is ever received for the alerts. It should be noted that while this server is failing to send email, the other WFE may hold the TimerLock for a different Site Collection's database, and those Alerts send email just fine!

Although everything appears to be working as it should, I am intermittently left without alert emails.

The one thing that seems to work is to reset the local cache on the server that has the Timer Lock. This is accomplished by performing the following actions:

On the server with the Timer Lock:
  1. Stop the Windows SharePoint Services Timer service
  2. Navigate to "C:\Documents and Settings\All Users\Application Data\Microsoft\SharePoint\Config\<GUID>\" (the GUID-named subfolder) and delete all the .xml files - DO NOT DELETE THE cache.ini FILE!
  3. Open the cache.ini file in Notepad, change the number value to 1, then save the file.
  4. Start the Windows SharePoint Services Timer service.

Once this action is taken, it usually takes an hour or two before alerts start processing through this server again. In the meantime, any activities that trigger alerts for this server will be queued up in the eventcache table and will be sent when the server resumes processing.
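
For anyone who would rather script the file-handling part of the steps above, here is a minimal Python sketch. It is only an illustration, not something I run in production: the function name reset_config_cache is made up for this example, it assumes you pass it the GUID-named cache folder, and it does not stop or start the Windows SharePoint Services Timer service, so that still has to be done around it.

```python
import tempfile
from pathlib import Path


def reset_config_cache(cache_dir: Path) -> int:
    """Delete the cached .xml files in cache_dir and reset cache.ini to 1.

    Hypothetical helper for illustration only. Returns the number of
    .xml files that were removed.
    """
    cache_ini = cache_dir / "cache.ini"
    if not cache_ini.is_file():
        # Refuse to touch a folder that does not look like the config cache.
        raise FileNotFoundError(f"cache.ini not found in {cache_dir}")

    removed = 0
    # Step 2: delete every .xml file, leaving cache.ini in place.
    for xml_file in list(cache_dir.glob("*.xml")):
        xml_file.unlink()
        removed += 1

    # Step 3: change the number value in cache.ini to 1.
    cache_ini.write_text("1")
    return removed


if __name__ == "__main__":
    # Demo against a throwaway folder, not a real SharePoint server.
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp)
        (demo / "one.xml").write_text("<object/>")
        (demo / "two.xml").write_text("<object/>")
        (demo / "cache.ini").write_text("48213")
        print(reset_config_cache(demo))          # prints 2
        print((demo / "cache.ini").read_text())  # prints 1
```

The check for cache.ini is there as a guard so that pointing the function at the wrong folder fails loudly instead of deleting unrelated .xml files.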

UPDATE:

I am happy to report that since I added a scheduled job to run the following batch script every morning at 4:45 am, Alerts have been running without fail. For this to work, you need to make a copy of the cache.ini file with the number value set to 1 and place it in the C:\Documents and Settings\All Users\Application Data\Microsoft\SharePoint\Config\ directory.

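REM Stop the timer service so the cache files are not in use
net stop "Windows SharePoint Services Timer"

REM Delete everything in the cache folder (the GUID folder name is specific to this farm)
del /F /Q "C:\Documents and Settings\All Users\Application Data\Microsoft\SharePoint\Config\a58ec05c-344f-487c-a8e6-cf0365b86458\*.*"

REM Restore cache.ini from the prepared copy whose value is set to 1
xcopy "C:\Documents and Settings\All Users\Application Data\Microsoft\SharePoint\Config\cache.ini" "C:\Documents and Settings\All Users\Application Data\Microsoft\SharePoint\Config\a58ec05c-344f-487c-a8e6-cf0365b86458\*.ini" /Y

REM Restart the timer service
net start "Windows SharePoint Services Timer"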


Reference
More information about clearing the file cache is available from http://support.microsoft.com/kb/939308