
Huge Spike in Load After Moving to IPB 4.1



You may be a victim of a Memcached bug?

Try updating to the new version, Memcached 1.4.30:

Overview
Bugfix release, with a critical fix to large item support.

Fixes

  • Add MemoryDenyWriteExecute to the systemd service
  • Handle end of line comment on memcached.conf
  • Add missing parameters and escape hyphens as minus to manpage
  • Modernize unit file in systemd
  • crawler now uses rate limiter sleeps properly (CPU overusage)
  • add slab_chunk_max to stats settings

Importantly:
 

  • fix over-allocating with large item support.

If you set -I 2m, memcached was allocating 2 megabytes of memory per page, then only using 1mb. This would lead to it hitting the malloc limit with only 50% of the memory available before. This gets worse the more distant -I is from 1MB.

If you are still seeing issues with memory efficiency with large item support (-I set higher than the 1m default), try the startup setting: -o slab_chunk_max=524288. Most workloads will function fine, and it should nearly always be better than how memory efficiency worked prior to the new feature (when using large items).
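
To make the arithmetic in the release note concrete, here is a minimal Python sketch. It assumes, as the note describes for -I 2m, that each page was allocated at the full -I size while only 1 MB of it was used. Applying the same ratio to other -I values is an extrapolation from "this gets worse the more distant -I is from 1MB", not something the note spells out, and the function name is hypothetical.

```python
def usable_fraction(item_size_max_mb: float, used_per_page_mb: float = 1.0) -> float:
    """Share of each allocated page that was actually used before the fix.

    Assumption: pages are allocated at the -I size and only
    used_per_page_mb of each page ends up holding data.
    """
    if item_size_max_mb < used_per_page_mb:
        raise ValueError("-I must be at least the used page size")
    return used_per_page_mb / item_size_max_mb


# Illustrative -I values in megabytes; only the 2m case is taken from the note.
for size_mb in (1, 2, 4, 8):
    print(f"-I {size_mb}m: {usable_fraction(size_mb):.0%} of allocated memory usable")
```

For -I 2m this gives one half, which matches the "only 50% of the memory available" figure quoted above.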

21 minutes ago, ASTRAPI said:

You may be a victim of a Memcached bug?

 

I very much doubt that this is a memcached bug, because I see the exact same behavior if I enable Redis for caching instead of memcached. It must be something in PHP, either on the IPB side or in the PHP core/extensions.


2 hours ago, Ghan said:

I very much doubt that this is a memcached bug, because I see the exact same behavior if I enable Redis for caching instead of memcached. It must be something in PHP, either on the IPB side or in the PHP core/extensions.

Does the same thing happen if you turn off guest page caching, while still leaving memcached/redis caching enabled?

You should be able to turn it off by adding or changing the following in your constants.php:

define( 'CACHE_PAGE_TIMEOUT', 0 );

If you haven't already done so, you could also try and see what happens when you change the store config from filesystem to database. 

define( 'STORE_METHOD', 'Database' );
define( 'STORE_CONFIG', '[]' );

If you would prefer to have constants.php generated for you rather than editing it manually, go to System -> Settings -> Advanced Configuration -> Data Storage.


3 hours ago, Flitterkill said:

Lindy and Rhett are following this, so they should read it eventually, but still: file a bug report.

Are the MySQL reads/sec still stupid high with memcached off?

The MySQL reads are probably back to a more regular level now that his conversion is done, as high MySQL reads are to be expected during the conversion process. For example, IPS retrieves the entire smileys table for each single post it converts, which is part of the reason why you see the huge increase for the select type scan in this image from a community upgrade I did.

[Attached screenshot: Skjermbilde 2015-09-04 kl. 18.32.13.png]

Contrary to what I've seen IPS staff say, I've found no noticeable performance hit when browsing the board normally while the post rebuild is in progress.


Oh I know about the increased load during rebuild, etc.

Quote

72.64 inserts/s, 213.26 updates/s, 47.98 deletes/s, 292044.99 reads/s

Almost 300,000 reads a second is a whole other thing...

I'm really curious how this plays out. 

EDIT: And I'd love to see some a/b testing between memcached/redis on and off with that reads/sec stat.


The database seems mostly unchanged from when we had caching enabled. This is from 9 or so hours of the database running after disabling caching:

 

Main thread process no. 30590, id 140555399796480, state: sleeping
Number of rows inserted 376964, updated 14409007, deleted 414055, read 14591194762
17.33 inserts/s, 214.80 updates/s, 7.83 deletes/s, 277792.20 reads/s
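
For the a/b comparison asked for earlier in the thread, the per-second line can be reduced to a single reads-per-write figure. This is a small Python sketch, not an official tool: the function names are hypothetical, and the regular expression only assumes the "N.NN name/s" layout seen in the two status lines posted in this topic.

```python
import re

# Matches fragments such as "214.80 updates/s" in the row-operations line.
RATE = re.compile(r"([\d.]+) (inserts|updates|deletes|reads)/s")


def parse_rates(line: str) -> dict:
    """Return a mapping like {"inserts": 17.33, ..., "reads": 277792.2}."""
    return {name: float(value) for value, name in RATE.findall(line)}


def reads_per_write(rates: dict) -> float:
    """Rows read for every row inserted, updated or deleted."""
    writes = rates["inserts"] + rates["updates"] + rates["deletes"]
    return rates["reads"] / writes


# The two lines posted in this topic: the earlier quote, then the one
# taken after caching was disabled.
earlier = "72.64 inserts/s, 213.26 updates/s, 47.98 deletes/s, 292044.99 reads/s"
after_disabling = "17.33 inserts/s, 214.80 updates/s, 7.83 deletes/s, 277792.20 reads/s"

for label, line in (("earlier", earlier), ("after disabling", after_disabling)):
    print(f"{label}: {reads_per_write(parse_rates(line)):.0f} reads per write")
```

Both lines come out at hundreds of rows read per row written, which is the imbalance being discussed here.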

 

We did switch the datastore from database to filesystem while the issues were going on - that doesn't seem to have had much of an impact either. I was hoping that changing it to filesystem would reduce load on the database, but that didn't happen.


3 minutes ago, Ghan said:

The database seems mostly unchanged from when we had caching enabled. This is from 9 or so hours of the database running after disabling caching:

 

Main thread process no. 30590, id 140555399796480, state: sleeping
Number of rows inserted 376964, updated 14409007, deleted 414055, read 14591194762
17.33 inserts/s, 214.80 updates/s, 7.83 deletes/s, 277792.20 reads/s

 

We did switch the datastore from database to filesystem while the issues were going on - that doesn't seem to have had much of an impact either. I was hoping that changing it to filesystem would reduce load on the database, but that didn't happen.

I would still recommend ditching third-party items to get a baseline. If one of them won't disable properly and is causing issues, I would have the author fix that.

 


Just now, Rhett said:

I would still recommend ditching third-party items to get a baseline. If one of them won't disable properly and is causing issues, I would have the author fix that.

 

With caching enabled, disabling all plugins still leaves the load issue. The load issue seems to come from having caching enabled, regardless of whether add-ons are enabled or disabled.


2 minutes ago, The Dark Wizard said:

With caching enabled, disabling all plugins still leaves the load issue. The load issue seems to come from having caching enabled, regardless of whether add-ons are enabled or disabled.

There is only really one way to get to the bottom of this issue.

1. Disable all third-party items (all third-party plugins and apps).

2. Disable memcached for now, which I believe you have.

3. Get a baseline with only our software, no third party items at all.

Once you get a baseline, then you can enable them one at a time to isolate the issue. 

 


7 minutes ago, Ghan said:

We did switch the datastore from database to filesystem while the issues were going on - that doesn't seem to have had much of an impact either. I was hoping that changing it to filesystem would reduce load on the database, but that didn't happen.

Have you tried with the guest page caching turned off while you leave redis / memcached on? 


6 minutes ago, TSP said:

Have you tried with the guest page caching turned off while you leave redis / memcached on? 

I just tried that - same behavior. Within 60 seconds of enabling memcached, the server load went from 3 to around 30. Very odd.


@Ghan - You are not the only one; we are having exactly the same issue with memcached and load. We are using AWS, and we are now on the max tier for EC2 and RDS just to keep the server stable. This is a huge cost increase just to get a stable forum.

We are disabling memcached tonight and reverting to filesystem to see if we can get the forum performance back to acceptable levels. We cannot try the latest version of memcached, as Amazon doesn't have it in ElastiCache yet.


Digging a bit deeper, I can see some general delays, like:

# User@Host: mydatabase_db[mydatabase_db] @ localhost []
# Thread_id: 33367  Schema: mydatabase_db  QC_hit: No
# Query_time: 10.652031  Lock_time: 0.000056  Rows_sent: 0  Rows_examined: 1
# Rows_affected: 1
SET timestamp=1471572339;
DELETE FROM ibf_sessions WHERE ip_address='xxx.xxx.xx.xx';

and:

# User@Host: mydatabase_db[mydatabase_db] @ localhost []
# Thread_id: 33358  Schema: mydatabase_db  QC_hit: No
# Query_time: 13.846971  Lock_time: 0.000014  Rows_sent: 2825756  Rows_examined: 2825756
# Rows_affected: 0
SET timestamp=1471572353;
SELECT /*!40001 SQL_NO_CACHE */ * FROM `ibf_posts`;
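
One way to read these two entries is to compare the query time with the number of rows examined. The following Python sketch assumes the header layout shown above; the function names are hypothetical. The second statement has the form that mysqldump issues when dumping a table, so it may come from a backup job rather than from forum traffic, but that is an inference and not something the log states.

```python
import re

# Field layout assumed from the slow-log headers quoted in this post.
TIMING = re.compile(
    r"Query_time:\s*(?P<query_time>[\d.]+)\s+"
    r"Lock_time:\s*(?P<lock_time>[\d.]+)\s+"
    r"Rows_sent:\s*(?P<rows_sent>\d+)\s+"
    r"Rows_examined:\s*(?P<rows_examined>\d+)"
)


def parse_timing(header_line: str):
    """Return the timing fields of a '# Query_time: ...' line, or None."""
    match = TIMING.search(header_line)
    if match is None:
        return None
    return {
        "query_time": float(match["query_time"]),
        "lock_time": float(match["lock_time"]),
        "rows_sent": int(match["rows_sent"]),
        "rows_examined": int(match["rows_examined"]),
    }


def seconds_per_row(timing: dict) -> float:
    """Query time divided by rows examined (at least 1 to avoid dividing by zero)."""
    return timing["query_time"] / max(timing["rows_examined"], 1)


delete_entry = "# Query_time: 10.652031  Lock_time: 0.000056  Rows_sent: 0  Rows_examined: 1"
select_entry = "# Query_time: 13.846971  Lock_time: 0.000014  Rows_sent: 2825756  Rows_examined: 2825756"

for entry in (delete_entry, select_entry):
    timing = parse_timing(entry)
    print(f"{timing['rows_examined']} rows examined, {seconds_per_row(timing):.6f} s per row")
```

Seen this way, the full-table SELECT is slow because it reads every row, while the DELETE spends over ten seconds on a single row, so its delay is not explained by the amount of data it touches.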

 

The forum is not very active, and everything loads in under 2 seconds.

 


  • 1 month later...

This was quite an informative topic, with lots of knowledgeable people responding. Good to hear you're running smoothly. If you don't mind sharing, what kind of concurrent usage do you have, in terms of registered users and guests, on 1.4.31? I know that's a very loose question depending on what your site actually has/does, but it's a general baseline for comparison ^_^


Quote

I know that's a very loose question depending on what your site actually has/does

Exactly. It is not a good idea to replicate that kind of setting on just any server.

Just post your configuration and I may be able to advise on it if something is wrong there. Also post your server specs: VPS or dedicated?


OK, I'm actually on 3.4.9, but I have been through three hosts now on various plans due to high resource usage for relatively few users (~100 or fewer, often closer to 50-60, which is really "nothing" for a site). It's almost always CPU tied to PHP and/or MySQL processes based on top, and SHOW PROCESSLIST never has anything running at the time of the issues. I'm currently on a VPS. I'm debating whether IPS is an intensive suite of software (especially as you throw in developer add-ons) with inherently more demanding resource requirements in 4.x. My concern is upgrading to 4.x and it REALLY tanking my host. I just hooked up Cloudflare per their recommendation to see if there's anything else out of the ordinary, such as high traffic, that might be translating to high resource usage. I suspect not, based on the usage logs by IP. I have most bots regulated on whitelists or blocked entirely on blacklists, and likewise foreign (not demographically relevant) IP blocks (e.g. Russia, China, etc.), given it's a locale-specific site.

I'm on the fence between going to 4.x and just bailing out to something simpler, since these users really only care about the forum but want things like Adriano's Recent Topics as a main page feature, and like the thumbnail topic add-on as an eye-catcher (similar to reddit). They're also fans of the SuperNewsFeed's activity stream. These are pretty straightforward "wants" but seem to correlate with high CPU usage, and admittedly the main page of the site is a beast in size, trying to give them a "full" view of the latest and greatest site-related content à la sidebars and top-bar menu items such as countdown timers to site events. It may be the inherent nature of the site; I already bailed on the embedded Shoutbox on the main page, as well as a Featured Content slider.

The crux of it is whether IPS is causing this through the amount of SQL and PHP processing, or whether it is site-software agnostic and more down to the nature of the site's design.


Archived

This topic is now archived and is closed to further replies.
